
Posts 35 · Comments 2,879 · Joined 1 yr. ago

  • The LLM “engine” is mostly detached from the UI.

    kobold.cpp is actually pretty great, and you can still use it with TabbyAPI (what you run for exllama) and the llama.cpp server.

    I personally love this for writing and testing though:

    https://github.com/lmg-anon/mikupad

    And Open Web UI for more general usage.

There’s a big backlog of poorly documented knowledge too, heh; just ask if you’re wondering how to cram a specific model in. But the gist of picking the optimal engine is:

    • For MoE models (like Qwen3 30B), try ik_llama.cpp, which is a fork specifically optimized for big MoEs partially offloaded to CPU.
    • For Gemma 3 specifically, use the regular llama.cpp server since it seems to be the only thing supporting the sliding window attention (which makes long context easy).
    • For pretty much anything else, if it’s supported by exllamav3 and you have a 3060, it's optimal to use that (via its server, which is called TabbyAPI). And you can use its quantized cache (try Q6 or Q5) to easily get long context.
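To make the quantized-cache point concrete, here's a rough sketch of how cache bit-width trades off against memory. The model dimensions below are hypothetical stand-ins, not any specific model's config, and real engines add some overhead on top of this:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx_len, bits_per_value):
    # K and V each store n_kv_heads * head_dim values per layer, per token
    values_per_token = 2 * n_layers * n_kv_heads * head_dim
    return values_per_token * ctx_len * bits_per_value // 8

# Hypothetical mid-size model: 40 layers, 8 KV heads, head dim 128, 32K context
for bits, label in [(16, "FP16"), (6, "Q6"), (5, "Q5")]:
    gb = kv_cache_bytes(40, 8, 128, 32768, bits) / 1e9
    print(f"{label} cache: {gb:.1f} GB")
```

The scaling is linear in bits, which is why dropping from FP16 to Q6/Q5 roughly triples the context you can fit in the same VRAM budget.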
  • But I remember the context being memory-greedy due to it being multimodal

    No, it's super efficient! I can run 27B's full 128K on my 3090, easy.

    But you have to use the base llama.cpp server. kobold.cpp doesn't seem to support the sliding window attention (last I checked like two weeks ago), so even a small context takes up a ton there.
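The memory gap falls straight out of the cache math. Here's a sketch; the layer count, KV-head count, window size, and local/global layer split below are my recollection of Gemma 3 27B's config, so treat them as approximate. With sliding-window attention, most layers only cache the last window of tokens, and only the occasional global layer caches the full context:

```python
def kv_bytes(layer_groups, n_kv_heads, head_dim, bytes_per_value=2):
    # Sum the K+V cache over (layer_count, cached_tokens) groups
    total_tokens = sum(n * t for n, t in layer_groups)
    return 2 * n_kv_heads * head_dim * bytes_per_value * total_tokens

CTX, WINDOW = 131072, 1024  # 128K context; 1K sliding window (assumed)
# Without SWA support, all 62 layers cache the full context:
full = kv_bytes([(62, CTX)], 16, 128)
# With SWA, roughly 1 in 6 layers is global; the rest cache only the window:
swa = kv_bytes([(11, CTX), (51, WINDOW)], 16, 128)
print(f"no SWA: {full/1e9:.0f} GB, with SWA: {swa/1e9:.0f} GB")
```

The numbers are back-of-envelope, but they show why an engine that ignores SWA blows up at long context while base llama.cpp stays manageable.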

    And the image input part is optional. Delete the mmproj file, and it won't load.

    There are all sorts of engine quirks like this, heh, it really is impossible to keep up with.

  • Yeah it’s basically impossible to keep up with new releases, heh.

    Anyway, Gemma 12B is really popular now, and TBH much smarter than Nemo. You can grab a special “QAT” Q4_0 from Google (it works in kobold.cpp, but fits much more context with base llama.cpp) with basically the same performance as unquantized; I'd highly recommend it.

    I'd also highly recommend trying 24B when you get the rig! It’s so much better than Nemo, even more than the size would suggest, so it should still win out even if you have to go down to 2.9 bpw, I’d wager.

    Qwen3 30B A3B is also popular now, and would work on your 3770 with kobold.cpp, no changes needed (though there are speed gains to be had with the right framework, namely ik_llama.cpp).

    One other random thing: some of kobold.cpp's sampling presets are very funky with new models. I'd recommend resetting everything to off, then starting with something like 0.4 temp, 0.04 MinP, 0.02/1024 rep penalty, and 0.4 DRY; models newer than llama2 don't need the crazy-high-temp sampling the presets normally use.
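As a concrete starting point, those settings would look something like the KoboldCpp /api/v1/generate payload below. The field names match KoboldCpp's API as I understand it, and I'm reading "0.02/1024 rep penalty" as a 1.02 penalty applied over the last 1024 tokens; double-check both against your kobold.cpp version:

```python
# Conservative sampler settings for recent models (Gemma, Qwen3, etc.)
payload = {
    "prompt": "Once upon a time",
    "max_length": 256,
    "temperature": 0.4,        # low temp; new models don't need llama2-era heat
    "min_p": 0.04,             # MinP does most of the filtering work
    "rep_pen": 1.02,           # "0.02" penalty (assumed meaning)...
    "rep_pen_range": 1024,     # ...applied over the last 1024 tokens
    "dry_multiplier": 0.4,     # mild DRY to catch looping
    "top_p": 1.0, "top_k": 0,  # everything else effectively off
}
# POST this as JSON to http://localhost:5001/api/v1/generate
# on a running kobold.cpp instance.
```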

    I can host a specific model/quantization on the kobold.cpp API for you to try, to save tweaking time. Just ask (or PM me, as replies sometimes don’t send notifications).

    Good luck with exams! No worries about response times, /c/localllama is a slow, relaxed community.

  • Yeah, and disapprove +11 (aka 54%) is the lowest poll (which I see more as +8, considering the 3% 'undecided' block). It's not even close to congressional or direction-of-country disapproval polls, and it's still around where he was last time as president.

    I don't mean to sound combative, but a slight plurality of disapproval is not gonna cut the mustard.

  • Like, not as a personal dig, but the overwhelming amount of people complaining about the DNC just aren’t up to date on what’s happening

    Fair point! I am not up to date TBH.

    I guess I'm pretty jaded too. The DNC getting things together!? What is this?

  • It’s not like the sane among us are suddenly going to decide to go along with fascism

    Oh, you underestimate people's self-interest. If Big Tech continues on its trajectory toward a kind of Thiel-ish cyberpunk dystopia, most people are going to go along. Like, even I have super techy naturalized family that keeps using Google or Facebook stuff. It's (seemingly) too essential.

  • That's optimistic.

    It's assuming the Dem Party doesn't sabotage their own candidates. It's assuming they don't campaign like it's 1960 again. It's assuming social media will somehow be reined in.

    It's assuming there will even be a fair environment for an election, instead of the government (and whoever's conflated with them) putting thumbs on the scales kinda like Hungary, or worse. It doesn't take much pressure to sway elections in environments this polarized.

  • We have to shout for him

    We can't.

    The people who need to hear it are in another bubble and never will.

    TBH I dunno how to fix it anymore. Even 'revolution' like many on Lemmy fantasize about will not penetrate, and most people don't want to understand what propaganda and algos are doing to them.

  • You can definitely quantize exl3s yourself; the process is VRAM-light (albeit time-intensive).

    What 13B are you using? FYI, the old Llama2 13B models don’t use GQA, so even their relatively short 4096 context takes up a lot of VRAM. Newer 12Bs and 14Bs are much more efficient (and much smarter, TBH).
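The GQA difference is pure arithmetic: Llama2 13B caches K/V for all 40 attention heads per layer, while GQA models keep only a handful of KV heads. A sketch, using Llama2 13B's published dimensions; the 8 KV heads for the newer model is a typical GQA figure I'm assuming, not any specific model's config:

```python
def kv_cache_gb(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_value=2):
    # K and V: two caches of shape [n_layers, ctx_len, n_kv_heads, head_dim]
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_value / 1e9

# Llama2 13B: 40 layers, 40 KV heads (no GQA), head dim 128, 4096 context
old = kv_cache_gb(40, 40, 128, 4096)
# A newer ~12B with GQA, e.g. 8 KV heads (assumed), at the same context:
new = kv_cache_gb(40, 8, 128, 4096)
print(f"Llama2 13B: {old:.1f} GB, GQA model: {new:.1f} GB at 4096 context")
```

A 5x per-token reduction, which is why newer models can afford 32K+ contexts in the same VRAM that Llama2 13B spent on 4K.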

  • Leopards Ate My Face @lemmy.world

    MTG accuses Trump of "bait and switch" over Iran strikes

    politics @lemmy.world

    Trump floats regime change in Iran

    World News @lemmy.world

    Israel bombs Iranian state TV during live broadcast

    United States | News & Politics @lemmy.ml

    Scoop: Four reasons Musk attacked Trump's "big beautiful bill"

    World News @lemmy.world

    Israel plans to occupy and flatten all of Gaza if no deal by Trump's trip

    LocalLLaMA @sh.itjust.works

    Qwen3 "Leaked"

    LocalLLaMA @sh.itjust.works

    Niche Model of the Day: Nemotron 49B 3bpw exl3

    Ukraine @sopuli.xyz

    Trump threatens Putin with new sanctions after meeting with Zelensky

    Ukraine @sopuli.xyz

    Trump's "final offer" for peace requires Ukraine to accept Russian occupation

    LocalLLaMA @sh.itjust.works

    Niche Model of the Day: Openbuddy 25.2q, QwQ 32B with Quantization Aware Training

    Ask Lemmy @lemmy.world

    How do y'all post clips/animations on Lemmy? Only GIF seems to work.

    politics @lemmy.world

    Trump 2.0 initial approval ratings higher than in first term

    politics @lemmy.world

    Behind the Curtain: Meta's make-up-with-MAGA map

    Enough Musk Spam @lemmy.world

    Elon Musk's headline dominance squeezes other CEOs

    politics @lemmy.world

    Trump sides with Musk in H-1B fight

    politics @lemmy.world

    Elon Musk pledges "war" over H-1B visa program, calls opponents racists

    politics @lemmy.world

    Musk calls MAGA element "contemptible fools" as virtual civil war brews

    Leopards Ate My Face @lemmy.world

    MAGA vs. Musk: Right-wing critics allege censorship, loss of X badges

    Avatar: The Last Airbender @lemmy.world

    Brainstorming Post LoK/Avatar Seven Havens Story Ideas

    Avatar: The Last Airbender @lemmy.world

    'Avatar: Seven Havens' Rumors Emerge