Posts 4 · Comments 55 · Joined 2 yr. ago

TechTakes @awful.systems

Musk ("xAI") now claims Grok was hacked

TechTakes @awful.systems

Gemini seems to have "solved" my duck river crossing, lol.

TechTakes @awful.systems

Gemini 2.5 "reasoning", no real improvement on river crossings.

SneerClub @awful.systems

Some tests of how much AI "understands" what it says (spoiler: very little)

  • Further support for the memorization claim: I posted examples on this forum of novel river-crossing puzzles where LLMs completely fail.

    Note that Apple’s actors/agents river crossing is a well-known “jealous husbands” variant, which you can ask a chatbot to explain to you. It gladly explains, even as it can’t follow its own explanation (since of course it isn’t its own explanation but a plagiarized one, even if it changes the words).

    edit: https://awful.systems/post/4027490 and earlier https://awful.systems/post/1769506

    I think what I need to do is write up a bunch of puzzles, assign them randomly to two sets, and test & post one set while holding back the second set (not even testing it on any online chatbots). Then in a year or two, see how much the public set improves versus the held-back one.
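
    Something like this rough sketch (C++ for concreteness; the puzzle names and the fixed seed are placeholders, not the actual items):

    ```cpp
    // Shuffle the puzzle pool once with a fixed seed, publish the first
    // half, keep the second half offline and untested.
    #include <algorithm>
    #include <iostream>
    #include <random>
    #include <string>
    #include <vector>

    int main() {
        std::vector<std::string> puzzles = {
            "puzzle-01", "puzzle-02", "puzzle-03", "puzzle-04",
            "puzzle-05", "puzzle-06", "puzzle-07", "puzzle-08",
        };
        std::mt19937 rng(20250101);  // fixed seed: the split is reproducible
        std::shuffle(puzzles.begin(), puzzles.end(), rng);
        const std::size_t half = puzzles.size() / 2;
        for (std::size_t i = 0; i < puzzles.size(); ++i)
            std::cout << (i < half ? "public:   " : "held-out: ")
                      << puzzles[i] << "\n";
    }
    ```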

  • Can’t be assed to read the bs, but sometimes the use-after-free only happens in some rarely executed code path, or only when one branch is executed and then later another. So you may still need fuzzing to trigger the use-after-free for Valgrind to detect it.
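
    Contrived example: Valgrind only flags the bad read on runs that actually take the rare branch, so a fuzzer (or luck) has to get you there first:

    ```cpp
    #include <cstdio>
    #include <cstdlib>

    int read_value(int* p, bool rare_path) {
        if (rare_path)     // only taken for unusual inputs
            std::free(p);  // pointer freed here...
        return *p;         // ...and read here: use-after-free on the rare path only
    }

    int main(int argc, char**) {
        int* p = static_cast<int*>(std::malloc(sizeof(int)));
        *p = 42;
        // Valgrind reports nothing unless an argument is passed and the
        // rare branch actually executes.
        std::printf("%d\n", read_value(p, argc > 1));
        if (argc <= 1) std::free(p);  // the common path cleans up correctly
    }
    ```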

  • I swear I’m gonna plug an LLM into a rather traditional solver I’m writing. I may tuck a point deep into the paper about how it’s quite slow to use an LLM to mutate solutions in a genetic algorithm or a swarm solver (see the sketch at the end of this comment). And in any case, the non-LLM operator would be the default.

    Normally I wouldn’t sink that low, but I’ve got mouths to feed, and frankly, fuck it: they can persist in this madness for much longer than I can stay solvent.

    It is as if there were a mass delusion that a pseudorandom number generator can serve as an oracle, predicting the future. In that world, doing any kind of Monte Carlo simulation of something like weather would of course “confirm” all the dumb shit.
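
    Roughly what I mean by plugging it in: the mutation operator is just an injectable function, the cheap PRNG one is the default, and the LLM-backed one (stubbed here, since the model call itself is hypothetical) slots into the same signature at several orders of magnitude more cost per call:

    ```cpp
    #include <functional>
    #include <random>
    #include <vector>

    using Genome  = std::vector<int>;
    using Mutator = std::function<void(Genome&, std::mt19937&)>;

    // Default operator: flip one random gene. Microseconds per call.
    void random_mutate(Genome& g, std::mt19937& rng) {
        std::uniform_int_distribution<std::size_t> pick(0, g.size() - 1);
        g[pick(rng)] ^= 1;
    }

    // Hypothetical LLM-backed operator: would serialize the genome into a
    // prompt and parse the reply back out. Stubbed with the cheap operator.
    void llm_mutate(Genome& g, std::mt19937& rng) {
        random_mutate(g, rng);  // placeholder for the round trip to a model
    }

    void evolve(std::vector<Genome>& pop, const Mutator& mutate,
                std::mt19937& rng, int generations) {
        for (int gen = 0; gen < generations; ++gen)
            for (auto& g : pop)
                mutate(g, rng);  // selection and crossover elided
    }

    int main() {
        std::mt19937 rng(42);
        std::vector<Genome> pop(10, Genome(16, 0));
        evolve(pop, random_mutate, rng, 100);  // swap in llm_mutate to compare
    }
    ```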

  • Yeah, plenty of opportunities to just work it into the story.

    I dunno what kind of local models you could use, though. If it is a 3D game then it's fine to require a GPU, but you wouldn't want to raise the minimum requirements too high. And you wouldn't want to use 12 GB of VRAM for a gimmick, either.

  • I think it could work as a minor gimmick, like the terminal-hacking minigame in Fallout. You have to convince the LLM to tell you the password, or you get to talk to a demented robot whose brain was fried by radiation exposure, or the like. Relatively inconsequential stuff, like being able to talk your way through or just shoot your way through.

    Unfortunately, this shit is too slow and too huge to embed a local copy into a game, and you'd need to support a wide range of hardware. Running it in the cloud would cost too much.

  • I was trying out the free GitHub Copilot to see what the buzz is all about:

    It doesn't even know its own settings. The one little useful thing that isn't plagiarism, providing a natural-language interface to its own bloody settings, and it couldn't do it.

  • All joking aside, there is something thoroughly fucked up about this.

    What's fucked up is that we let these rich fucks threaten us with extinction to boost their stock prices.

    Imagine if some cold fusion scammer were permitted to gleefully boast that his experimental cold fusion plant in the middle of a major city could blow it up, setting off little hydrogen explosions, rigging up a neutron source just to make it spicier, etc.

  • It is as if there were people who, for some reason, fantasized about automaton mouths and lips and tongues and vocal cords, and came up with all these fantasies of how it'll be when automatons can talk.

    And then Edison invents the phonograph.

    And then they stick their you know what in the gearing between the cylinder and the screw.

    Except somehow more stupid, because these guys are worried about an AI apocalypse while boosting the very AI hype that pays for this supposed apocalypse.

    edit: If someone had said in the 1850s, "automatons won't be able to talk for another 150 years or longer because the vocal tract is too intricate", and some automaton fetishist claimed they would be able to talk in 20 years, the phonograph shouldn't have lent any credence whatsoever to the latter. What is different this time is that the phonograph was genuinely, extremely useful for what it was, while generative AI is not nearly as useful, and they're going after the automaton-fetishist money.

  • When confronted with a problem like “your search engine imagined a case and cited it”, the next step is to wonder what else it might be making up, not to just quickly slap a bit of tape over the obvious immediate problem and declare everything to be great.

    Exactly. Even if you ensure the cited cases or articles are real, it will misrepresent what said articles say.

    Fundamentally, it is just blah-blah-blahing until the point comes when a citation would be likely to appear, then it blah-blah-blahs the citation based on the preceding text that it just made up. It plainly should not be producing real citations. That it can produce real citations at all is deeply at odds with, for example, its pretense of reasoning.

    Ensuring the citation is real, RAG-ing the articles in there, having AI rewrite drafts: none of these hacks does anything to address the underlying problems.

  • Actually, having read it carefully, it is interesting that they don’t actually claim it was hacked; they claim the modification was unauthorized. They also don’t claim that they removed access from that mysterious “employee” who modified it. I’m thinking they had some legal reason to technically not lie.

  • It re-consumes its own bullshit, and the bullshit it does print is the bullshit it also fed itself; it’s not lying about that. Of course, it is also always re-consuming the initial prompt, so the end bullshit isn’t necessarily as far removed from the question as the length would indicate.

    Where it gets deceptive is when it knows an answer to the problem but constructs some bullshit for the purpose of making you believe that it solved the problem on its own. The only way to tell the difference is to ask it something simpler that it doesn’t know the answer to, and watch it bullshit in circles or arrive at an incorrect answer.
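
    For anyone unfamiliar with the mechanics, the re-consumption is just how autoregressive decoding works; a minimal sketch, with the model call stubbed out:

    ```cpp
    #include <iostream>
    #include <vector>

    // Stand-in for a model call: returns the next token given everything
    // seen so far. A real LLM samples from a distribution conditioned on
    // exactly this context.
    int next_token(const std::vector<int>& context) {
        return static_cast<int>(context.size());  // dummy logic
    }

    int main() {
        std::vector<int> context = {101, 102, 103};  // the prompt's tokens
        for (int step = 0; step < 5; ++step) {
            int tok = next_token(context);  // conditioned on prompt + its own output
            context.push_back(tok);         // the model re-consumes what it wrote
        }
        for (int t : context) std::cout << t << ' ';
        std::cout << '\n';
    }
    ```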

  • He’s such a complete moron. He doesn’t want to recite “DEI shibboleths”? What does he even think that would refer to? Why shibboleths?

    To spell it out: that would refer to the antisemitic theory that the reason (for example) some Black guy would get a Medal of Honor (the “deimedal”) is the Jews.

    I swear this guy is dumber than Trump. Trump, for all his rambling, uses actual language; he understands what the shit he is saying means to his followers. Scott… he really does not.

  • I think they worked specifically on cheating the benchmarks, though, as well as on popular puzzles like pre-existing variants of the river crossing. It is a very large, very popular puzzle category; if the river-crossing puzzle is not on the list, I don't know what would be.

    Keep in mind that they are also true believers - they think that if they cram enough little pieces of logical reasoning, taken from puzzles, into the AI, then they will get a robot god that will actually start coming up with new shit.

    I very much doubt that there's some general improvement in reasoning performance that results in these older puzzle variants getting solved while new ones, which aren't particularly more difficult, still fail.