Stubsack: weekly thread for sneers not worth an entire post, week ending 24th November 2024

Need to let loose a primal scream without collecting footnotes first? Have a sneer percolating in your system but not enough time/energy to make a whole post about it? Go forth and be mid: Welcome to the Stubsack, your first port of call for learning fresh Awful you’ll near-instantly regret.

Any awful.systems sub may be subsneered in this subthread, techtakes or no.

If your sneer seems higher quality than you thought, feel free to cut’n’paste it into its own post — there’s no quota for posting and the bar really isn’t that high.

The post-Xitter web has spawned so many “esoteric” right wing freaks, but there’s no appropriate sneer-space for them. I’m talking redscare-ish, reality-challenged “culture critics” who write about everything but understand nothing. I’m talking about reply-guys who make the same 6 tweets about the same 3 subjects. They’re inescapable at this point, yet I don’t see them mocked (as much as they should be).

Like, there was one dude a while back who insisted that women couldn’t be surgeons because they didn’t believe in the moon or in stars? I think each and every one of these guys is uniquely fucked up and if I can’t escape them, I would love to sneer at them.

Last week's thread

(Semi-obligatory thanks to @dgerard for starting this)

182 comments
  • Oh hey looks like another Chat-GPT assisted legal filing, this time in an expert declaration about the dangers of generative AI: https://www.sfgate.com/tech/article/stanford-professor-lying-and-technology-19937258.php

    The two missing papers are titled, according to Hancock, “Deepfakes and the Illusion of Authenticity: Cognitive Processes Behind Misinformation Acceptance” and “The Influence of Deepfake Videos on Political Attitudes and Behavior.” The expert declaration’s bibliography includes links to these papers, but they currently lead to an error screen.

    Irony can be pretty ironic sometimes.

  • a better-thought-out announcement is coming later today, but our WriteFreely instance at gibberish.awful.systems has reached a roughly production-ready state (and you can hack on its frontend by modifying the templates, pages, static, and less directories in this repo and opening a PR)! awful.systems regulars can ask for an account and I'll DM an invite link!

  • Dude discovers that one LLM is not entirely shit at chess, spends time and tokens proving that other models are actually also not shit at chess.

    The irony? He's comparing it against Stockfish, a computer chess engine. Computers playing chess at a superhuman level is a solved problem. LLMs have now slightly approached that level.

    For one, gpt-3.5-turbo-instruct rarely suggests illegal moves,

    Writeup https://dynomight.net/more-chess/

    HN discussion https://news.ycombinator.com/item?id=42206817

    • LLMs sometimes struggle to give legal moves. In these experiments, I try 10 times and if there’s still no legal move, I just pick one at random.

      uhh
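      For context, the fallback quoted above amounts to something like the sketch below. `generate_move` and `legal_moves` are hypothetical stand-ins for the LLM call and the position's legal-move list, not anything from the actual writeup's code:

```python
import random

def move_with_fallback(generate_move, legal_moves, tries=10):
    """Sample the model up to `tries` times; if it never produces a
    legal move, just pick one uniformly at random (per the writeup)."""
    for _ in range(tries):
        move = generate_move()
        if move in legal_moves:
            return move
    return random.choice(legal_moves)  # give up and guess

# Demo: a "model" that only ever suggests an illegal move still
# always ends up playing something legal.
legal = ["e2e4", "d2d4", "g1f3"]
print(move_with_fallback(lambda: "e9e9", legal) in legal)
```

      Which is to say: up to 10% of a hopeless model's moves in a tricky position can effectively be random-number-generator moves, and they still count toward its result.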

    • Particularly hilarious is how thoroughly they're missing the point. The fact that it suggests illegal moves at all means that no matter how good its openings are, the scaling laws and emergent behaviors haven't magicked up an internal model of the game of chess, or even of the state of the chess board it's working with. I feel like playing games is a particularly powerful example of this, because the game rules provide a very clear structure to model and it's very obvious when that model doesn't exist.

    • I remember several months back (a year ago?) when the news got out that gpt-3.5-turbo-papillion-grumpalumpgus could play chess at around ~1600 Elo. I was skeptical that the apparent skill was anything more than a hacked-on patch to stop folks from clowning on their models on xitter. Like, if an LLM had just read the instructions of chess and started playing like a competent player, that would be genuinely impressive. But if what happened is that they generated 10^12 synthetic games of chess played by stonk fish and used those to train the model, that ain't an emergent ability, that's just brute-forcing chess. The fact that larger open-source models that perform better on other benchmarks still flail at chess is a glaring red flag that something funky was going on with gpt-3.5-turbo-instruct to drive home the "eMeRgEnCe" narrative. I'd bet decent odds that if you played with modified rules (knights move a one-space-longer L shape, you cannot move a pawn 2 squares after it last moved, etc.), gpt-3.5 would fuckin suck.

      Edit: the author asks "why skill go down tho" on later models. Like, isn't it obvious? At that moment in time, chess skills weren't a priority, so the trillions of synthetic games weren't included in the training. Like, this isn't that big of a mystery...? It's not like other NNs haven't been trained to play chess...

    • @gerikson @BlueMonday1984 the only analysis of computer chess anybody needs https://youtu.be/DpXy041BIlA?si=a1vU3zmOWs8UqlSQ

    • Here are the results of these three models against Stockfish—a standard chess AI—on level 1, with a maximum of 0.01 seconds to make each move

      I'm not a chess person or familiar with Stockfish, so take this with a grain of salt, but I found a few interesting things perusing the code / docs which I think make useful context.

      Skill Level

      I assume "level" refers to Stockfish's Skill Level option.

      If I mathed right, Stockfish roughly estimates Skill Level 1 to be around 1445 Elo (source). However, it says "This Elo rating has been calibrated at a time control of 60s+0.6s", so it may be significantly lower here.

      Skill Level affects the search depth (it appears to use a depth of 1 at Skill Level 1). It also enables MultiPV 4 to compute the four best principal variations and randomly pick from them (more randomly at lower skill levels).
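      For the curious: Skill Level is an ordinary UCI option, so the writeup's setup boils down to a handful of engine commands. A sketch of what that session would look like (this just builds the command strings; actually sending them requires a Stockfish binary, and the example position is my own, not the author's):

```python
# UCI commands approximating the writeup's Stockfish configuration:
# Skill Level 1, ~10 ms per move.
uci_session = [
    "uci",                                 # handshake; engine lists its options
    "setoption name Skill Level value 1",  # weakened play via shallow search + semi-random PV choice
    "isready",
    "position startpos moves e2e4 e7e5",   # example position after 1. e4 e5
    "go movetime 10",                      # 0.01 s per move, as in the writeup
]
print("\n".join(uci_session))
```

      Note that "go movetime 10" is exactly where the Move Overhead question below bites: the engine subtracts its overhead allowance from that budget.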

      Move Time & Hardware

      This is all independent of move time. The author used a move time of 10 milliseconds for Stockfish (no mention of how much time the LLMs got). ... or at least they did if they accounted for the "Move Overhead" option defaulting to 10 milliseconds. If they left that at its default, then 10ms - 10ms = 0ms, so 🤷‍♀️.

      There is also no information about the hardware or number of threads they ran this on, which I feel is important information.

      Evaluation Function

      After the game was over, I calculated the score after each turn in “centipawns” where a pawn is worth 100 points, and ±1500 indicates a win or loss.

      Stockfish's FAQ mentions that they have gone beyond centipawns for evaluating positions, because it's strong enough that material advantage is much less relevant than it used to be. I assume it doesn't really matter at level 1 with ~0 seconds to produce moves though.

      Still, since the author has Stockfish handy anyway, it'd be interesting to use it in its non-handicapped form to evaluate who won.

  • Interesting post and corresponding Mastodon thread by cwebber on the non-decentralised-ness of Bluesky.

    https://dustycloud.org/blog/how-decentralized-is-bluesky/

    https://social.coop/@cwebber/113527462572885698

    The author is keen about this particular “vision statement”:

    Preparing for the organization as a future adversary.

    The assumption being, stuff gets enshittified and how might you guard your product against the future stupid and awful whims of management and investors?

    Of course, they don’t consider that it cuts both ways, given Jack Dorsey’s personal grumbles about Twitter. The risk from his point of view was the company he founded doing unthinkable evil things like, uh, banning nazis. He’s keen for that sort of thing to never happen again on his platforms.

    • note that cwebber wrote the ActivityPub spec, btw

    • @rook @techtakes Dorsey is off the board now, as of this month I think.

      I read that posts on BlueSky are permanently stored in a blockchain, which, if true, would put me off.

      • I’m aware he isn’t there now, but it bears remembering that he was there at the beginning when these goals were being shaped, and as we have seen with twitter there’s nothing to stop him coming back, even if nostr is his new best friend for now.

        I read that posts on BlueSky are permanently stored in a blockchain,

        So, this is complex and hard to find concrete information on, but:

        1. Bluesky uses merkle-tree-based things. Don’t call ’em blockchains… that’s the sort of framing cryptocurrency boosters want, so as to present their technologies as useful.
        2. Posts are stored in a merkle search tree, but attachments are stored separately. Attached blobs (like images) can be (and are) deleted independently of the tree nodes which reference them.
        3. The merkle trees are independent and can be modified without having to rewrite the whole history of every post on bluesky, because there isn’t one central official ledger of all posts.
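        To illustrate point 3, here’s a minimal sketch of why per-user trees make history rewrites cheap. This is a generic binary merkle tree for illustration only, not atproto’s actual Merkle Search Tree node format:

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Root hash of a binary merkle tree over the given leaf payloads."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:                # duplicate the last node on odd levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

# Per-user trees: rewriting alice's history only re-derives *her* root.
# bob's tree never has to be touched, because there is no single global ledger.
alice = [b"post 1", b"post 2", b"post 3"]
bob = [b"hello", b"world"]

alice_before, bob_before = merkle_root(alice), merkle_root(bob)
alice[1] = b"[deleted]"                   # history-modifying operation
alice_after, bob_after = merkle_root(alice), merkle_root(bob)
print(alice_before != alice_after, bob_before == bob_after)
```

        In a single global chain, that one edit would invalidate every hash downstream of it; with independent per-user trees, it’s a local recomputation.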

        From bluesky’s own (non technical) blurb on the subject,

        it takes a bit longer for the text content of a post to be fully deleted in storage. The text content is stored in a non-readable form, but it is possible to query the data via the API. We will periodically perform back-end deletes to entirely wipe this data.

        The merkle trees are per-user, which makes history-modifying operations like rebasing practical… this facility apparently landed last summer, e.g. Intention to remove repository history. Flagging tree nodes as deleted, and then actually destroying them in a series of later operations (rebase, then garbage collection), would explain the front end respecting deletions while lower-level protocols show older state for a little while.
