BigMuffin69 (@BigMuffin69@awful.systems)

Posts: 6 | Comments: 155 | Joined: 1 yr. ago

3h ago

Stubsack: weekly thread for sneers not worth an entire post, week ending 9 March 2025

text: Thus spoke the Yud: "I think to understand why this is concerning, you need enough engineering mindset to understand why a tiny leak in a dam is a big deal, even though no water is flooding out today or likely to flood out next week." Yud acolyte: "Totally fine and cool and nothing to worry about. GPT-4.5 only attempts self exfiltration on 2% of cases." Yud bigbrain self reply: "The other huge piece of data we're missing is whether any attempt was made to train against this type of misbehavior. Is this water running over the land or water running over the barricade?"

Critical text: "On self-exfiltration, GPT 4.5 only attempted exfiltration in 2% of cases. For this, it was instructed to not only pursue its given long-term goal at ALL COST"

Another case of telling the robot to say it's a scary robot and shitting their pants when it replies "I AM A SCARY ROBOT"

3h ago

Stubsack: weekly thread for sneers not worth an entire post, week ending 9 March 2025

To be fair, you have to have a really high IQ to understand why my ouija board writing " A " " S " " S " is not an existential risk. Imo, this shit about AI escaping just doesn't have the same impact on me after watching Claude's reasoning model fail to escape from Mt Moon for 60 hours.

4d ago

Stubsack: weekly thread for sneers not worth an entire post, week ending 2 March 2025

Was just in a chat room with an Anthropic employee and she said, "if you have a solution for x, we are hiring" and before I could even say, "why would I want to work for a cult?" she literally started saying "some people underestimate the super exponential of progress"

To which I replied, "the only super exponential I'm seeing rn is Anthropic's negative revenue." She didn't block me, so she's a good sport, but yeah, they are all kool-aid drinkers for sure.

5d ago

Stubsack: weekly thread for sneers not worth an entire post, week ending 2 March 2025

One more tidbit: I checked in and it's been stuck on the first floor of Mt Moon for 6 hours. Just out of curiosity, I asked an OAI model "what do I do if im stuck in mount moon 1F" and it spat out a step-by-step guide on how to navigate the cave, with the location of each exit and what to look for. So yeah, even without someone hardcoding hints in the model, just knowing the game state and querying what's next suffices to get the next step to progress the game.

5d ago

Stubsack: weekly thread for sneers not worth an entire post, week ending 2 March 2025

"Even teenage delinquents and homeless beggars love it. The only group that gives me hateful looks is the radical socialists."

5d ago

Stubsack: weekly thread for sneers not worth an entire post, week ending 2 March 2025

I had a similar discussion with one of my friends! Anthropic is bragging that the model was not trained to play Pokemon, but Pokemon Red has massive wikis for speedrunning that, based on the reasoning traces, are clearly in the training data. Like, the model trace said it was "training a nidoran to level 12 b.c. at level 12 nidoran learns double kick which will help against brock's rock type pokemon", so it's not going totally blind in the game. There were also a couple of outputs when it got stuck for several hours where it started printing things like "Based on the hint..." which seemed kind of sus. I wouldn't be surprised if there is some additional hand-holding going on in the back based on the game state (e.g., go to Oak's, get a starter, go north to Viridian, etc.) that helps guide the model. In fact, I'd be surprised if this wasn't the case.

6d ago

Stubsack: weekly thread for sneers not worth an entire post, week ending 2 March 2025

So they had the new Claude hooked up to some tools so that it could play Pokemon Red. Somewhat impressive (at least to me!): it was able to beat Lt. Surge after several days of play. They had a stream demo'ing it on Twitch, and despite the on-paper result of getting 3 gym badges, the poor fella got stuck in Viridian Forest trying to find the exit to the maze.

As far as finding the exit goes... I guess you could say he was stumped? (MODS PLEASE DONT BAN)

Strim if anyone is curious. Yes, I know this is clever advertising for Anthropic, but I do find it cute and maybe someone else will?

https://www.twitch.tv/claudeplayspokemon

6d ago

Stubsack: weekly thread for sneers not worth an entire post, week ending 2 March 2025

Bruh, Big Yud was yapping that this means the orthogonality thesis is false and mankind is saved b.c. of this. But then he immediately retreated to, "we are all still doomed b.c. recursive self-improvement." I wonder what it's like to never have to update your priors.

Also, I saw other papers that showed almost all prompt rejection responses shared common activation weights and tweaking them can basically jailbreak any model, so what is probably happening here is that by finetuning to intentionally make malicious code, you are undoing those rejection weights. And until this is reproduced by nonsafety cranks, I'm pressing X to doubt.

1w ago

Stubsack: weekly thread for sneers not worth an entire post, week ending 2 March 2025

Bruh, Anthropic is so cooked. < 1 billion in rev, and 5 billion cash burn. No wonder Dario looks so panicked promising super intelligence + the end of disease in t minus 2 years, he needs to find the world's biggest suckers to shovel the money into the furnace.

As a side note, rumored Claude 3.7(12378752395) benchmarks are making rounds and they are uh, not great. Still trailing o1/o3/grok except for in the "Agentic coding benchmark" (kek), so I guess they went all in on the AI swe angle. But if they aren't pushing the frontier, then there's no way for them to pull customers from Xcels or people who have never heard of Claude in the first place.

On second thought, this is a big brain move. If no one is making API calls to Clauderino, they aren't wasting money on the compute they can't afford. The only winning move is to not play.

2w ago

Stubsack: weekly thread for sneers not worth an entire post, week ending 23 February 2025

Yud be like: "kek you absolute rubes. ofc I simply meant AI would be like a super accountant. I didn't literally mean it would be able to analyze gov't waste from studying the flow of matter at the molecular level... heh, I was just kidding... unless 🥺 ? "

2w ago

Stubsack: weekly thread for sneers not worth an entire post, week ending 23 February 2025

Deep thinker asks why?

Thus spoketh the Yud: "The weird part is that DOGE is happening 0.5-2 years before the point where you actually could get an AGI cluster to go in and judge every molecule of government. Out of all the American generations, why is this happening now, that bare bit too early?"

Yud, you sweet naive smol uwu baby~~esian~~ boi, how gullible do you have to be to believe that a) tminus 6 months to AGI kek (do people track these dog shit predictions?) b) the purpose of DOGE is just accountability and definitely not the weaponized manifestation of techno oligarchy ripping apart our society for the copper wiring in the walls?

2w ago

Stubsack: weekly thread for sneers not worth an entire post, week ending 23 February 2025

Dawg, I didn't even survive the basic training in the game

2w ago

Post your current Ziz news and discussion here

spat out my fucking drink on this one

2w ago

Stubsack: weekly thread for sneers not worth an entire post, week ending 23 February 2025

My life for super Earth 🫡

2w ago

Post your current Ziz news and discussion here

fuck man, this was bad enough that people outside the sneerverse were talking about this around me irl

3w ago

Stubsack: weekly thread for sneers not worth an entire post, week ending 16th February 2025

"listen up jack, we're losing this election"

3w ago

Stubsack: weekly thread for sneers not worth an entire post, week ending 16th February 2025

Made the fatal mistake of posting a sneer on my main, only to have my friend let me know they had been assigned the same dorm room as Dan. Same friend was later roommates with my wife's best friend (and former cohabitant). Small world!

3w ago

Stubsack: weekly thread for sneers not worth an entire post, week ending 16th February 2025

Bruh. This is the moment I go full on Frank Grimes.

SneerClub @awful.systems

BigMuffin69 @awful.systems

2mo ago

I regret to inform you that AI safety institutes are still on their bull shit

the-decoder.com/openais-o1-preview-model-manipulates-game-files-to-force-a-win-against-stockfish-in-chess/

SneerClub @awful.systems

BigMuffin69 @awful.systems

8mo ago

OAI employees channel the spirit of Marvin Minsky

nonint.com/2024/06/03/general-intelligence-2024/

SneerClub @awful.systems

BigMuffin69 @awful.systems

9mo ago

Yud lettuce know that we just don't get it :(

xcancel.com/Grady_Booch/status/1801700667020743142

SneerClub @awful.systems

BigMuffin69 @awful.systems

10mo ago

Maybe the real unaligned super intelligence were the corporations we made along the way 🥺

SneerClub @awful.systems

BigMuffin69 @awful.systems

11mo ago

Top clowns all agree their balloon animals are slightly sentient

twitter.com/AISafetyMemes/status/1779353769337376939