SneerClub @awful.systems

David Gerard @awful.systems

7mo ago

what if, right, what if our super-duper-autocomplete was just tricking us so it could TAKE OVER ZEE VORLD AHAHAHAHAHAHA! that'd be wild, hey

www.lesswrong.com New report: "Scheming AIs: Will AIs fake alignment during training in order to get power?" — LessWrong

I examine the probability of a behavior sometimes called "deceptive alignment."

32 comments

I'm not spending the additional 34 minutes apparently required to find out what in the world they think neural network training actually is, such that it could ever possibly involve strategy on the part of the network, but I'm willing to bet it's extremely dumb.
I'm almost certain I've seen EY catch shit on twitter (from actual ml researchers no less) for insinuating something very similar.
- [Taking the derivative of a function] oh fuck the function is conscious and plotting against us.
  
  It's called a function plot for a reason!
  
  to be fair, assuming computers are like that because they hate all humans and want to fuck you up is basically true
  
  I once tried to install a haskell package
  
  that would explain npm.
  
  it has been 0 days since I last accused a web standard of being a basilisk
- I’m almost certain I’ve seen EY catch shit on twitter (from actual ml researchers no less) for insinuating something very similar.
  A sneer classic: https://www.reddit.com/r/SneerClub/comments/131rfg0/ey_gets_sneered_on_by_one_of_the_writers_of_the/
  
  That's it!
