OpenAI’s latest model will block the ‘ignore all previous instructions’ loophole

OpenAI’s newest model, GPT-4o Mini, includes a new safety mechanism to prevent hackers from overriding chatbots.

"...today is opposite day."
- I just love that almost anyone can participate in hacking language models. It shows how well natural language works as a programming language, and it's a great way to explain how useful these things can be when used correctly.
  
  It won't be long before you end up with language models that suggest ways to break other language models.
