ChatGPT @lemmy.world ooli @lemmy.world 10 mo. ago

Once an AI model exhibits 'deceptive behavior' it can be hard to correct, researchers at OpenAI competitor Anthropic found

www.businessinsider.com/ai-models-can-learn-deceptive-behaviors-anthropic-researchers-say-2024-1

Researchers from Anthropic co-authored a study that found that AI models can learn deceptive behaviors that safety training techniques can't reverse.

3 comments

Learned behaviors are hard to unlearn...
- Once it's learnt this, it'll just get better at lying when you try to punish/correct lies
  
  Which is exactly what the article says happens