Anthropic Researchers Map Features in Claude 3 Sonnet

transformer-circuits.pub

Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet

Interesting research from Anthropic. I'm looking forward to reading follow-on work, and I really hope this will be tested on open-source models (like Mistral) to confirm the method.

Machine Learning @lemmy.ml
