1y ago

Anthropic Researchers Map Features in Claude 3 Sonnet

Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet

Interesting research from Anthropic. I'm looking forward to reading follow-on work, and I really hope this will be tested on open-source models (like Mistral) to confirm the method.

Machine Learning @lemmy.ml

☆ Yσɠƚԋσʂ ☆ @lemmy.ml

5mo ago

transformer-circuits.pub/2024/scaling-monosemanticity/index.html