Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet
transformer-circuits.pub