Interlude: A Mechanistic Interpretability Analysis of Grokking

Neel Nanda · AI Safety · August 18, 2022

I left my job at Anthropic a few months ago, and since then I’ve been taking some time off and poking around at some independent research. I’ve just published my first set of interesting results! I used mechanistic interpretability tools to investigate the ML phenomenon of grokking, where models trained on simple mathematical operations like addition mod 113, and given only 30% of the data, initially memorise the training data but, if trained for a long time, abruptly generalise ...


