Some observations about NLA explanations

·LessWrong··

I used the Gemma 3 12B activation verbalizer (maps activations to English) and reconstructor (maps English to activations) described in the Natural Language Autoencoders (NLA) paper to generate a bunch of explanations for 20k random tokens from a pretraining dataset (Common Pile derivative) and another 20k random tokens from a chat dataset. I also reconstructed all of the activations from the verbalizations so that I could see what kinds of tokens and explanations have high reconstruction error....

Read full article →

Related Articles

An OpenAI model has disproved a central conjecture in discrete geometry
tedsanders · Hacker News · 20h ago
GitHub confirms breach of 3,800 repos via malicious VSCode extension
Timofeibu · Hacker News · 1d ago
Show HN: Rmux – A programmable terminal multiplexer with a Playwright-style SDK
shideneyu · Hacker News · 6h ago
Incident Report: May 19, 2026 – GCP Account Suspension
0xedb · Hacker News · 1d ago
Not alive, but not dead: disembodied human brains used for drug testing
Timofeibu · Hacker News · 19h ago