Emergent introspection does not replicate on Llama-3.1-405B
TL;DR: I couldn't replicate Lindsey (2026) (LW post) on Llama-3.1-405B-Instruct (0/400 trials, 0/80 false positives in the canonical sweep; null robust to layer ∈ {70, 84, 90, 100} and to a dense magnitude grid at the suspected crossover). So: how does introspection emerge in models? The optimistic reading of Lindsey's result ("post-trained LLMs at sufficient scale can introspect") doesn't hold up unless Opus is much larger than 405B (Anthropic doesn't disclose its size) or Anthropic's post-trai...