40 - Jason Gross on Compact Proofs and Interpretability

AXRP (Daniel Filan)

How do we figure out whether interpretability is doing its job? One way is to see if it helps us prove things about models that we care about knowing. In this episode, I speak with Jason Gross about his agenda to benchmark interpretability in this way, and his exploration of the intersection of proofs and modern machine learning.

Topics we discuss:
Why compact proofs
Compact Proofs of Model Performance via Mechanistic Interpretability
What compact proofs look like
Structureless nois...

