40 - Jason Gross on Compact Proofs and Interpretability

AXRP (Daniel Filan)

How do we figure out whether interpretability is doing its job? One way is to see if it helps us prove things about models that we care about knowing. In this episode, I speak with Jason Gross about his agenda to benchmark interpretability in this way, and his exploration of the intersection of proofs and modern machine learning.

Topics we discuss:
Why compact proofs
Compact Proofs of Model Performance via Mechanistic Interpretability
What compact proofs look like
Structureless nois...

