40 - Jason Gross on Compact Proofs and Interpretability

Daniel Filan·AXRP (Daniel Filan)·AI·March 28, 2025

How do we figure out whether interpretability is doing its job? One way is to see if it helps us prove things about models that we care about knowing. In this episode, I speak with Jason Gross about his agenda to benchmark interpretability in this way, and about his exploration of the intersection of proofs and modern machine learning. Topics we discuss: Why compact proofs; Compact Proofs of Model Performance via Mechanistic Interpretability; What compact proofs look like; Structureless nois...
