Probing the loss-band sparsity assumption in Scientist AI

Alejandro Tlaie · LessWrong · Community · July 5, 2026

Epistemic status: ~1 hour of compute on a T4. I tried only one model and one subspace. The volume finding seems solid; the curvature finding is suggestive, and when I checked whether it generalises, it did not clearly do so. I think the methodology is the main interesting idea, and the specific numbers are a starting point. Notebook here. Feedback very welcome.

What this is about

Bengio et al. (2026), "Safety from Honesty in a Disinterested AI Predictor", propose a predictor (Scientist AI, ...


