Probing the loss-band sparsity assumption in Scientist AI

LessWrong

Epistemic status: ~1 hour of compute on a T4. I tried only one model and one subspace. The volume finding seems solid; the curvature finding is suggestive, and when I checked whether it generalises, it did not clearly do so. I think the methodology is the main interesting idea, and the specific numbers are a starting point. Notebook here. Feedback very welcome.

What this is about

Bengio et al. (2026), "Safety from Honesty in a Disinterested AI Predictor", propose a predictor (Scientist AI, ...
