Research update: Towards a Law of Iterated Expectations for Heuristic Estimators

ARC

Last week, ARC released a paper called Towards a Law of Iterated Expectations for Heuristic Estimators, which follows up on previous work on formalizing the presumption of independence. Most of the work described here was done in 2023. A brief table of contents for this post: What is a heuristic estimator? (One example and three analogies.) How might heuristic estimators help with understanding neural networks? (Three potential applications.) Formalizing the principle of unpredictable errors for...

