A draft honesty policy for credible communication with AI systems

·EA Forum··

Published on May 6, 2026 6:49 PM GMTThis is a rough research note – we’re sharing it for feedback and to spark discussion. We’re less confident in its methods and conclusions.ContextWe think that it would be very good if human institutions could credibly communicate with advanced AI systems. This could enable positive-sum trade between humans and AIs instead of conflict that leaves everyone worse-off.[1] We want models to be able to trust companies when they make an honest offer or share informa...

Read full article →

Related Articles

Sequent: scale and automation for higher confidence in alignment
Geoffrey Irving · Alignment Forum · 25d ago
My research: a computational cognitive neuroscience perspective on alignment
Seth Herd · Alignment Forum · 1mo ago
A Pipeline for Generating Synthetic Sabotage Trajectories to Red-Team Monitors
Myles H · LessWrong · 1mo ago
Announcing the ARC White-Box Estimation Challenge
Jacob_Hilton · Alignment Forum · 1mo ago
The Case for Evaluating Model Behaviors
jsteinhardt · Alignment Forum · 1mo ago