A draft honesty policy for credible communication with AI systems

EA Forum

Published on May 6, 2026 6:49 PM GMT

This is a rough research note – we’re sharing it for feedback and to spark discussion. We’re less confident in its methods and conclusions.

Context

We think that it would be very good if human institutions could credibly communicate with advanced AI systems. This could enable positive-sum trade between humans and AIs instead of conflict that leaves everyone worse off.[1] We want models to be able to trust companies when they make an honest offer or share informa...

Read full article →

Related Articles

Training Model to Predict Its Own Generalization: A Preliminary Study
Tianyi (Alex) Qiu · LessWrong · 3d ago
A Theoretical Game of Attacks via Compositional Skills
Xinbo Wu, Huan Zhang, Abhishek Umrawal, Lav R. Varshney · ArXiv cs.CL · 3d ago
BioVeil MATRIX: Uncovering and categorizing vulnerabilities of agentic biological AI scientists
Kimon Antonios Provatas, Avery Self, Ioannis Mouratidis, Ilias Georgakopoulos-Soares · ArXiv q-bio · 3d ago
Irretrievability; or, Murphy's Curse of Oneshotness upon ASI
Eliezer Yudkowsky · LessWrong · 4d ago
Verbalized Eval Awareness Inflates Measured Safety
Santiago Aranguri · LessWrong · 4d ago