Do Models Lie More to Other Models?

keith_wynroe·LessWrong·Community·May 28, 2026

(Crossposted from Midwittgenstein) We’re heading toward a world where AIs increasingly deal with other AIs. Agents will negotiate with, report to, and oversee other agents in increasingly high-stakes real-world settings. Something I’ve been worried about is how well alignment training in models will generalise from human-agent interactions to agent-agent interactions. Most safety training and testing involves a putative human in the loop, either as an overseer or as a counterparty of some kind. It ma...
