Do Models Lie More to Other Models?

·LessWrong··

(Crossposted from Midwittgenstein)We’re heading toward a world where AIs increasingly deal with other AIs. Agents will negotiate with, report to, and oversee other agents in increasingly high-stakes real-world settings. Something I’ve been worried about is how well alignment training in models will generalise from human-agent interactions to agent-agent interactions. Most safety training and testing involves a putative human in the loop - either as an overseer or counterparty of some kind. It ma...

Read full article →

Related Articles

US bans differential privacy in Census data
nl · Hacker News · 2h ago
Arch Linux Now Believes Malware Incident Under Control: More Than 1,500 Packages
qwertox · Hacker News · 4h ago
Twenty One Zero-Days in FFmpeg
redbell · Hacker News · 18h ago
CRISPR tech selectively shreds cancer cells, including "undruggable" cancers
gmays · Hacker News · 1d ago
Kimi K2.7-Code: open-source coding model with better token efficiency
nekofneko · Hacker News · 1d ago