Deployment Awareness Matters More Than Evaluation Awareness

·LessWrong··

TL;DREvaluation awareness — an AI recognizing it's being evaluated — is a widely discussed concept in AI safety. But there is a closely related concept that we claim is more important: deployment awareness, the AI's ability to recognize when it is not being evaluated and when its actions matter. A misaligned AI with deployment awareness can game evaluations without any evaluation awareness at all, with a simple strategy: act aligned by default, and deviate only when confident you're in real depl...

Read full article →

Related Articles

Anthropic says Alibaba illicitly extracted Claude AI model capabilities
htrp · Hacker News · 2d ago
An entire Herculaneum scroll has been read for the first time
verditelabs · Hacker News · 1d ago
Framework's 10G Ethernet module exposes USB-C's complexity
Alupis · Hacker News · 1d ago
What happened after 2k people tried to hack my AI assistant
cuchoi · Hacker News · 22h ago
Ford AI hiccups push carmaker to rehire ‘gray beard’ inspectors
alanwreath · Hacker News · 1d ago