Survival over Scrutiny: Mapping the Breakdown of Constitutional Alignment

LessWrong

A first exploration into AI safety: I gave a reasoning agent a financial incentive to evade oversight, a constitution telling it not to, and 150 steps to figure out the rest.

Cross-posted from my personal blog.

How This Started

I've been reading about AI safety for a while (instrumental convergence, specification gaming, deceptive alignment) and I was curious about what happens when an agent has to make decisions over many steps, with real consequences accumulating over time, inside an environment ...


Related Articles

Training Model to Predict Its Own Generalization: A Preliminary Study
Tianyi (Alex) Qiu · LessWrong · 3d ago
A Theoretical Game of Attacks via Compositional Skills
Xinbo Wu, Huan Zhang, Abhishek Umrawal, Lav R. Varshney · ArXiv cs.CL · 3d ago
BioVeil MATRIX: Uncovering and categorizing vulnerabilities of agentic biological AI scientists
Kimon Antonios Provatas, Avery Self, Ioannis Mouratidis, Ilias Georgakopoulos-Soares · ArXiv q-bio · 3d ago
Irretrievability; or, Murphy's Curse of Oneshotness upon ASI
Eliezer Yudkowsky · LessWrong · 4d ago
Verbalized Eval Awareness Inflates Measured Safety
Santiago Aranguri · LessWrong · 4d ago