Automated Alignment is Harder Than You Think

Aleksandr Bowkis · LessWrong · Community · May 14, 2026

Summary

This is a summary of a paper published by the alignment team at UK AISI. Read the full paper here.

AI research agents may help solve ASI alignment, for example via the following plan:

1. Build agents that can do empirical alignment work (e.g. writing code, running experiments, designing evaluations and red teaming) and confirm they are not scheming.[1]
2. Use these agents to build increasingly sophisticated empirical safety cases for each successive generation of agents, gradually automating more ...


