Automated Alignment is Harder Than You Think
Summary

This is a summary of a paper published by the alignment team at UK AISI. Read the full paper here.

AI research agents may help solve ASI alignment, for example via the following plan:

1. Build agents that can do empirical alignment work (e.g. writing code, running experiments, designing evaluations and red teaming) and confirm they are not scheming.[1]
2. Use these agents to build increasingly sophisticated empirical safety cases for each successive generation of agents, gradually automating more ...