Risk reports need to address deployment-time spread of misalignment

Alex Mallen·LessWrong·Community·May 15, 2026

Risk reports commonly use pre-deployment alignment assessments to measure misalignment risk from an internally deployed AI. However, an AI that genuinely starts out with largely benign motivations can develop widespread dangerous motivations during deployment. I think this is the most plausible route to consistent adversarial misalignment in the near future. So, AI companies and evaluators should substantively incorporate it into risk analysis and planning. In this post, I’ll briefly argue why, a...
