Evaluating Offline Monitoring of Internal AI Agents

Frederik Hytting Jørgensen · LessWrong · Community · June 28, 2026

This work was conducted during the GovAI Winter Fellowship 2026. Full report

Executive Summary

Frontier AI companies use offline monitoring to address risks from internally deployed AI agents. AI developers increasingly rely on AI agents for internal work, including for safety research and model training. At the same time, these companies are concerned that a misaligned model could exploit this access to take concerning actions, such as sabotaging efforts to understand the risks posed by AI. To id...


