Training Deliberative Monitors for Black-Box Scheming Detection

aksh-n·LessWrong·Community·June 4, 2026

Paper: https://arxiv.org/abs/2605.29601
Thread: https://x.com/aksh_n0/status/2062568855814193497

TL;DR: Training small open-weight monitors provides a cost-effective alternative to prompted frontier monitors. Applying our training recipe to Qwen3.5-27B results in a monitor better at scheming detection than all smaller prompted monitors (Gemini 3.1 Flash-Lite, GPT-5.4 Nano, Claude Haiku 4.5) and Gemini 2.5 Pro, while achieving lower marginal inference cost. Stronger prompted frontier monitors (Gemi...
