How useful is cross-domain generalization for training LLM monitors?

LessWrong

We study how well single-token classification training generalizes and find that:

Control monitor performance can be improved by training on adjacent classification tasks (which is useful if we don't have high-quality in-domain data).

We do not find strong benefits from classification-specialized models: training on instruction following after classification training not only keeps the uplift from classification training, but it also mitigates some generalization failures that arise from training ...
