Model Spec Midtraining: Improving How Alignment Training Generalizes

Chloe Li·LessWrong·AI Safety·May 5, 2026

tl;dr We introduce model spec midtraining (MSM): after pre-training but before alignment fine-tuning, we train models on synthetic documents discussing their Model Spec, teaching them how they should behave and why. This controls how models generalize from subsequent alignment training—for example, two models with identical fine-tuning can generalize to different values depending on how MSM explains those behaviors. We use MSM to substantially reduce agentic misalignment and study which Model Sp...

Read full article →

Model Spec Midtraining: Improving How Alignment Training Generalizes

Related Articles