Predicting LLM Safety Before Release by Simulating Deployment

Tomek Korbak·LessWrong·Community·June 16, 2026

Before releasing a new model, labs need to understand not just what it can do, but how it is likely to behave in real-world use, including where it might introduce new risks. This becomes even more important as capabilities increase. As part of our pre-deployment safety review, we leverage targeted evaluations, red-teaming, and other checks to understand model behavior. We’ve now started using a method for simulating model deployments before they happen, which adds a complementary sign...


