Can public chat data predict real-world AI misalignments?
This is an unofficial automated linkpost. Frontier AI models are increasingly used in settings with real economic, legal, and societal consequences. As a result, governments, AI safety organizations and independent researchers need ways to evaluate how these systems behave under realistic conditions. Traditional evaluations use hand-written, synthetic, or adversarial prompts to stress-test known risks and compare models under controlled conditions. But these prompts can be narrow, unrepresentati...
Read full article →