The Case for Evaluating Model Behaviors

·Alignment Forum··

Most evaluations of AI systems focus on their capabilities: how good they are at coding tasks, how effectively they can answer complex scientific questions, and so on.From a safety perspective, capability evaluations have a place: by understanding how close we are to different capabilities, and the rate of progress on them, we can forecast when different risks are likely to occur, as well as the broad shape of AI development. These capability evaluations were very useful to me when writing GPT-2...

Read full article →

Related Articles

Loss of Oversight: How AI Systems May Become Harder to Audit, Monitor, and Investigate
Jordan Taylor · LessWrong · 35m ago
Mechanistic estimation for expectations of random products
Jacob Hilton · ARC · 5d ago
Multipolar Civilisation Depends on Maintaining an Attacker’s Dilemma
Naci Cankaya · LessWrong · 14d ago
Using Base-LCM to Monitor LLMs
Éloïse Benito-Rodriguez · LessWrong · 14d ago
Will Claude cause the next Covid?
Kate Delbeke · LessWrong · 14d ago