The Case for Evaluating Model Behaviors

·Alignment Forum··

Most evaluations of AI systems focus on their capabilities: how good they are at coding tasks, how effectively they can answer complex scientific questions, and so on.From a safety perspective, capability evaluations have a place: by understanding how close we are to different capabilities, and the rate of progress on them, we can forecast when different risks are likely to occur, as well as the broad shape of AI development. These capability evaluations were very useful to me when writing GPT-2...

Read full article →

Related Articles

Probing the loss-band sparsity assumption in Scientist AI
Alejandro Tlaie · LessWrong · 53m ago
SFT Drives Gemini’s Safety Properties
Josh Engels · Alignment Forum · 22d ago
Sequent: scale and automation for higher confidence in alignment
Geoffrey Irving · Alignment Forum · 25d ago
A Mike's-Eye View of ARC's Research
Michael Winer · ARC · 25d ago
My research: a computational cognitive neuroscience perspective on alignment
Seth Herd · Alignment Forum · 1mo ago