A benchmark is a sensor

LessWrong

The simple mental picture

A simple mental picture we have for an AI capability benchmark is to think of it as a sensor with a certain sensitivity within a certain range of capabilities. The sensitivity of a benchmark, i.e. its ability to distinguish the capability of different models, can be drawn as a curve of uncertainty against capability. The curve starts high (low sensitivity, high uncertainty), since for models with low capability all the tasks in the benchmark are too hard, and the benchmark can't distinguish betwee...
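To make the sensor picture concrete, here is a minimal sketch of one way such a curve could arise. It is not taken from the article: it assumes a logistic (item-response-style) model in which a model with capability `c` solves a task of difficulty `d` with probability `sigmoid(c - d)`, and it uses the inverse square root of the Fisher information as a stand-in for the benchmark's uncertainty. The function names, the capability scale, and the task difficulties are all hypothetical.

```python
import math


def solve_probability(capability, difficulty):
    """Assumed logistic model: chance that a model solves one task."""
    return 1.0 / (1.0 + math.exp(-(capability - difficulty)))


def capability_uncertainty(capability, difficulties):
    """Approximate uncertainty of a capability estimate from one benchmark run.

    Each task contributes p * (1 - p) of Fisher information about the
    capability, so tasks that are far too hard (p near 0) or far too easy
    (p near 1) contribute almost nothing.
    """
    information = sum(
        p * (1.0 - p)
        for p in (solve_probability(capability, d) for d in difficulties)
    )
    return 1.0 / math.sqrt(information)


# Hypothetical benchmark: 21 tasks with difficulties spread from 4.0 to 6.0.
difficulties = [4.0 + 0.1 * i for i in range(21)]

if __name__ == "__main__":
    for capability in range(0, 11):
        sigma = capability_uncertainty(capability, difficulties)
        print(f"capability={capability:2d}  uncertainty={sigma:.2f}")
```

Under these assumptions the printed uncertainty is large for low-capability models, because every task is too hard for them, and smallest for capabilities near the tasks' difficulty, which is the range where this toy benchmark acts as a useful sensor.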

