I didn't see any METR graph extrapolations, so here is one.
If you don't know what the METR time horizon benchmarks are, see https://metr.org/time-horizons/

The task completion time horizon is the task duration (measured by human expert completion time) at which an AI agent is predicted to succeed with a given level of reliability. For example, the 50%-time horizon is the duration at which an agent is predicted to succeed half the time.

Here are the METR task completion time horizons for public frontier language models plotted on a log scale. Any mod...