Tracking Difficulty with Feature Portfolios

·LessWrong··

Thanks to Megan Kinniment for helpful comments and discussion, and to Jean-Stanislas Denain for helpful comments and pointers to past work.TL;DR: We claim that useful task attributes for forecasting AI capabilities should be measurable, interpretable, stable in its trend over time, and sufficient to explain task difficulty. task.human_completion_time (human expert completion time, used in time horizons) probably isn't sufficient, and is also getting hard to measure as tasks lengthen. Towards suf...

Read full article →

Related Articles

“Beyond the limit”: Satellites and mirrors in space pose threat to the night sky
Breadmaker · Hacker News · 1d ago
GPT-5.5 Codex reasoning-token clustering may be leading to degraded performance
maille · Hacker News · 19h ago
Potential session/cache leakage between workspace instances or consumer accounts
chatmasta · Hacker News · 1d ago
EV Batteries Are Defying Expectations After Miles
apparent · Hacker News · 10h ago
Show HN: KiCad in the Browser
ViktorEE · Hacker News · 5h ago