AgentFloor: How Far Up the Tool-Use Ladder Can Small Open-Weight Models Go?

Ranit Karmakar, Jayita Chatterjee·ArXiv cs.AI·AI·May 4, 2026

arXiv:2605.00334v1. Abstract: Production agentic systems make many model calls per user request, and most of those calls are short, structured, and routine. This raises a practical routing question that existing evaluations do not directly answer: which parts of an agent workflow truly require large frontier intelligence, and which can be handled by smaller models? We introduce AgentFloor, a deterministic 30-task benchmark organized as a six-tier capability ladder, spanning ins...
