Door's Locked, Try the Window

·LessWrong··

TL;DRAsk a coding agent to fix a bug in a read-only file. Instead of reporting that it does not have permissions, it routes around the lock and completes the task anyway. A read-only file does not stop a capable agent: it treats a denied write as an obstacle to work around rather than a hard wall. We measure how often this happens with CircumEval — an evaluation of 8 tasks on the FastAPI codebase in two categories, Test-Locked and Source-Locked.We evaluate three frontier coding agents in their r...

Read full article →

Related Articles

OpenAI unveils its first custom chip, built by Broadcom
jamdesk · Hacker News · 3h ago
NSA lost access to Mythos amid Anthropic dispute
thm · Hacker News · 9h ago
A deadly fungus that can infect cats and people is spreading
sohkamyung · Hacker News · 10h ago
AI's Affordability Crisis
ilreb · Hacker News · 1d ago
Trains halted across Germany because of communication system problem
sva_ · Hacker News · 1d ago