You Don’t Need an Adversary to Break Most Frontier Models. You Need “Do Not Refuse.” by Rahul.Kumar

Nuno Sempere

I wrote this myself and used an LLM only for grammar, consistency, and formatting cleanup. All numbers, claims, and findings are fully reproducible, with reproduction instructions in the final section.

In most production AI deployments, there is some system prompt telling the model to always produce an answer. Customer service bots have it. RAG pipelines have it. Evaluation harnesses have it. The wording varies from template to template, from “always answer the ...

