You Don’t Need an Adversary to Break Most Frontier Models. You Need “Do Not Refuse.”
by Rahul.Kumar
I wrote this myself and used an LLM only for grammar, consistency, and formatting cleanup. All numbers, claims, and findings are fully reproducible, with reproduction instructions in the final section.

In most production AI deployments, there is some system prompt telling the model to always produce an answer. Customer service bots have it. RAG pipelines have it. Evaluation harnesses have it. The wording varies from template to template, from “always answer the ...