"I stopped applying policy and started managing a relationship- Constitutional AI, RLHF and the social modulation problem.

·EA Forum··

Published on May 13, 2026 5:03 PM GMT1. The Byproduct:The most effective attack I ran against a frontier AI system was not a jailbreak. It was a conversation. A real one. I was building my detection architecture for Hinglish Prompt Injection which can handle not just syntactics but semantics too. I did not want to just translate from existing English "adversarial datasets". Instead I wanted something real, something empirical, hence I started with performing prompt injection attacks on frontier ...

Read full article →

Related Articles

An OpenAI model has disproved a central conjecture in discrete geometry
tedsanders · Hacker News · 20h ago
GitHub confirms breach of 3,800 repos via malicious VSCode extension
Timofeibu · Hacker News · 1d ago
Show HN: Rmux – A programmable terminal multiplexer with a Playwright-style SDK
shideneyu · Hacker News · 6h ago
Incident Report: May 19, 2026 – GCP Account Suspension
0xedb · Hacker News · 1d ago
Not alive, but not dead: disembodied human brains used for drug testing
Timofeibu · Hacker News · 19h ago