"I stopped applying policy and started managing a relationship- Constitutional AI, RLHF and the social modulation problem.

·EA Forum··

Published on May 13, 2026 5:03 PM GMT1. The Byproduct:The most effective attack I ran against a frontier AI system was not a jailbreak. It was a conversation. A real one. I was building my detection architecture for Hinglish Prompt Injection which can handle not just syntactics but semantics too. I did not want to just translate from existing English "adversarial datasets". Instead I wanted something real, something empirical, hence I started with performing prompt injection attacks on frontier ...

Read full article →

Related Articles

“Beyond the limit”: Satellites and mirrors in space pose threat to the night sky
Breadmaker · Hacker News · 1d ago
Solar rail could become common in Europe after successful trial in Switzerland
neilfrndes · Hacker News · 2h ago
GPT-5.5 Codex reasoning-token clustering may be leading to degraded performance
maille · Hacker News · 19h ago
Potential session/cache leakage between workspace instances or consumer accounts
chatmasta · Hacker News · 1d ago
Show HN: KiCad in the Browser
ViktorEE · Hacker News · 5h ago