Constitutional AI Alignment

LessWrong

Epistemological status: existential AI safety suggestions as a set of bullet points

TL;DR: We need to be clear about what behavior we’re trying to train when we align AIs: aligned behavior is not the default for an LLM whose behavior is distilled from ours, nor for an RL-trained maximizer. Anthropic published an early constitution, and recently a large update in Claude’s Constitution (a.k.a. “Soul Doc”). It includes not just what the AI should do, but explanations of how and why many decisions ...

