When capabilities work is the *safe* bet

LessWrong

If you believe that LLMs lend themselves unusually well to alignment compared to other regimes, this can be a very good reason to start doing capability research on them rather than LLM safety research. Imagine you have these beliefs about how AI goes: ...

