A misalignment taxonomy

LessWrong

I am going to discuss five kinds of inner misalignment and two kinds of outer misalignment, which together form a simple taxonomy of alignment failure modes. When I talk about a kind of misalignment here, I mean a reason for misalignment (like inner or outer misalignment), not a kind of misaligned agent (like a schemer versus a fitness-seeker), although the two can be related. Several of these failure modes could occur together; I am attempting to describe independent, but p...
