A misalignment taxonomy
I am going to discuss five kinds of inner misalignment and two kinds of outer misalignment, which create a simple taxonomy of alignment failure modes. When I talk about a kind of misalignment here, I am talking about a reason for misalignment (like inner/outer misalignment), not a kind of misaligned agent (like a schemer versus a fitness-seeker), although they can be related. It is possible that multiple of these failure modes could occur in unison; I am attempting to describe independent, but p...
Read full article →