Some subtypes of taskishness / corrigibility

Tetraspace·LessWrong·Community·June 27, 2026

"Corrigibility" is a somewhat overloaded term in alignment - it points in the direction of a cluster of desirable properties, but different people have different ideas of what this entails.

I think of "corrigibility", as it is used, as covering a few different ideas. I will name some of these and sort them roughly by how much of the good outcome from deploying such a system is in the hands of the AI, rather than the human operator.

Sponge corrigibility - The AI is corrigible and follow...


