Empowerment, corrigibility, etc. are simple abstractions (of a messed-up ontology)

Steven Byrnes·LessWrong·Community·May 11, 2026

1.1 Tl;dr

Alignment is often conceptualized as AIs helping humans achieve their goals: AIs that increase people’s agency and empowerment; AIs that are helpful, corrigible, and/or obedient; AIs that avoid manipulating people. But that last one—manipulation—points to a challenge for all these desiderata: a human’s goals are themselves under-determined and manipulable, and it’s awfully hard to pin down a principled distinction between changing people’s goals in a good way (“providing counsel”, “prov...
