Alignment as Equilibrium Design
Much of the alignment literature starts by asking what “human values”, “ethical behavior”, or “morality” are, and how we can get models to act in accordance with them. This is an important question, but we argue that it can obscure a more fundamental technical problem of AI alignment. There is another perspective on alignment, rooted not in moral philosophy but in economics and mechanism design[1]. It originates in the study of human alignment to human values through incentives and cor...