Can we efficiently distinguish different mechanisms?

Paul Christiano · ARC · AI Safety · December 26, 2022

(This post is an elaboration on “tractability of discrimination” as introduced in section III of Can we efficiently explain model behaviors? For an overview of the general plan this fits into, see Mechanistic anomaly detection and Finding gliders in the game of life.)

Background

We’d like to build AI systems that take complex actions to protect humans and maximize option value. Powerful predictive models may play an important role in such AI, either as part of a model-based planning algorithm or a...

