43 - David Lindner on Myopic Optimization with Non-myopic Approval
In this episode, I talk with David Lindner about Myopic Optimization with Non-myopic Approval, or MONA, which attempts to address (multi-step) reward hacking by myopically optimizing actions against a human’s sense of whether those actions are generally good. Does this work? Can we get smarter-than-human AI this way? How does this compare to approaches like conservativism? Find out below.

Topics we discuss:
What MONA is
How MONA deals with reward hacking
Failure cases for MONA
MONA...