We cannot safely automate value alignment evaluation and research without thinking about delegation and discretion
by Maria Federica Martino Lena
This essay was originally posted on LessWrong at https://www.lesswrong.com/posts/JrRWGQKjCiLRYXSjW/we-cannot-safely-automate-value-alignment-evaluation-and-1

Introduction

Automating alignment evaluation and research is thought to be an efficient way to safeguard against uncontrollable AGI, as Joe Carlsmith and Jan Leike themselves admitted. In particular, Leike proposed a Minimal Viable Product (MVP) for alignment, consisting of: “Building a suffi...