We cannot safely automate value alignment evaluation and research without thinking about delegation and discretion by Maria Federica Martino Lena

Nuno Sempere

This essay was originally posted on LessWrong at https://www.lesswrong.com/posts/JrRWGQKjCiLRYXSjW/we-cannot-safely-automate-value-alignment-evaluation-and-1

Introduction

Automating alignment evaluation and research is thought to be an efficient way to safeguard against uncontrollable AGI, as Joe Carlsmith and Jan Leike themselves admitted. In particular, Leike proposed a Minimal Viable Product (MVP) for alignment, consisting of: “Building a suffi...

