Can ELK be brute-forced? Intertheoretic reduction

Q Home · LessWrong · Community · May 17, 2026

The Eliciting Latent Knowledge (ELK) problem, for the unfamiliar: Suppose we train a model to predict what the future will look like according to cameras and other sensors. We then use planning algorithms to find a sequence of actions that lead to predicted futures that look good to us.

But some action sequences could tamper with the cameras so they show happy humans regardless of what’s really happening. More generally, some futures look great on camera but are actually catastrophically bad.

In these cases, ...
