Towards Shutdownable Agents: Generalizing Stochastic Choice in RL Agents and LLMs

Elliott Thornley (EJT)·LessWrong·Community·June 3, 2026

Summary Misaligned artificial agents might resist shutdown. One proposed solution is the POST-Agents Proposal: roughly, training agents to lack preferences between different-length trajectories. The Discounted Reward for Same-Length Trajectories (DReST) reward does this by penalizing agents for repeatedly choosing same-length trajectories. It thus incentivizes agents to be: NEUTRAL about trajectory-lengths: choose stochastically between different trajectory-lengths. USEFUL: pursue goals effectiv...

Read full article →

Towards Shutdownable Agents: Generalizing Stochastic Choice in RL Agents and LLMs

Related Articles