Reinforcement learning scaling might incentivise hidden reasoning architectures for AI

Oliver Sourbut·LessWrong·Community·May 10, 2026

In short: the transformer architecture brought massive scale to AI, and also provided partial guarantees of ‘reasoning out loud’, an unprecedentedly interpretable situation for AI. Reinforcement learning (RL) may be less compatible with the transformer architecture, and RL is being scaled up at the frontier of AI. So we might see the end of the ‘reasoning out loud’ era for AI.

This is part 2 of a series about LLM architecture and some implications for reasoning and transparency. Part 1 explained ...
