Learning Unmasking Policies for Diffusion Language Models

Apple ML Research·AI·July 2, 2026

Diffusion (Large) Language Models (dLLMs) now match the downstream performance of their autoregressive counterparts on many tasks, while holding the promise of being more efficient during inference. One critical design aspect of dLLMs is the sampling procedure that selects which tokens to unmask at each diffusion step. Indeed, recent work has found that heuristic strategies such as confidence thresholding improve both sample quality and token throughput compared to random unmasking. However, suc...
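To make the unmasking step concrete, here is a minimal sketch of the two heuristic baselines named above: confidence thresholding and random unmasking. It is not the learned policy the article describes. The names `MASK`, `confidence_unmask`, and `random_unmask`, the threshold value, the fallback rule, and the use of greedy (argmax) token choice are all assumptions made for illustration, and the per-position logits stand in for the output of a hypothetical model.

```python
import math
import random

MASK = None  # hypothetical placeholder marking a still-masked position


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def confidence_unmask(tokens, logits, threshold=0.9):
    """One unmasking step using confidence thresholding.

    Every masked position whose top predicted probability is at least
    `threshold` is filled with its argmax token. If no position qualifies,
    the single most confident one is filled so the step always makes
    progress (an assumed fallback, not taken from the article).
    """
    out = list(tokens)
    candidates = []  # (confidence, position, predicted token id)
    for i, tok in enumerate(tokens):
        if tok is not MASK:
            continue
        probs = softmax(logits[i])
        best = max(range(len(probs)), key=probs.__getitem__)
        candidates.append((probs[best], i, best))
    if not candidates:
        return out
    chosen = [c for c in candidates if c[0] >= threshold]
    if not chosen:
        chosen = [max(candidates)]
    for _, i, best in chosen:
        out[i] = best
    return out


def random_unmask(tokens, logits, k=1, rng=random):
    """Baseline step: fill k masked positions chosen uniformly at random."""
    out = list(tokens)
    masked = [i for i, tok in enumerate(tokens) if tok is MASK]
    for i in rng.sample(masked, min(k, len(masked))):
        probs = softmax(logits[i])
        out[i] = max(range(len(probs)), key=probs.__getitem__)
    return out


# Toy example: vocabulary of 3 tokens, positions 1 and 2 are masked.
tokens = [5, MASK, MASK]
logits = [
    [0.0, 0.0, 0.0],  # position 0 is already unmasked, so this row is ignored
    [4.0, 0.0, 0.0],  # confident prediction for token 0
    [0.1, 0.0, 0.2],  # near-uniform prediction, stays masked
]
print(confidence_unmask(tokens, logits, threshold=0.9))  # [5, 0, None]
```

In this toy case the thresholded step fills only the position the model is confident about and defers the uncertain one to a later step, whereas `random_unmask` picks positions without looking at confidence at all.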


