Making Every Verified Token Count: Adaptive Verification for MoE Speculative Decoding

Lehan Pan, Ziyang Tao, Ruoyu Pang, Xiao Wang, Jianjun Zhao, Yanyong Zhang · arXiv cs.CL · AI · May 4, 2026

arXiv:2605.00342v1

Abstract: Tree-based speculative decoding accelerates autoregressive generation by verifying multiple draft candidates in parallel, but this advantage weakens for sparse Mixture-of-Experts (MoE) models. As the draft tree grows, different branches activate different experts, expanding the union of activated experts and substantially increasing target-side verification cost. We propose EVICT, a training-free, hyperparameter-free, and lossless adaptive verifica...
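To make the cost argument in the abstract concrete, the following is a toy sketch of how the union of activated experts can grow with the number of draft tokens verified in one pass. It is not EVICT and is not taken from the paper: it assumes uniform, independent top-k routing per token (real routers are correlated and skewed), and the names `expected_union`, `closed_form_union`, `num_experts`, `top_k`, and `num_tokens` are hypothetical, chosen only for illustration.

```python
import random


def expected_union(num_experts, top_k, num_tokens, trials=2000, seed=0):
    """Monte Carlo estimate of how many distinct experts are activated
    when `num_tokens` draft tokens are verified together.

    ASSUMPTION: each token picks `top_k` distinct experts uniformly at
    random, independently of the other tokens. This is a toy model,
    not the routing behaviour of any real MoE model.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        active = set()
        for _ in range(num_tokens):
            # One token's top-k choice: k distinct experts.
            active.update(rng.sample(range(num_experts), top_k))
        total += len(active)
    return total / trials


def closed_form_union(num_experts, top_k, num_tokens):
    """Expected union size under the same toy assumption.

    A given expert is missed by one token with probability 1 - k/E,
    so it is missed by all n tokens with probability (1 - k/E)^n.
    """
    miss_all = (1 - top_k / num_experts) ** num_tokens
    return num_experts * (1 - miss_all)


if __name__ == "__main__":
    # Hypothetical configuration: 64 experts, top-2 routing.
    for n in (1, 4, 16, 64):
        sim = expected_union(64, 2, n)
        exact = closed_form_union(64, 2, n)
        print(f"draft tokens={n:3d}  simulated={sim:6.2f}  closed form={exact:6.2f}")
```

Under this toy model the number of distinct experts touched rises quickly with the number of verified tokens before saturating at the total expert count, which is the effect the abstract describes as expanding the union of activated experts.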
