Component-Aware Self-Speculative Decoding in Hybrid Language Models

Hector Borobia, Elies Seguí-Mas, Guillermina Tormo-Carbó · arXiv cs.CL · AI · May 5, 2026

arXiv:2605.01106v1. Abstract: Speculative decoding accelerates autoregressive inference by drafting candidate tokens with a fast model and verifying them in parallel with the target. Self-speculative methods avoid the need for an external drafter but have been studied exclusively in homogeneous Transformer architectures. We introduce component-aware self-speculative decoding, the first method to exploit the internal architectural heterogeneity of hybrid language models, isolati...
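The draft-and-verify loop described in the abstract's first sentence can be illustrated with a minimal sketch of generic greedy speculative decoding. This is not the paper's component-aware method: `target_next` and `draft_next` are hypothetical toy stand-ins for a slow target model and a fast drafter, and the verification step is written as a loop where a real system would use a single batched forward pass of the target.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]

def target_next(seq: List[int]) -> int:
    # Hypothetical "slow" target model: a fixed deterministic rule over 50 tokens.
    return (sum(seq) * 31 + len(seq)) % 50

def draft_next(seq: List[int]) -> int:
    # Hypothetical "fast" drafter: agrees with the target except at every 4th position.
    guess = target_next(seq)
    return (guess + 1) % 50 if len(seq) % 4 == 0 else guess

def greedy_decode(model: NextToken, prompt: List[int], n_new: int) -> List[int]:
    # Baseline: one model call per generated token.
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(model(seq))
    return seq

def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], n_new: int, k: int = 4) -> List[int]:
    seq = list(prompt)
    goal = len(prompt) + n_new
    while len(seq) < goal:
        # 1) Draft k candidate tokens autoregressively with the fast model.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(seq + proposal))
        # 2) Verify each candidate against the target's greedy choice.
        for i, tok in enumerate(proposal):
            expected = target(seq + proposal[:i])
            if tok != expected:
                # Keep the matching prefix, then the target's own correction.
                seq.extend(proposal[:i])
                seq.append(expected)
                break
        else:
            # Every candidate matched: keep all of them plus one extra target token.
            seq.extend(proposal)
            seq.append(target(seq))
    return seq[:goal]

if __name__ == "__main__":
    prompt = [1, 2, 3]
    assert speculative_decode(target_next, draft_next, prompt, 20) == \
        greedy_decode(target_next, prompt, 20)
```

Under greedy verification, every emitted token is one the target itself would have chosen, so the output matches plain greedy decoding with the target; any speedup comes only from how many drafted tokens are accepted per verification pass.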

