The Transformer Family Version 2.0

Lilian Weng · AI · January 27, 2023

Many new Transformer architecture improvements have been proposed since my last post on “The Transformer Family” about three years ago. Here I did a big refactoring and enrichment of that 2020 post: I restructured the hierarchy of sections and improved many sections with more recent papers. Version 2.0 is a superset of the old version, about twice the length.

Notations

Symbol: Meaning
$d$: The model size / hidden state dimension / positional encoding size.
$h$: The number of heads in multi-head attent...
