The Big LLM Architecture Comparison

Sebastian Raschka, PhD · Sebastian Raschka · AI · July 19, 2025

Last updated: Apr 2, 2026 (added Gemma 4 in section 23)

It has been seven years since the original GPT architecture was developed. At first glance, looking back at GPT-2 (2019) and forward to DeepSeek V3 and Llama 4 (2024-2025), one might be surprised at how structurally similar these models still are.

Sure, positional embeddings have evolved from absolute to rotary (RoPE), Multi-Head Attention has largely given way to Grouped-Query Attention, and the more efficient SwiGLU has replaced activat...
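To make the switch from absolute to rotary positional embeddings (RoPE) concrete, here is a minimal pure-Python sketch. It assumes the interleaved-pair formulation from the RoFormer paper with the commonly used base of 10000; the helper names `rope` and `dot` are illustrative and not taken from the article. Instead of adding a position vector to the token embedding, RoPE rotates each pair of dimensions by an angle proportional to the token position. As a result, the query-key dot product depends only on the relative offset between the two tokens.

```python
import math

def rope(x, pos, base=10000.0):
    """Apply a rotary positional embedding to vector x at integer position pos.

    Assumption: adjacent pairs (x[2i], x[2i+1]) are rotated by the angle
    pos * theta_i, where theta_i = base ** (-2i / d) (RoFormer-style).
    """
    d = len(x)
    if d % 2 != 0:
        raise ValueError("embedding dimension must be even")
    out = []
    for i in range(d // 2):
        theta = base ** (-2 * i / d)          # per-pair rotation frequency
        angle = pos * theta
        c, s = math.cos(angle), math.sin(angle)
        x0, x1 = x[2 * i], x[2 * i + 1]
        # 2D rotation of the pair (x0, x1)
        out.extend([x0 * c - x1 * s, x0 * s + x1 * c])
    return out

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

# Toy query/key vectors (hypothetical values, dimension 4)
q = [0.3, -1.2, 0.8, 0.5]
k = [1.0, 0.4, -0.7, 0.2]

# Same relative offset (3) at two very different absolute positions
s1 = dot(rope(q, 5), rope(k, 2))
s2 = dot(rope(q, 105), rope(k, 102))
print(abs(s1 - s2) < 1e-9)  # True: the score depends only on the offset
```

Because each step is a pure rotation, RoPE also preserves vector norms. Some implementations pair dimension `i` with dimension `i + d/2` rather than adjacent dimensions; the relative-position property holds in both layouts.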


