How did ‘large’ language models get that way? The role of Transformers and Pretraining in GPT

·LessWrong··

Large language models are really large. They’re among the largest machine learning projects ever, and set to be (perhaps already are by some measures) some of the largest computing and even largest infrastructure projects ever.But how did LMs actually get so large as to warrant the title ‘large language model (LLM)’? A large part of the answer is in the P ('pretrained') and the T ('transformer') of GPT.This is part 1 of a series about LLM architecture and some implications, past and future, for ...

Read full article →

Related Articles

OpenAI’s o1 correctly diagnosed 67% of ER patients vs. 50-55% by triage doctors
donsupreme · Hacker News · 2mo ago
Accelerating Gemma 4: faster inference with multi-token prediction drafters
amrrs · Hacker News · 2mo ago
A couple million lines of Haskell: Production engineering at Mercury
unignorant · Hacker News · 2mo ago
Show HN: I trained a language model that thinks the capital of Japan is Paris
farisallafi · Hacker News · 9h ago
Using “underdrawings” for accurate text and numbers
samcollins · Hacker News · 2mo ago