What I learned this week - Pretraining parallelisms, Can distillation be stopped, Mythos and the cybersecurity equilibrium, Pipeline RL, On why pretraining runs fail

Dwarkesh Patel

At the end of my conversation with Michael Nielsen, we talked about how to actually retain what you learn. Michael’s advice was to make some kind of demanding artifact. Write something up. Try to explain it. So in that spirit, here are notes on some topics I’ve learned about over the last week or two. These notes are extremely rough, and have many mistakes. Can distillation be stopped? Can the frontier labs stop distillation? Because if they can’t, open source commoditizing models can catch up inc...
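For context on the distillation question in the excerpt: below is a minimal sketch of the classic soft-label distillation loss (Hinton et al., 2015), in which a smaller student model is trained to match a teacher model's output distribution rather than just hard labels. The function name, shapes, and temperature value are illustrative assumptions, not taken from the post.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-label distillation: push the student's distribution toward the teacher's.

    Both logit tensors have shape (batch, vocab_size). The temperature softens
    both distributions so the student also learns the teacher's relative
    preferences among non-top tokens.
    """
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # KL(teacher || student), scaled by T^2 so gradient magnitudes stay
    # comparable as the temperature changes.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * temperature**2

# Toy usage: random logits standing in for a teacher's and student's outputs.
student_logits = torch.randn(4, 32000, requires_grad=True)
teacher_logits = torch.randn(4, 32000)
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()
```

The key property driving the "can it be stopped?" question is that the teacher only needs to expose outputs (logits, samples, or even just text) for this kind of training signal to transfer capability to a student.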

Read full article →

Related Articles

Accelerating Gemma 4: faster inference with multi-token prediction drafters
amrrs · Hacker News · 3d ago
ProgramBench: Can language models rebuild programs from scratch?
jonbaer · Hacker News · 1d ago
ZAYA1-8B matches DeepSeek-R1 on math with less than 1B active parameters
steveharing1 · Hacker News · 1d ago
OpenAI’s o1 correctly diagnosed 67% of ER patients vs. 50-55% by triage doctors
donsupreme · Hacker News · 6d ago
A couple million lines of Haskell: Production engineering at Mercury
unignorant · Hacker News · 6d ago