What I learned this week - Pretraining parallelisms, Can distillation be stopped, Mythos and the cybersecurity equilibrium, Pipeline RL, On why pretraining runs fail
At the end of my conversation with Michael Nielsen, we talked about how to actually retain what you learn. Michael’s advice was to make some kind of demanding artifact. Write something up. Try to explain it. So in that spirit, here are notes on some topics I’ve learned about over the last week or two. These notes are extremely rough, and have many mistakes.

Can distillation be stopped?

Can the frontier labs stop distillation? Because if they can’t, open source commoditizing models can catch up inc...