Scaling Laws, Carefully

·Hacker News··

Scaling laws are one of the most critical empirical findings in deep learning. The observation is simple in form: the training loss $L$ decreases predictably as we scale up model size $N$, dataset size $D$, and compute $C$, following a power-law curve, which appears as a straight line on a log-log plot. We can view scaling laws as a framework for describing the relationship between compute, loss, model size and data; at its core, it is about how to allocate precious compute optimally between $N$

Read full article →

Related Articles

Claude Code is steganographically marking requests
kirushik · Hacker News · 13h ago
Department of Commerce has lifted export controls on Claude Fable 5 and Mythos 5
Pragmata · Hacker News · 5h ago
County with 37 Data Centers Asks Schools to 'Conserve Electricity'
01-_- · Hacker News · 13h ago
From brain waves to words: a new path to communication without surgery
alok-g · Hacker News · 7h ago
I ported Kubernetes to the browser
peterdemin · Hacker News · 8h ago