Power Laws in NNs: A Possible Mechanism for Inductive Bias towards Sparse Representations

LessWrong

This post was produced as part of the Iliad Fellowship under the mentorship of Dmitry Vaintrob. Tl;dr: Power-law ("heavy-tailed") distributions have universality theorems similar to those that make Gaussians common. We observe that many quantities in ML are power-law distributed, most robustly and interestingly the spectra of weight matrices. I explain how we can think of power laws as a natural generalization of the idea of 'sparsity', interpolating between true sparsity and Gaussianity accordin...

