Mechanistic estimation for wide random MLPs

Jacob_Hilton·LessWrong·Community·May 7, 2026

This post covers joint work with Wilson Wu, George Robinson, Mike Winer, Victor Lecomte and Paul Christiano. Thanks to Geoffrey Irving and Jess Riedel for comments on the post.

In ARC's latest paper, we study the following problem: given a randomly initialized multilayer perceptron (MLP), produce an estimate for the expected output of the model under Gaussian input. The usual approach to this problem is to sample many possible inputs, run them all through the model, and take the average. Instead...
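For concreteness, here is a minimal sketch of the usual sampling approach described above: draw Gaussian inputs, run each through the model, and average the outputs. It is not the paper's method or setup. The layer widths, ReLU activation, absence of biases, 1/sqrt(fan_in) weight scaling, scalar output, and the helper names `random_mlp`, `forward` and `sampling_estimate` are all assumptions made for illustration.

```python
import math
import random


def random_mlp(widths, rng):
    """Draw Gaussian weights per layer, scaled by 1/sqrt(fan_in) (an assumed init)."""
    layers = []
    for fan_in, fan_out in zip(widths[:-1], widths[1:]):
        scale = 1.0 / math.sqrt(fan_in)
        layers.append([[rng.gauss(0.0, scale) for _ in range(fan_in)]
                       for _ in range(fan_out)])
    return layers


def forward(layers, x):
    """Apply each weight matrix, with ReLU (assumed) between layers but not after the last."""
    for i, weights in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) for row in weights]
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x[0]  # scalar output: the final layer has width 1


def sampling_estimate(layers, input_dim, n_samples, rng):
    """Average the model output over n_samples standard Gaussian inputs."""
    total = 0.0
    for _ in range(n_samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(input_dim)]
        total += forward(layers, x)
    return total / n_samples


rng = random.Random(0)
mlp = random_mlp([8, 16, 16, 1], rng)  # hypothetical widths
estimate = sampling_estimate(mlp, input_dim=8, n_samples=2000, rng=rng)
print(estimate)
```

The error of an estimate like this shrinks roughly as one over the square root of the number of samples, so each extra digit of accuracy costs about a hundred times more forward passes.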

