When Are Two Networks the Same? Tensor Similarity for Mechanistic Interpretability

Logan Riggs · LessWrong · Community · May 29, 2026

We've found a method that tells you how functionally similar two neural networks are across ALL inputs, computed solely from the weights (i.e. no data), using a principled generalization of cosine similarity.

There's only one catch: you have to use a tensor network.

We've already shown that tensor-transformer variants are performant (this isn't a novel claim; see these papers for MLPs and Attention), so here we're focusing on the interpretability advances.

Linear Algebra Applies to Tensors

A tensor n...


