When Are Two Networks the Same? Tensor Similarity for Mechanistic Interpretability

LessWrong

We've found a method that tells you:
- How functionally similar two neural networks are across ALL inputs,
- Computed solely from the weights (i.e. no data),
- Using a principled generalization of cosine similarity.

There's only one catch: you have to use a tensor network.

We've already shown that tensor-transformer variants are performant (this isn't a novel claim; see these papers for MLPs and Attention), so here we're focusing on the interpretability advances.

Linear Algebra Applies to Tensors

A tensor n...
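The excerpt is cut off before the method is defined, so the following is only a minimal sketch of the general idea under our own assumptions, not the authors' definition. We assume a single bilinear layer y_k = sum_ij B[k][i][j] x_i x_j stored as a nested list. We also assume that "similarity from the weights" means a cosine similarity built on the Frobenius inner product of the two weight tensors, taken after symmetrizing over the input indices. The symmetrization step matters because two tensors that share a symmetric part compute the same function while having different raw weights. The helper names (`symmetrize`, `inner`, `tensor_cosine`, `apply_layer`) are hypothetical. The post's actual inner product may weight inputs differently.

```python
import math

def symmetrize(t):
    """Average t[k][i][j] with t[k][j][i] (assumption: bilinear layer).
    The antisymmetric part contributes nothing to sum_ij t[k][i][j] x_i x_j."""
    n = len(t[0])
    return [[[(t[k][i][j] + t[k][j][i]) / 2 for j in range(n)] for i in range(n)]
            for k in range(len(t))]

def inner(a, b):
    """Frobenius inner product: sum of elementwise products over all indices."""
    return sum(a[k][i][j] * b[k][i][j]
               for k in range(len(a))
               for i in range(len(a[0]))
               for j in range(len(a[0][0])))

def tensor_cosine(a, b):
    """Hypothetical weight-only cosine similarity of two bilinear layers."""
    a, b = symmetrize(a), symmetrize(b)
    return inner(a, b) / math.sqrt(inner(a, a) * inner(b, b))

def apply_layer(t, x):
    """Evaluate y_k = sum_ij t[k][i][j] * x_i * x_j on one input x."""
    n = len(x)
    return [sum(t[k][i][j] * x[i] * x[j] for i in range(n) for j in range(n))
            for k in range(len(t))]

# Two layers with different raw weights but the same symmetric part.
A = [[[1, 2], [0, 1]]]
B = [[[1, 0], [2, 1]]]
print(apply_layer(A, [3, 5]), apply_layer(B, [3, 5]))  # same outputs
print(tensor_cosine(A, B))  # 1.0: identical as functions
```

A naive cosine on the raw flattened weights of `A` and `B` gives 2/6 ≈ 0.33, even though the two layers compute the same function on every input. Under these assumptions, the symmetrized comparison is what reflects functional rather than parametric similarity.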
