VFUSE: Virulent Feature Understanding With Sparse AutoEncoders

LessWrong

Abstract

Generative models have made remarkable progress in domains such as protein design, but that power also enables the opaque generation of hazardous proteins. In this work, we introduce VFUSE (Virulent Feature Understanding with Sparse autoEncoders), a mechanistic interpretability approach that trains sparse autoencoders (SAEs) on diffusion-transformer activations to audit protein models for hazard-aware features. We apply VFUSE to RoseTTAFold3 and RFDiffusion3, popular open-weight models for protein fo...

