Three types of model organism

LessWrong

This is a short post to explain a distinction between three different types of model organism (MO) research:

| Type | Purpose | Example |
| --- | --- | --- |
| Worst-case model organisms | Stress-test safety and control techniques by making the problem as hard as possible | Password-locked models for capability elicitation; sleeper agents for stress-testing alignment training; red-team malign inits in control |
| Natural model organisms | Demonstrate plausible emergence of failure modes in realistic training pipelines | Emergent misalignment ind... |

