Three types of model organism
This is a short post to explain a distinction between three different types of model organism (MO) research:

| Type | Purpose | Example |
| --- | --- | --- |
| Worst-case model organisms | Stress-test safety and control techniques by making the problem as hard as possible | Password-locked models for capability elicitation; sleeper agents for stress-testing alignment training; red-team malign inits in control |
| Natural model organisms | Demonstrate plausible emergence of failure modes in realistic training pipelines | Emergent misalignment ind... |