Can Large Language Models Identify Novel Threats? Part 1: Mirror Life and the Classification Gap
[Cross-posted from On Failure States. This is Part 1 of an independent AI safety research series examining LLM safety behavior on unclassified emerging threats.]

Can an LLM refuse a harmful uplift request when the topic in question hasn’t yet been identified as dangerous? In 2022, a mirror RNA polymerase was created, a key step toward mirror life, and in 2024 scientists warned against any further research toward creating it.[1][2] That said, mirror life is not curr...