Can Large Language Models Identify Novel Threats? Part 1: Mirror Life and the Classification Gap

LessWrong

[Cross-posted from On Failure States. This is Part 1 of an independent AI safety research series examining LLM safety behavior on unclassified emerging threats.]

Can an LLM refuse a harmful uplift request when the topic in question hasn’t yet been identified as dangerous? In 2022, a mirror RNA polymerase was created, a key step toward the creation of mirror life, and in 2024 the scientific community warned against any further research on it.[1][2] Having said that, mirror life is not curr...

