Self-fulfilling misalignment?
From Anthropic: We started by investigating why Claude chose to blackmail. We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation. And here is Alex Turner on the topic of self-fulfilling misalignment. I raised this possibility some while ago in a Free Press column, and mainly was met with hostility. The social return to a positive world view, and avoiding negative emotional contagion, never has been higher. The post Self-fulf...
Read full article →