42 - Owain Evans on LLM Psychology
YouTube link

Earlier this year, the paper “Emergent Misalignment” made the rounds on AI x-risk social media for seemingly showing LLMs generalizing from ‘misaligned’ training data consisting of insecure code to acting comically evil in response to innocuous questions. In this episode, I chat with one of the authors of that paper, Owain Evans, about that research as well as other work he’s done to understand the psychology of large language models.

Topics we discuss: Why introspection? Experiments in “Looki...