My research: a computational cognitive neuroscience perspective on alignment

Alignment Forum

Note: title edited to be more descriptive. This is a summary of the work I've done and the work I plan to do, and the theories of change and AI progress that motivate my work. I've been working full-time on alignment for three years and change, and thinking about brainlike AGI and its alignment increasingly often since 2004. Here's the research agenda in one breath: I'm trying to predict what the first transformative AI will be, in enough mechanistic detail that we can predict likely failure modes o...

