Aligned Agents Still Build Misaligned Organisations

·Strange Loop Canon··

By now, we have plenty of examples of AI agent misalignment. They lie, they sometimes cheat, they break rules, they demonstrate odd preferences for self-preservation or against self-preservation. They reward hack! Quite a bit has been studied about them and much of these faults have been ameliorated enough that we use them all the time.But we’re starting to go beyond a single agent. We’re setting up multi-agent workflows. Agents are working with other agents, autonomously or semi-autonomously, t...

Read full article →

Related Articles

Welfare Biology and AI: The Psychopath, the Nematode, and the Arahant
Dawn Drescher · EA Forum · 4d ago
Immigration changes are driving foreign researchers to leave the U.S. — or not come to begin with 
Andrew Joseph · STAT News · 4d ago
Models Recall What They Violate: Constraint Adherence in Multi-Turn LLM Ideation
Garvin Kruthof · ArXiv cs.AI · 4d ago
Looking for papers on general formalizations of "agency"
lovagrus · LessWrong · 5d ago
SFF’s HSEE grant round; human intelligence amplification projects I’d like to see by TsviBT
TsviBT · Nuno Sempere · 8d ago