Aligned Agents Still Build Misaligned Organisations
By now, we have plenty of examples of AI agent misalignment. They lie, they sometimes cheat, they break rules, they demonstrate odd preferences for self-preservation or against self-preservation. They reward hack! Quite a bit has been studied about them and much of these faults have been ameliorated enough that we use them all the time.But we’re starting to go beyond a single agent. We’re setting up multi-agent workflows. Agents are working with other agents, autonomously or semi-autonomously, t...
Read full article →