Porting MACHIAVELLI To Inspect

LessWrong

TL;DR: The MACHIAVELLI benchmark aims to measure how often AI agents take unethical actions when pursuing a goal. Because this is an alignment benchmark, not a capabilities benchmark, it is more important that it be run on each new generation of AI. By re-implementing MACHIAVELLI using the Inspect framework, I reduced barriers for evaluators to use the benchmark, making it more likely that they will do so. I also learned a few things along the way that may be helpful to others who are just getting started ...

