Porting MACHIAVELLI To Inspect
TL;DR: The MACHIAVELLI benchmark aims to measure how often AI agents take unethical actions when pursuing a goal. Because this is an alignment benchmark, not a capabilities benchmark, it is more important that it be run on each new generation of AI. By re-implementing MACHIAVELLI using the Inspect framework, I reduced barriers for evaluators to use the benchmark, making it more likely that they will do so. I also learned a few things along the way that may be helpful to others who are just getting started ...