GLM 5.2 playing text adventures

LessWrong

I’ve heard some buzz around the new GLM 5.2 open-weights model. They say it’s very capable! I won’t run a full comparison benchmark, but I have some credits sloshing around on OpenRouter, so I figured I might compare GLM 5.2 to the similarly priced Gemini 3 Flash[1] and see where things land. This uses the same setup as the previous benchmark: each LLM gets a few attempts at playing the game, and each attempt is limited to a fixed budget of around $0.15. The LLM doesn’t know it, but the harne...
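
The excerpt cuts off before describing the harness, so the following is only a hypothetical Python sketch of the per-attempt cap it mentions. Each attempt keeps playing turns until the next turn would push spending past roughly $0.15, and every attempt starts with a fresh budget. The names run_attempt and run_attempts, the per-turn cost lists, and the choice to stop before overspending (rather than allowing one last overshooting turn) are all assumptions, not the author's code.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Iterable, List

# Approximate per-attempt budget from the post. Decimal avoids float rounding on dollar amounts.
BUDGET_PER_ATTEMPT = Decimal("0.15")


@dataclass
class AttemptResult:
    spent: Decimal = Decimal("0")
    turns: int = 0


def run_attempt(turn_costs: Iterable[Decimal], budget: Decimal = BUDGET_PER_ATTEMPT) -> AttemptResult:
    """Hypothetical: play turns until the next one would exceed the budget."""
    result = AttemptResult()
    for cost in turn_costs:
        # Assumption: the harness refuses a turn it cannot afford instead of overshooting.
        if result.spent + cost > budget:
            break
        result.spent += cost
        result.turns += 1
    return result


def run_attempts(attempts: Iterable[Iterable[Decimal]], budget: Decimal = BUDGET_PER_ATTEMPT) -> List[AttemptResult]:
    """Hypothetical: each attempt gets its own fresh budget."""
    return [run_attempt(costs, budget) for costs in attempts]


if __name__ == "__main__":
    # Twenty turns at a made-up $0.01 each; only the first fifteen fit in the budget.
    result = run_attempt([Decimal("0.01")] * 20)
    print(result.turns, result.spent)
```

Tracking money as Decimal rather than float is a deliberate choice here: with floats, accumulated rounding could make the budget check trip one turn early or late.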
