Inference cost at scale with napkin math

gmays·Hacker News·Community·June 16, 2026

Jun 14

If you serve AI models as part of your product stack, you've likely wondered what scale your GPU cluster tops out at. With some rudimentary knowledge of your hardware and model architecture, we can work out the dollar cost per user on the back of a napkin. If you're comfortable reasoning about GPUs and/or LLMs, use this legend to skip to the relevant sections: Resources on a single GPU Cos
