Inference cost at scale with napkin math
Jun 14 | gpu, ai, math

If you serve AI models as part of your product stack, you've likely wondered what scale your GPU cluster tops out at. With some rudimentary knowledge of your hardware and model architecture, we can work out the dollar cost per user on the back of a napkin¹.

If you're comfortable reasoning about GPUs and/or LLMs, use this legend to skip to the relevant sections: Resources on a single GPU Cos