Inference cost at scale with napkin math


Jun 14 · gpu, ai, math

If you serve AI models as a part of your product stack, you've likely wondered what kind of scale your GPU cluster tops out at. With some rudimentary knowledge about your hardware and model architecture, we can work out the dollar cost-per-user on the back of a napkin [1].

If you're comfortable reasoning about GPUs and/or LLMs, use this legend to skip to sections of relevance:

Resources on a single GPU
Cos

