Interactive tool
LLM Cost & Latency Estimator
Estimate monthly spend, p50/p95 latency, cost per user, and the blast radius of one misbehaving feature for GPT, Claude, Gemini, Llama, Mistral, and DeepSeek-class models. Scenarios are shareable, and all math runs client-side in your browser: nothing is sent anywhere.
Vendors change prices often and offer batch and committed-use discounts that are not modeled here. Confirm current prices on the vendor's official pricing page (linked under the selected model).
Scenario inputs
Estimated outputs
Latency is a rough model: the latency class's base p50/p95 plus output_tokens ÷ throughput. Real latency also depends on provider load, region, and streaming.
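Here is a minimal sketch of that latency formula. The `LatencyClass` fields and the example numbers are hypothetical placeholders, not the tool's real per-class parameters. Following the methodology, the same throughput term is added to both the p50 and the p95 base.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class LatencyClass:
    # Hypothetical per-class parameters; real values depend on model and provider.
    base_p50_s: float    # base latency at p50, in seconds
    base_p95_s: float    # base latency at p95, in seconds
    tokens_per_s: float  # output-token throughput for this class


def estimate_latency(output_tokens: int, cls: LatencyClass) -> Tuple[float, float]:
    """Return (p50, p95) in seconds: class base latency + output_tokens / throughput."""
    generation_s = output_tokens / cls.tokens_per_s
    return cls.base_p50_s + generation_s, cls.base_p95_s + generation_s


# Example with made-up class numbers: 500 output tokens at 100 tokens/s.
fast = LatencyClass(base_p50_s=0.4, base_p95_s=1.2, tokens_per_s=100.0)
p50, p95 = estimate_latency(500, fast)
print(f"p50={p50:.1f}s  p95={p95:.1f}s")  # p50=5.4s  p95=6.2s
```

Input tokens do not appear in this model. Shortening responses therefore moves the estimate, but trimming the prompt does not.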
Cost if one feature misbehaves for 7 days — a 10× request spike on this model with retries on. This is the number to alert on.
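Read literally, the anomaly figure is the scenario's per-request cost times a 10× daily volume, inflated by retries, summed over 7 days. The sketch below follows that reading. One assumption needs flagging: it applies the scenario's configured retry rate during the spike, although the tool may use a different rate. The function name and inputs are hypothetical.

```python
def anomaly_cost(cost_per_request: float, requests_per_day: float,
                 retry_rate: float, spike: float = 10.0, days: int = 7) -> float:
    """Total spend if request volume jumps by `spike`x for `days`, with retries.

    retry_rate is a fraction (0.05 = 5%). Assumption: the scenario's own
    retry rate still applies during the spike.
    """
    spiked_per_day = requests_per_day * spike * (1 + retry_rate)
    return cost_per_request * spiked_per_day * days


# Hypothetical scenario: $0.002 per request, 50,000 requests/day, 5% retries.
print(f"${anomaly_cost(0.002, 50_000, 0.05):,.2f}")  # $7,350.00
```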
Pinned scenarios (side-by-side, in this URL)
| Model | In/Out tokens | Requests/day | Cache hit rate | Monthly $ | $/user | p50 / p95 latency | 7-day anomaly $ |
|---|---|---|---|---|---|---|---|
| No pinned scenarios yet. Configure the inputs and click "Pin this scenario" to compare models side by side. | | | | | | | |
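The page keeps pinned scenarios in its URL, but it does not document the encoding. The following sketches one common approach, offered as an assumption rather than the tool's actual format. It serializes the scenario list to compact JSON, base64url-encodes it, and stores the result under a query parameter. The `s` parameter name and the scenario field names are hypothetical.

```python
import base64
import json
from typing import Any, Dict, List
from urllib.parse import parse_qs, urlencode, urlparse


def encode_scenarios(base_url: str, scenarios: List[Dict[str, Any]]) -> str:
    """Pack pinned scenarios into a hypothetical `s` query parameter."""
    payload = json.dumps(scenarios, separators=(",", ":")).encode("utf-8")
    # URL-safe base64 uses only A-Z, a-z, 0-9, '-' and '_'; drop '=' padding.
    token = base64.urlsafe_b64encode(payload).decode("ascii").rstrip("=")
    return f"{base_url}?{urlencode({'s': token})}"


def decode_scenarios(url: str) -> List[Dict[str, Any]]:
    """Inverse of encode_scenarios: restore padding, decode base64, parse JSON."""
    token = parse_qs(urlparse(url).query)["s"][0]
    token += "=" * (-len(token) % 4)
    return json.loads(base64.urlsafe_b64decode(token))


pinned = [{"model": "example-model", "in_tok": 1500, "out_tok": 500, "req_day": 10000}]
url = encode_scenarios("https://example.com/estimator", pinned)
assert decode_scenarios(url) == pinned  # round-trips without a server
print(url)
```

Keeping state in the URL makes sharing a comparison as simple as sharing the link, and no server-side storage is required. The trade-off is that very long URLs can hit browser or server length limits once many scenarios are pinned.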
Methodology & assumptions
Billed input tokens per request = avg input + RAG context. With a cache hit rate of h, a fraction h of those tokens is billed at the model's cached-input rate (where the vendor publishes one), and the remaining 1 − h is billed at the standard input rate. Retries multiply total request volume by (1 + retry%). Monthly = daily × 30.4 (≈ 365 ÷ 12 days).
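The billing rules above translate almost line for line into code. This sketch adds one thing the paragraph leaves implicit: output tokens, billed at the model's output rate. Several details are assumptions rather than the tool's documented behavior: prices expressed per million tokens, the fallback to the standard rate when no cached rate exists, and every example number. None of the prices shown are any vendor's actual pricing.

```python
from typing import Optional

DAYS_PER_MONTH = 30.4  # ≈ 365 / 12, the multiplier used in the methodology


def daily_cost(avg_input_tokens: float, rag_context_tokens: float,
               output_tokens: float, requests_per_day: float,
               cache_hit_rate: float, retry_rate: float,
               input_usd_per_m: float, output_usd_per_m: float,
               cached_input_usd_per_m: Optional[float] = None) -> float:
    """Daily spend in USD. Prices are per million tokens; rates are fractions."""
    billed_input = avg_input_tokens + rag_context_tokens
    # Assumption: with no published cached-input rate, cached tokens bill at the standard rate.
    cached_rate = input_usd_per_m if cached_input_usd_per_m is None else cached_input_usd_per_m
    # A fraction h of input tokens bills at the cached rate, 1 - h at the standard rate.
    blended_input_rate = cache_hit_rate * cached_rate + (1 - cache_hit_rate) * input_usd_per_m
    per_request = (billed_input * blended_input_rate + output_tokens * output_usd_per_m) / 1e6
    # Retries multiply total request volume by (1 + retry rate).
    return per_request * requests_per_day * (1 + retry_rate)


# Made-up prices: $3/M input, $0.30/M cached input, $15/M output.
day = daily_cost(1_500, 2_500, 500, 10_000, cache_hit_rate=0.4, retry_rate=0.05,
                 input_usd_per_m=3.0, output_usd_per_m=15.0, cached_input_usd_per_m=0.30)
print(f"daily ${day:,.2f}, monthly ${day * DAYS_PER_MONTH:,.2f}")
```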
Cost-per-user is reported per 1,000 requests by default. Latency uses per-class base p50/p95 plus output-token throughput from the model's latency class. These are planning estimates, not SLAs.
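The methodology does not spell out the cost-per-user formula. One reading of "per 1,000 requests by default" is this: spread the monthly cost over user-facing requests, then scale the result to a user assumed to make 1,000 requests. The sketch below follows that reading. The function name, the `requests_per_user` parameter, and the choice to exclude retries from the denominator are all assumptions.

```python
def cost_per_user(monthly_cost_usd: float, monthly_user_requests: float,
                  requests_per_user: float = 1_000) -> float:
    """USD attributed to one user who makes `requests_per_user` requests.

    monthly_user_requests counts requests before retries, so retry overhead
    is folded into the per-request cost instead of being diluted away.
    """
    cost_per_request = monthly_cost_usd / monthly_user_requests
    return cost_per_request * requests_per_user


# Hypothetical: $4,845 per month across 304,000 user requests (10,000/day x 30.4).
print(f"${cost_per_user(4_845.0, 304_000):.2f} per 1,000 requests")
```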