
LLM Cost & Latency Estimator

Estimate monthly spend, p50/p95 latency, cost per user, and the blast radius of one misbehaving feature for GPT, Claude, Gemini, Llama, Mistral, and DeepSeek-class models. All math runs client-side in your browser, so nothing is sent anywhere. Scenarios are shareable through the URL.

Verify pricing before you rely on it. List prices below were collected on —. Vendors change prices often and offer batch/committed-use discounts not modeled here. Confirm on the vendor's official pricing page (linked under the selected model).

Scenario inputs

Model

Avg input tokens / request

Avg output tokens / request

Requests / day

Growth % / month

RAG context tokens / request

Retry rate %

Prompt cache hit-rate % (applies to input + RAG tokens)

0% of input tokens billed at the cached rate

Estimated outputs

Monthly cost (current)

—

Cost / user / mo

—

Monthly cost breakdown

Input —

Output —

Cache saved —

Latency band (per request, output-bound estimate)

— p50

— p95

Rough model: class base latency + output_tokens ÷ throughput. Real latency depends on provider load, region, and streaming.

Anomaly budget

—

Cost if one feature misbehaves for 7 days: a 10× request spike on this model, with retries applied. This is the number to alert on.
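The anomaly budget can be sketched as a single multiplication. This is a minimal illustration, not the tool's actual code: the function name `anomaly_budget` and its parameters are hypothetical, and it assumes the figure is the total spend over the 7-day window at 10× volume (rather than only the excess over normal spend).

```python
def anomaly_budget(cost_per_request: float,
                   requests_per_day: float,
                   retry_rate: float,
                   spike_multiplier: float = 10.0,
                   days: int = 7) -> float:
    """Total spend if request volume spikes for `days` days.

    Assumption: retries scale request volume by (1 + retry_rate),
    matching the methodology, and the result is total spend during
    the spike, not the increase over baseline.
    """
    spiked_requests_per_day = requests_per_day * spike_multiplier * (1 + retry_rate)
    return cost_per_request * spiked_requests_per_day * days


if __name__ == "__main__":
    # Hypothetical inputs: $0.01 per request, 1,000 requests/day, 5% retries.
    print(round(anomaly_budget(0.01, 1_000, 0.05), 2))
```

A figure computed this way could serve as a spend-alert threshold for the feature's API key or project.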

12-month projection (at growth rate)

—
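One plausible reading of the 12-month projection is compound monthly growth applied to the current monthly cost. The sketch below assumes exactly that; the function name `project_monthly_costs` is hypothetical, and treating the first month as the current (ungrown) cost is an assumption, not something the tool documents.

```python
def project_monthly_costs(current_monthly_cost: float,
                          growth_rate: float,
                          months: int = 12) -> list[float]:
    """Monthly cost for each of the next `months` months.

    Assumption: cost compounds at `growth_rate` per month (0.10 = 10%),
    and index 0 is the current month with no growth applied yet.
    """
    return [current_monthly_cost * (1 + growth_rate) ** m for m in range(months)]


if __name__ == "__main__":
    # Hypothetical inputs: $100/month today, growing 10% per month.
    costs = project_monthly_costs(100.0, 0.10)
    print([round(c, 2) for c in costs[:3]])
    print(round(sum(costs), 2))  # cumulative spend over the window
```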

Pinned scenarios (side-by-side, in this URL)

Model	In/Out tok	Req/day	Cache	Monthly $	$/user	p50 / p95	Anomaly $
No pinned scenarios yet. Configure inputs and click Pin this scenario to compare models side-by-side.

Methodology & assumptions

Billed input tokens = (avg input + RAG context) per request. With a cache hit-rate of h, a fraction h of those tokens is billed at the model's cached-input rate (where the vendor publishes one) and the rest at the standard input rate. Retries multiply total request volume by (1 + retry%). Monthly cost = daily cost × 30.4 (the average number of days in a month).
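The cost rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool's source: the `Pricing` class, the `monthly_cost` function, and the example prices are all hypothetical, and prices are assumed to be quoted in USD per 1M tokens.

```python
from dataclasses import dataclass

DAYS_PER_MONTH = 30.4  # from the methodology: monthly = daily × 30.4


@dataclass(frozen=True)
class Pricing:
    """Hypothetical per-1M-token prices in USD (not real vendor prices)."""
    input_per_m: float
    cached_input_per_m: float
    output_per_m: float


def monthly_cost(pricing: Pricing, input_tokens: float, rag_tokens: float,
                 output_tokens: float, requests_per_day: float,
                 retry_rate: float, cache_hit_rate: float) -> dict[str, float]:
    # Billed input tokens = avg input + RAG context, per request.
    billed_input = input_tokens + rag_tokens
    # A fraction h of billed input is charged at the cached rate.
    cached = billed_input * cache_hit_rate
    uncached = billed_input - cached
    # Retries multiply total request volume by (1 + retry rate).
    monthly_requests = requests_per_day * (1 + retry_rate) * DAYS_PER_MONTH

    input_cost = (cached * pricing.cached_input_per_m
                  + uncached * pricing.input_per_m) / 1e6 * monthly_requests
    output_cost = output_tokens * pricing.output_per_m / 1e6 * monthly_requests
    # Savings relative to billing every input token at the standard rate.
    cache_saved = (cached * (pricing.input_per_m - pricing.cached_input_per_m)
                   / 1e6 * monthly_requests)
    return {"input": input_cost, "output": output_cost,
            "cache_saved": cache_saved, "total": input_cost + output_cost}


if __name__ == "__main__":
    example = Pricing(input_per_m=2.0, cached_input_per_m=0.5, output_per_m=8.0)
    print(monthly_cost(example, 1000, 1000, 500, 1000, 0.0, 0.5))
```

For a model with no published cached-input rate, setting `cached_input_per_m` equal to `input_per_m` makes the cache hit-rate a no-op, which mirrors the "where the vendor publishes one" caveat.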

Cost-per-user is reported per 1,000 requests by default. Latency uses per-class base p50/p95 plus output-token throughput from the model's latency class. These are planning estimates, not SLAs.
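The latency estimate (class base latency plus output tokens divided by throughput) can be sketched as follows. The `LatencyClass` type, its field names, and the example numbers are hypothetical; the sketch also assumes the same throughput figure is used for both p50 and p95.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencyClass:
    """Hypothetical latency class; values are illustrative, not measured."""
    base_p50_s: float    # base latency at p50, in seconds
    base_p95_s: float    # base latency at p95, in seconds
    tokens_per_s: float  # output-token throughput


def latency_band(cls: LatencyClass, output_tokens: float) -> tuple[float, float]:
    """Return (p50, p95) in seconds: base latency + output_tokens / throughput."""
    generation_s = output_tokens / cls.tokens_per_s
    return cls.base_p50_s + generation_s, cls.base_p95_s + generation_s


if __name__ == "__main__":
    mid_tier = LatencyClass(base_p50_s=0.5, base_p95_s=1.5, tokens_per_s=100.0)
    p50, p95 = latency_band(mid_tier, output_tokens=500)
    print(f"p50={p50:.1f}s p95={p95:.1f}s")
```

Because the estimate is output-bound, input and RAG token counts do not appear in it; long prompts will push real latency above these figures.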
