AI is on every roadmap now. The part almost nobody models is the bill. Leadership greenlights “an AI feature,” a prototype works beautifully on a dozen test inputs, and then it meets real traffic and the invoice arrives — sometimes ten times what anyone guessed. The whole thing runs on a unit most teams have never had to reason about: the token.
This is a practical guide for the people who have to make the call — founders, CTOs, and the engineers who’ll actually ship it. We’ll cover what a token is, how OpenAI and Anthropic stack up model for model, what each tier really costs as of June 2026, and the handful of decisions that separate an AI feature that’s smart from one that’s smart and sustainable.
Tokens: the meter everything runs on
An AI model doesn’t read words the way you do. It breaks text into tokens — chunks of roughly four characters. Sometimes a whole short word (“cat”), sometimes a fragment (“login” often becomes “log” + “in”). The rules of thumb worth memorizing:
- 1 token ≈ 4 characters ≈ ¾ of a word
- 1,000 tokens ≈ ~750 words ≈ a page and a half
- 1,000,000 tokens ≈ a full-length novel
Every major provider bills by the token, and they bill in two directions:
- Input tokens — everything you send in: the system prompt, instructions, retrieved documents, conversation history, the user’s question.
- Output tokens — everything the model writes back.
The detail that wrecks budgets: output costs roughly 5× what input does. The model generating text is far pricier than the model reading it. A verbose assistant that pads every answer is quietly the most expensive thing in your stack. This is also why “just stuff the whole knowledge base into every prompt” is a tax you pay on every single call — which is exactly what caching (below) is for.
The two vendors, and the tiers that matter
There are many model providers, but two anchor most serious deployments:
- OpenAI — the GPT line (currently GPT-5.5, GPT-5.4, and the smaller mini and nano variants).
- Anthropic — the Claude line, in three named sizes: Opus (largest, smartest), Sonnet (the balanced production workhorse), and Haiku (small, fast, cheap).
Both sell the same shape of lineup: a frontier model for hard reasoning, a workhorse for the bulk of production traffic, and a budget model for high-volume simple jobs. The strategic decision is almost never “which company” — it’s “which tier, for which task.”
| Tier | OpenAI | Anthropic | Use it for |
|---|---|---|---|
| Flagship | GPT-5.5 | Claude Opus 4.8 | Hard reasoning, long agentic coding, high-autonomy work you can hand off and trust. |
| Workhorse | GPT-5.4 | Claude Sonnet 4.6 | The bulk of production: chat, summarization, drafting, RAG, most app features. Best value. |
| Budget | GPT-5.4 mini / nano | Claude Haiku 4.5 | Classification, extraction, tagging, routing, quick replies — high volume, low difficulty. |
What each tier actually costs

Prices are quoted per million tokens. Here’s the full board, current as of June 2026:
| Model | Maker | Input ($/M) | Output ($/M) |
|---|---|---|---|
| GPT-5.5 | OpenAI | $5.00 | $30.00 |
| Claude Opus 4.8 | Anthropic | $5.00 | $25.00 |
| GPT-5.4 | OpenAI | $2.50 | $15.00 |
| Claude Sonnet 4.6 | Anthropic | $3.00 | $15.00 |
| Claude Haiku 4.5 | Anthropic | $1.00 | $5.00 |
| GPT-5.4 mini | OpenAI | $0.75 | $4.50 |
| GPT-5.4 nano | OpenAI | $0.20 | $1.25 |
Two takeaways for anyone signing the cheque. First, the two vendors are priced within pennies of each other tier for tier — this is a genuinely competitive market. Second, the spread between flagship and budget is enormous: Claude Opus 4.8 costs 5× what Haiku 4.5 does for the same volume. Picking the wrong tier is the single biggest source of waste in production AI.
A worked example: pricing an AI feature at scale
Per-million numbers feel abstract until you attach them to traffic. Take a common build: an AI support assistant handling 50,000 conversations a month. Assume each reply sends ~3,000 input tokens (system prompt + a few prior messages + the question) and writes ~500 tokens back.
| Model choice | Cost / reply | 50,000 / month | Per year |
|---|---|---|---|
| GPT-5.4 mini (budget) | ~$0.0045 | ~$225 | ~$2,700 |
| Claude Haiku 4.5 (budget) | ~$0.0055 | ~$275 | ~$3,300 |
| Sonnet 4.6 / GPT-5.4 (workhorse) | ~$0.016 | ~$800 | ~$9,600 |
| Claude Opus 4.8 (flagship) | ~$0.0275 | ~$1,375 | ~$16,500 |
| GPT-5.5 (flagship) | ~$0.030 | ~$1,500 | ~$18,000 |
Same feature, same traffic — and a 6× swing, from ~$2,700 to ~$18,000 a year, purely on model choice. For most support traffic the budget tier is more than capable, with a workhorse fallback for the genuinely tricky tickets. Defaulting to the flagship “to be safe” is how teams burn five figures answering questions a far cheaper model would have nailed.
Three levers that cut the bill

Model choice is the biggest lever, but it isn’t the only one. Two more move the invoice hard, and both are sitting in the API waiting to be switched on:
- Prompt caching (~90% off the repeated part). Most production prompts carry a large, fixed payload — a long system prompt, policy docs, a schema — that’s identical on every call. Both vendors let you cache it so a hit costs about a tenth of normal input. For any app with a big static instruction block, this is the highest-leverage change you can make, often a 50–80% cut on input spend.
- Batch processing (50% off, both ways). Anything not time-sensitive — overnight report generation, bulk classification, clearing a backlog — runs asynchronously at half price on input and output. If a job can tolerate a few minutes of latency, batching it is free money.
- Right-sizing & routing. Don’t pick one model for everything. Route the easy 80% to the budget tier and escalate only the hard cases to a workhorse or flagship. A small classifier deciding “which model handles this?” frequently pays for itself many times over.
One trap that isn’t on the price list: tokenizers differ
The sticker price per token is not the whole story, because the same text becomes a different number of tokens on different models. Anthropic’s newest models (Opus 4.7 and up) ship an updated tokenizer that can produce up to 35% more tokens for the same input. The per-token rate didn’t change — but the effective cost per request can. The lesson for engineers: never trust a back-of-envelope estimate built from list prices alone. Measure token counts on your own real prompts before you commit a budget.
So: OpenAI or Anthropic?
After all the comparison, the honest guidance is refreshingly boring:
- Default to the workhorse tier — Sonnet 4.6 or GPT-5.4 — for most production work. It carries the load at a fair price.
- Push easy, high-volume work to the budget tier — Haiku 4.5, GPT-5.4 mini/nano. This is where the savings live.
- Reserve the flagships — Opus 4.8, GPT-5.5 — for problems where a better answer is genuinely worth the premium: complex agentic coding, deep reasoning, high-autonomy tasks.
- On vendor choice, cost is a wash. They’re neck-and-neck on price. Decide on fit instead — instruction-following, tone, latency, tool ecosystem, and how each performs on your evals. Plenty of mature teams run both and route each job to whichever model wins it.
The headline isn’t “Vendor A beats Vendor B.” It’s that treating tokens as an engineering variable — not a mystery line item — is what turns AI from an unpredictable cost into one you control. On the same workload, the careful version routinely costs a fraction of the careless one.
The bottom line
Token pricing is only one part of total cost of ownership — engineering time, evaluation, latency budgets, and avoiding lock-in all matter too. But it’s the part that scales linearly with every user you add, which makes it the part worth engineering deliberately from day one rather than discovering on a surprise invoice.
That’s the work we do at Champlin Enterprises: choosing the right models, wiring up caching, batching and routing, and building AI features that hold up at scale — AI-first, not AI-added. If you’re putting AI into production and want it engineered to be smart and economical, let’s talk.
Pricing current as of June 2026, drawn from the official Anthropic and OpenAI pricing pages. AI pricing moves quickly — always confirm at the source before budgeting.





