Resource
AI API Pricing
Compared
Side-by-side costs for every major LLM — OpenAI, Anthropic, and Google — updated for 2026. Plus an interactive calculator so you know exactly what you'll spend.
What Are Tokens?
Think of tokens as small chunks of text. A token is roughly ¾ of a word — so 1,000 tokens is about 750 words (roughly a page and a half of text).
When you send a message to an AI API, you pay for two things:
- Input tokens — the text you send (your prompt, context, documents)
- Output tokens — the text the AI writes back
Output tokens cost more because the AI is generating new text, which takes more compute. Pricing is listed per 1 million tokens (1M).
Pricing
Cost Per 1M Tokens
All prices in USD. Updated March 2026.
OpenAI
| Model | Input | Output | Context | Best For |
|---|---|---|---|---|
| GPT-4.1 | $2.00 | $8.00 | 1M | Coding, long documents, instruction-following |
| GPT-4o | $2.50 | $10.00 | 128K | General-purpose flagship, multimodal (text + image + audio) |
| GPT-4.1-mini | $0.40 | $1.60 | 1M | Budget-friendly coding & analysis with long context |
| GPT-4o-mini | $0.15 | $0.60 | 128K | High-volume lightweight tasks, classification |
| GPT-4.1-nano | $0.10 | $0.40 | 1M | Ultra-cheap extraction, routing, high-volume ops |
| o3 | $2.00 | $8.00 | 200K | Advanced reasoning — math, science, multi-step logic |
| o4-mini | $1.10 | $4.40 | 200K | Budget reasoning for STEM and chain-of-thought tasks |
Anthropic (Claude)
| Model | Input | Output | Context | Best For |
|---|---|---|---|---|
| Claude Opus 4.6 | $5.00 | $25.00 | 200K | Most capable — complex analysis, agentic coding, deep research |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 200K | Best balance of quality and cost — coding, tool use, production workloads |
| Claude Haiku 4.5 | $1.00 | $5.00 | 200K | Fast & affordable — real-time chat, classification, simple tasks |
| Claude Haiku 3.5 (legacy) | $0.80 | $4.00 | 200K | Budget option — still available, slightly cheaper than Haiku 4.5 |
Google (Gemini)
| Model | Input | Output | Context | Best For |
|---|---|---|---|---|
| Gemini 2.5 Pro | $1.25 | $10.00 | 1M | Advanced reasoning at a great price — coding, multimodal analysis |
| Gemini 2.5 Flash | $0.30 | $2.50 | 1M | Fast reasoning on a budget — great value mid-tier |
| Gemini 2.0 Flash | $0.10 | $0.40 | 1M | Cheapest 1M-context model — high-volume, real-time apps |
Recommendations
Which Model Should You Use?
Chatbot / Customer Support
High volume, fast responses, low cost per interaction.
At $0.10/1M input tokens, Gemini 2.0 Flash handles thousands of conversations for pennies. GPT-4o-mini is comparable. Both are fast and cheap.
Code Generation & Dev Tools
Needs strong reasoning and code quality.
Claude Sonnet consistently ranks highest on coding benchmarks. GPT-4.1 is a close second with a 1M context window — great for large codebases.
Document Processing & Summarization
Large inputs, moderate outputs — context window matters.
Both offer 1M context windows at low prices. Gemini 2.5 Flash at $0.30/1M input is hard to beat for ingesting long documents.
Complex Reasoning & Research
Multi-step logic, analysis, math, science.
OpenAI's o3 excels at chain-of-thought reasoning. Gemini 2.5 Pro offers similar capability at a slightly lower price point.
High-Volume Classification & Extraction
Thousands of API calls — cost is everything.
Both cost $0.10/1M input tokens — the cheapest in the industry. GPT-4.1-nano has a 1M context window for batch processing.
Creative Writing & Content
Natural, nuanced prose with good instruction following.
Claude is known for natural, well-structured writing. GPT-4o is multimodal and handles mixed-media content well.
Calculator
Estimate Your API Costs
Plug in your expected usage and see what each model will cost you per day and per month.
Key Takeaways
- Cheapest overall: Gemini 2.0 Flash and GPT-4.1-nano tie at $0.10 / 1M input tokens. Perfect for high-volume, simple tasks.
- Best value mid-tier: Gemini 2.5 Flash ($0.30 input / $2.50 output) gives you reasoning capabilities at a great price.
- Best for coding: Claude Sonnet 4.6 leads on code benchmarks. GPT-4.1's 1M context is ideal for large projects.
- Most capable: Claude Opus 4.6 at $5 / $25 — the most capable model available, now 3x cheaper than its predecessor.
- Context window matters: If you need to process long documents, the GPT-4.1 family and Gemini models offer 1M tokens natively.
- Output costs add up: Output tokens cost 3-5x more than input. If your app generates long responses, that's where most of your spend goes.
Need help choosing the right AI model?
We engineer AI-first applications and can help you pick the right model — and the right architecture — for your use case.
Start a Project