What Are Tokens?

Think of tokens as small chunks of text. A token is roughly ¾ of a word — so 1,000 tokens is about 750 words (roughly a page and a half of text).

When you send a message to an AI API, you pay for two things:

  • Input tokens — the text you send (your prompt, context, documents)
  • Output tokens — the text the AI writes back

Output tokens cost more because the AI is generating new text, which takes more compute. Pricing is listed per 1 million tokens (1M).

A tweet~40 tokens
An email~200 tokens
A blog post~1,500 tokens
A novel~100,000 tokens

Pricing

Cost Per 1M Tokens

All prices in USD. Updated March 2026.

OpenAI

ModelInputOutputContextBest For
GPT-4.1$2.00$8.001MCoding, long documents, instruction-following
GPT-4o$2.50$10.00128KGeneral-purpose flagship, multimodal (text + image + audio)
GPT-4.1-mini$0.40$1.601MBudget-friendly coding & analysis with long context
GPT-4o-mini$0.15$0.60128KHigh-volume lightweight tasks, classification
GPT-4.1-nano$0.10$0.401MUltra-cheap extraction, routing, high-volume ops
o3$2.00$8.00200KAdvanced reasoning — math, science, multi-step logic
o4-mini$1.10$4.40200KBudget reasoning for STEM and chain-of-thought tasks

Anthropic (Claude)

ModelInputOutputContextBest For
Claude Opus 4.6$5.00$25.00200KMost capable — complex analysis, agentic coding, deep research
Claude Sonnet 4.6$3.00$15.00200KBest balance of quality and cost — coding, tool use, production workloads
Claude Haiku 4.5$1.00$5.00200KFast & affordable — real-time chat, classification, simple tasks
Claude Haiku 3.5 (legacy)$0.80$4.00200KBudget option — still available, slightly cheaper than Haiku 4.5

Google (Gemini)

ModelInputOutputContextBest For
Gemini 2.5 Pro$1.25$10.001MAdvanced reasoning at a great price — coding, multimodal analysis
Gemini 2.5 Flash$0.30$2.501MFast reasoning on a budget — great value mid-tier
Gemini 2.0 Flash$0.10$0.401MCheapest 1M-context model — high-volume, real-time apps

Recommendations

Which Model Should You Use?

Chatbot / Customer Support

High volume, fast responses, low cost per interaction.

Best: Gemini 2.0 FlashRunner-up: GPT-4o-mini

At $0.10/1M input tokens, Gemini 2.0 Flash handles thousands of conversations for pennies. GPT-4o-mini is comparable. Both are fast and cheap.

Code Generation & Dev Tools

Needs strong reasoning and code quality.

Best: Claude Sonnet 4Runner-up: GPT-4.1

Claude Sonnet consistently ranks highest on coding benchmarks. GPT-4.1 is a close second with a 1M context window — great for large codebases.

Document Processing & Summarization

Large inputs, moderate outputs — context window matters.

Best: Gemini 2.5 FlashRunner-up: GPT-4.1-mini

Both offer 1M context windows at low prices. Gemini 2.5 Flash at $0.30/1M input is hard to beat for ingesting long documents.

Complex Reasoning & Research

Multi-step logic, analysis, math, science.

Best: o3Runner-up: Gemini 2.5 Pro

OpenAI's o3 excels at chain-of-thought reasoning. Gemini 2.5 Pro offers similar capability at a slightly lower price point.

High-Volume Classification & Extraction

Thousands of API calls — cost is everything.

Best: GPT-4.1-nanoRunner-up: Gemini 2.0 Flash

Both cost $0.10/1M input tokens — the cheapest in the industry. GPT-4.1-nano has a 1M context window for batch processing.

Creative Writing & Content

Natural, nuanced prose with good instruction following.

Best: Claude Sonnet 4Runner-up: GPT-4o

Claude is known for natural, well-structured writing. GPT-4o is multimodal and handles mixed-media content well.

Calculator

Estimate Your API Costs

Plug in your expected usage and see what each model will cost you per day and per month.

~75,000 words
~15,000 words
3,000 / month
ModelDaily CostMonthly Cost
OpenAIGPT-4.1-nano$1.80$54.00
GoogleGemini 2.0 Flash$1.80$54.00
OpenAIGPT-4o-mini$2.70$81.00
OpenAIGPT-4.1-mini$7.20$216.00
GoogleGemini 2.5 Flash$8.00$240.00
AnthropicClaude Haiku 3.5$16.00$480.00
OpenAIo4-mini$19.80$594.00
AnthropicClaude Haiku 4.5$20.00$600.00
GoogleGemini 2.5 Pro$32.50$975.00
OpenAIGPT-4.1$36.00$1080.00
OpenAIo3$36.00$1080.00
OpenAIGPT-4o$45.00$1350.00
AnthropicClaude Sonnet 4.6$60.00$1800.00
AnthropicClaude Opus 4.6$100.00$3000.00

Key Takeaways

  • Cheapest overall: Gemini 2.0 Flash and GPT-4.1-nano tie at $0.10 / 1M input tokens. Perfect for high-volume, simple tasks.
  • Best value mid-tier: Gemini 2.5 Flash ($0.30 input / $2.50 output) gives you reasoning capabilities at a great price.
  • Best for coding: Claude Sonnet 4.6 leads on code benchmarks. GPT-4.1's 1M context is ideal for large projects.
  • Most capable: Claude Opus 4.6 at $5 / $25 — the most capable model available, now 3x cheaper than its predecessor.
  • Context window matters: If you need to process long documents, the GPT-4.1 family and Gemini models offer 1M tokens natively.
  • Output costs add up: Output tokens cost 3-5x more than input. If your app generates long responses, that's where most of your spend goes.

Need help choosing the right AI model?

We engineer AI-first applications and can help you pick the right model — and the right architecture — for your use case.

Start a Project