
LLM Pricing Guide — Compare 500+ AI Models
Compare LLM and media API pricing across 500+ models from OpenAI, Anthropic, Google, Meta, xAI, Mistral, and more — by token, image, and video cost.
AI spend can get expensive fast: U.S. companies now average $85,500 per month on AI. My main takeaway is simple: the cheapest model on paper is not always the lowest-cost option once you factor in output length, context size, retries, tool fees, and plan limits.
If I were choosing from this guide, I’d look at four things first:
- Unit price: tokens, images, or video seconds
- Limits: RPM, TPM, context windows, and plan caps
- Modality: text, image, audio, video, and vision support
- Cost per finished output: not just the sticker price
The article compares APIMart, OpenAI, Anthropic, Google AI, Meta, xAI, Mistral, Cohere, Runway, Stability AI, Black Forest Labs, and Kling AI.
A few clear patterns stand out:
- Text pricing varies a lot. OpenAI ranges from $0.05 to $30.00 per 1M input tokens.
- Output is often much more expensive than input. In some cases, output runs 4x to 6x higher.
- Long context can change the bill. OpenAI adds higher pricing after 270,000 tokens, and Google changes rates above 200,000 tokens.
- Batch jobs can cut costs. OpenAI, Anthropic, and Mistral list 50% discounts for async processing.
- Video costs stack up with retries. A cheap per-second rate can still turn into a much higher cost per finished clip.
- Unified billing matters for teams using many formats. APIMart’s pitch is one API and one invoice across 500+ models.
If you want the short version:
use budget models for high-volume text, mid-tier models for production apps, premium models only when output quality changes business results, and draft low-res video before paying for final renders.
What are LLM Tokens and API Prices? (Beginner Friendly)
Quick Comparison

| Provider | Main Strength | Watch Out For | Best Fit |
|---|---|---|---|
| APIMart | One API for 500+ models across text, image, video, and audio | Usage costs still scale with volume | Teams that want one billing setup |
| OpenAI | Broad model range and clear token pricing and cost tips | Top models get expensive fast | General text, image, audio, video |
| Anthropic | Strong for coding and long context | Output rates are high | Agents, coding, long prompts |
| Google AI | Low-cost Flash options and large context | Higher rates above 200K tokens | High-volume text and multimodal apps |
| Meta | Very low hosted or self-hosted Llama pricing | Pricing and limits depend on host | Cost-focused teams with hosting options |
| xAI | Lower spread between input and output pricing | Tool calls add extra fees | Long-response and tool-use apps |
| Mistral | Low token prices and batch discounts | Some tools cost extra | Utility text, coding, EU-based use |
| Cohere | Good fit for RAG, embeddings, and rerank | Less suited for media generation | Search, retrieval, knowledge bases |
| Runway | Video-first platform with clear credit math | Retries can drive up finished cost | Video creation and editing |
| Stability AI | Low image pricing and editing tools | Narrower scope than text vendors | High-volume image and audio work |
| Black Forest Labs | Fine-grained image pricing by size | Costs rise with retries and references | Image generation and editing |
| Kling AI | Lower-cost short video generation | Clip length and concurrency limits | Short-form video |
So before I compare prices line by line, I’d start with one question: What am I paying for most - tokens, images, seconds, or retries?
1. APIMart

APIMart uses pay-as-you-go billing with no monthly minimums and no hidden fees. Pricing changes based on modality, so text, image, video, and audio aren’t billed the same way.
Unit Pricing
Pricing varies by modality, as the table below shows.
| Modality | Billing Unit | Example Model | APIMart Price |
|---|---|---|---|
| Text | Per 1M tokens | Qwen2.5-VL-72B | $20.00 |
| Image | Per call | GPT Image 2 | $0.006 |
| Image | Per call | Wan 2.7 Image | $0.0216 |
| Video | Per second | Sora 2 | $0.08 |
| Video | Per second | Kling V3 (720p) | $0.0672 |
Image generation costs can shift a lot depending on the quality tier. For example, a 1024×1024 GPT-Image-2-Official call costs about $0.00488 at Low quality, $0.04232 at Medium, and $0.16872 at High. That gap adds up fast. If top-end output isn’t required, using a lower tier can cut per-call spend.
Included Limits
Default accounts come with RPM and TPM limits. Enterprise accounts can request higher-throughput channels.
Model Coverage
APIMart supports text, image, video, and audio models through one API. That includes models such as GPT-5, Claude, Sora 2, Midjourney, and Kling V3.
Cost-to-Output
The main upside here is consolidated billing across modalities. Instead of juggling separate bills for text, image, and video, you get one setup that makes spend control easier.
Next, the guide compares how major providers structure pricing across text, image, and video models.
2. OpenAI

OpenAI uses a pay-per-token pricing model for text. And the gap between models is huge.
As of June 2026, pricing starts at $0.05 per 1M input tokens for GPT-5 nano and goes all the way up to $30.00 per 1M input tokens for GPT-5.5 Pro [3][5]. The easiest way to read OpenAI pricing is by model tier, because input, output, and cached-token rates can be very different from one model to the next.
Unit Pricing
| Model | Input (per 1M) | Cached Input | Output (per 1M) |
|---|---|---|---|
| GPT-5.5 Pro | $30.00 | - | $180.00 |
| GPT-5.5 (Standard) | $5.00 | $0.50 | $30.00 |
| GPT-5.4 | $2.50 | $0.25 | $15.00 |
| GPT-5.4 mini | $0.75 | $0.075 | $4.50 |
| GPT-5.4 nano | $0.20 | $0.02 | $1.25 |
| GPT-5 nano | $0.05 | $0.005 | $0.40 |
Across OpenAI models, output tokens cost 4 to 6 times more than input tokens [6]. That matters a lot when your app produces long answers, summaries, or agent-style responses. OpenAI also offers Batch and Flex tiers with a flat 50% discount across all models, so GPT-5.5 input falls from $5.00 to $2.50 per 1M tokens [5].
Costs can climb again when long-context usage hits surcharge pricing.
Included Limits
OpenAI doubles both input and output rates once total context goes past 270,000 tokens [3][5]. If you're working with long document review or multi-turn agent loops, rolling summarization is one of the simplest ways to stay under that line.
OpenAI uses this same pricing setup across image, audio, and video models too.
Model Coverage
OpenAI prices image, audio, and video generation separately. Sora-2 costs $0.10 per second for 720p video, while Sora-2-pro at 1080p costs $0.70 per second at the standard rate [5].
For other media:
- Image pricing ranges from $2.50 to $8.00 per 1M tokens
- Whisper transcription costs $0.006 per minute
- TTS (tts-1) costs $0.015 per 1,000 characters [3][4]
Cost-to-Output
One of the fastest ways to cut spend is simple: send lower-value work to cheaper models. Processing 10,000 support tickets costs about $16 with GPT-4.1, $3.20 with GPT-4.1 mini, and $0.80 with GPT-4.1 nano [4].
3. Anthropic

Anthropic splits its Claude API lineup into four pricing tiers: Frontier/Research (Claude Fable 5 / Mythos 5), Flagship (Claude Opus 4.5–4.8), Mid-tier (Claude Sonnet 4.5–4.6), and Budget (Claude Haiku 4.5) [9][10]. The pattern is pretty clear. As you move up the tiers, you get more reasoning depth and better output, but the bill climbs fast too. For most buyers, the choice comes down to this: Haiku is the low-cost option, Sonnet is the middle ground, and Opus/Fable are built for heavier jobs.
Unit Pricing
Anthropic prices output tokens at 5x the input rate across the current tiered lineup [7][6]. Prices below are in USD per 1 million tokens [13][15].
| Model | Input (per 1M) | Cache Read (per 1M) | Output (per 1M) |
|---|---|---|---|
| Claude Fable 5 / Mythos 5 | $10.00 | $1.00 | $50.00 |
| Claude Opus 4.8 | $5.00 | $0.50 | $25.00 |
| Claude Sonnet 4.6 | $3.00 | $0.30 | $15.00 |
| Claude Haiku 4.5 | $1.00 | $0.10 | $5.00 |
Prompt caching can trim costs when you're reusing the same prefix again and again. Cache writes cost 1.25x the base input rate for a 5-minute TTL, or 2x for a 1-hour TTL. Cache reads cost 10% of standard input. In practice, caching starts to make sense after four or more uses of the same prefix [3][9][12].
Included Limits
Most current flagship and mid-tier Anthropic models - including Fable 5, Mythos 5, Opus 4.6–4.8, and Sonnet 4.6 - come with a 1M-token context window at standard pricing [9][11]. Claude Haiku 4.5 goes up to 200,000 tokens. Rate limits follow a tiered setup, from Tier 1 through Enterprise, with RPM and TPM caps set by plan [13][15].
Model Coverage
Anthropic models handle text and vision inputs, and Computer Use adds extra token overhead. Some add-ons are billed separately:
- Web search costs $10 per 1,000 searches
- Managed Agents cost $0.08 per active session-hour, plus token charges
The Batch API cuts token costs by 50% for async jobs with a 24-hour turnaround [8][9][11].
Cost-to-Output
This is where pricing gets practical: which tier stays cost-effective for repeat tasks, long-context work, and agent flows?
A one-hour coding session on Claude Opus 4.8, using 50,000 input tokens with 40,000 as cache reads and 15,000 output tokens, costs about $0.525, including the $0.08 agent session fee [9][12]. That gives you a decent picture of how Anthropic pricing behaves in actual use, not just on a pricing table.
For production jobs like coding assistants and multi-step agents, Claude Sonnet 4.6 tends to offer the best balance between cost and capability [6][3].
Next, compare Google AI's pricing across Gemini text and multimodal models.
4. Google AI

Google prices its models based on the model you pick and the size of the context window. One pricing rule matters right away: keep prompts under 200,000 tokens if you want to avoid the higher rate [14][3].
Unit Pricing
| Model | Input (per 1M) | Output (per 1M) | Context Window |
|---|---|---|---|
| Gemini 3.1 Pro (≤200K) | $2.00 | $12.00 | 1M–2M |
| Gemini 3.1 Pro (>200K) | $4.00 | $18.00 | 1M–2M |
| Gemini 2.5 Pro (≤200K) | $1.25 | $10.00 | 2M |
| Gemini 3.5 Flash | $1.50 | $9.00 | 1M |
| Gemini 3 Flash | $0.50 | $3.00 | 1M |
| Gemini 2.5 Flash | $0.30 | $2.50 | 1M |
| Gemini 3.1 Flash-Lite | $0.25 | $1.50 | 1M |
| Gemini 2.5 Flash-Lite | $0.10 | $0.40 | 1M |
| Gemini 3 4B | $0.04 | $0.08 | 131K |
For image generation, Imagen 4 Fast starts at $0.01–$0.02 per image, while Imagen 4 Ultra costs $0.06 per image. Veo 3.1 video is billed by the second. Standard runs at $0.40 per second for 720p and 1080p, and Light runs at $0.05–$0.08 per second [17].
Included Limits
Model price is only part of the story. Throughput limits and data settings can shift your total spend in a big way.
Google’s main tradeoff is pretty clear: low model pricing on one side, and throughput limits plus data restrictions on the other. In Google AI Studio, the free tier gives you about 15 RPM, free tokens, and product-improvement data use. The paid tier jumps to about 1,000–2,000 RPM, turns off product-improvement data use, and adds context caching plus Batch API. Enterprise plans add provisioned throughput, volume discounts, and compliance features [17][18].
Context caching is one of the biggest cost levers here. Writing to the cache is free, and reading from it costs 25% of the standard input rate [18].
Model Coverage
Google’s lineup spans text, multimodal, image, and video models. It also supports image input, starting at $0.0025 per image [3].
Cost-to-Output
For chatbots, summarization, and classification, Gemini 2.5 Flash or Gemini 3.1 Flash-Lite will usually make the most sense. They’re cheaper and fit many day-to-day workloads well. Save Gemini 3.1 Pro for cases where you need the larger context window, and use rolling summarization to stay under the 200,000-token cutoff [3][6].
There’s also a simple price angle worth noting. At $2.00 per 1M input tokens, Gemini 3.1 Pro is cheaper than flagship models such as GPT-5.5 ($5.00) and Claude Opus 4.8 ($5.00) for standard-length prompts [2].
Next, compare Meta's pricing and model coverage.
5. Meta
Meta works a bit differently from the closed-model providers above. Its Llama models are open weight, so your cost depends on where you host or access them. In practice, that means the exact same model can have very different pricing from one provider to another. For example, Llama 3.3 70B has been listed as low as $0.10 per 1M input tokens. Meta also doesn't publish a first-party API pricing sheet, which is why pricing can swing so much across hosts [1][19][20].
Unit Pricing
Current pricing is centered on Llama 4 Scout and Llama 4 Maverick [21][22].
| Model | Input (per 1M) | Output (per 1M) | Context Window |
|---|---|---|---|
| Llama 4 Scout | $0.08 – $0.17 | $0.15 – $0.66 | Up to 10,000,000 tokens |
| Llama 4 Maverick | $0.15 – $0.24 | $0.60 – $0.97 | 1,000,000 tokens |
| Llama 3.3 70B | $0.10 – $0.72 | $0.32 – $0.72 | 128K – 131K tokens |
| Llama 3.2 1B Instruct | $0.01 – $0.02 | $0.01 – $0.02 | 60K – 131K tokens |
| Llama 3.1 405B Instruct | $0.90 – $3.00 | $0.90 – $3.00 | 128K – 131K tokens |
That low sticker price looks great on paper. But it only helps if the host's context limits and throughput caps match your workload.
Included Limits
There isn't one standard Meta plan with shared rate limits or a built-in free tier. Hosts set their own throughput limits, context caps, and caching rules. So if you're planning long-context work with Llama 4 Scout, check the host's context ceiling first instead of assuming you'll get the full advertised range.
Model Coverage
Llama 4 Scout and Llama 4 Maverick both support text and vision input, plus tool calling and JSON mode across major providers [21][22]. Older options still have their place too. Llama 3.2 11B Vision can still handle vision-heavy jobs, while Llama 3.2 1B Instruct is aimed at edge deployments where low latency and lean compute use matter most [21][22].
Cost-to-Output
If you're running high-volume jobs with long prompts, Scout stands out. A coding task with 40K input and 8K output tokens costs about $0.005 per task, which works out to roughly 200 tasks per $1. That same task on GPT-5.5 costs about $0.44, or only 2.3 tasks per $1 [6].
For customer-facing use or multimodal work, Llama 4 Maverick is usually the better match. It benchmarks above GPT-4o while costing far less on input pricing: $0.15/M input versus $2.50/M input [6]. Its smaller gap between input and output pricing also makes it a good fit when you expect longer responses.
Next, compare xAI's pricing and model coverage.
6. xAI

xAI keeps input and output pricing low, which helps when responses get long. Grok 4.3 charges $1.25 per 1 million input tokens and $2.50 per 1 million output tokens. That 2x spread matters when a model writes a lot back to you [6][24].
Unit Pricing
| Model | Input (per 1M) | Cached Input | Output (per 1M) | Context Window |
|---|---|---|---|---|
| Grok 4.3 (Flagship) | $1.25 | $0.20 | $2.50 | 1M tokens |
| Grok 4.20 (Reasoning) | $1.25 | $0.20 | $2.50 | 2M tokens |
| Grok Build 0.1 (Coding) | $1.00 | $0.20 | $2.00 | 256K tokens |
| Grok 4.1 Fast (Budget) | $0.20 | $0.05 | $0.50 | 2M tokens |
For image work, Grok Imagine 1.5 Edit costs $0.01875 per call [23]. Video generation through the Imagine API runs from $0.08 to $0.25 per second, based on resolution [24].
Included Limits
xAI uses pay-as-you-go billing with usage-based rate limits. Enterprise plans can add custom rate limits and dedicated infrastructure [25][26].
There’s one thing to watch: tool use can add up fast. Search and code execution are billed separately, so a low token price doesn’t always mean a low final bill. Web Search, X Search, and Code Execution each cost $5.00 per 1,000 calls [24].
If your jobs aren’t urgent, the Batch API can trim costs by 20% to 50% for tasks processed within 24 hours [24].
Model Coverage
xAI covers text, image, and video use cases [24][16]. Grok 4.20 is built for faster tool use, while Grok Build 0.1 is aimed at coding-heavy work [6][2].
Cost-to-Output
For a standard coding task with 40K input tokens and 8K output tokens, Grok 4.3 costs about $0.07 per task. That works out to roughly 14 tasks per $1 [6].
Next, the guide compares another provider's pricing structure, or you can use a unified LLM API to access these models through a single integration.
7. Mistral AI

Mistral Large 3 costs $0.50 per 1M input tokens and $1.50 per 1M output tokens. That puts it in the low-priced flagship camp. The headline rates look strong, but the final bill can shift once you factor in tool charges and Mistral's billing tiers.
Here’s the current lineup.
Unit Pricing
| Model | Input (per 1M) | Cached Input | Output (per 1M) |
|---|---|---|---|
| Mistral Large 3 (Flagship) | $0.50 | $0.05 | $1.50 |
| Mistral Medium 3.5 (Balanced) | $1.50 | - | $7.50 |
| Mistral Small 4 (Efficient) | $0.10 | $0.01 | $0.30 |
| Magistral Medium (Reasoning) | $2.00 | - | $5.00 |
| Codestral (Coding) | $0.30 | $0.03 | $0.90 |
| Devstral 2 (Coding) | $0.40 | $0.04 | $2.00 |
| Pixtral Large (Multimodal) | $2.00 | - | $6.00 |
Mistral also charges separately for OCR at $4.00 per 1,000 pages, embeddings at $0.10 per 1M tokens, and web search plus code execution at $30 per 1,000 calls [27].
Included Limits
Mistral uses pay-as-you-go billing, with four rate-limit tiers that open up at $20, $100, and $500 in cumulative spend [31]. The Team plan comes with a minimum commitment of $50 per month [27].
There are two levers here that can cut costs fast:
- The Batch API lowers all model prices by 50% for async jobs [27][31].
- Prompt caching can reduce cached token costs by up to 90% when the shared prefix is at least 64 tokens long [31].
These pricing rules matter most when you're running high-volume or async workloads.
Model Coverage
Mistral covers text, reasoning, coding, multimodal, and edge use cases. Codestral supports fill-in-the-middle (FIM), which makes it a good match for IDE workflows. The Ministral series - 3B, 8B, and 14B - is aimed at low-cost or on-device deployments [30][32].
Mistral also offers EU-hosted endpoints and GDPR-friendly data processing at no extra charge [29][31].
Cost-to-Output
For high-volume utility work like entity extraction, classification, and summarization, Mistral Small 4 is the strongest fit [28][31]. If you need more reasoning power but still want low token pricing, Mistral Large 3 makes more sense [28][31].
For reasoning-heavy tasks, Magistral Medium costs $5.00 per 1M output tokens and is 37% cheaper on output than OpenAI's o3 [29].
Next, compare Cohere's pricing for enterprise text and retrieval workloads.
8. Cohere

Cohere is built mainly for retrieval and enterprise search. Its pricing reflects that: lower-cost options for text-heavy retrieval work, and higher-priced models for multimodal or more demanding jobs.
Unit Pricing
Cohere splits its lineup into three buckets: low-cost retrieval models, enterprise multimodal models, and separate tools for embeddings and reranking.
| Model | Input (per 1M) | Output (per 1M) | Context Window |
|---|---|---|---|
| Command R7B | $0.0375 | $0.15 | 128K |
| Command R | $0.15 | $0.60 | 128K |
| Command R+ | $2.50 | $10.00 | 128K |
| Command A (Multimodal) | $2.50 | $10.00 | 256K |
| Aya Expanse (8B/32B) | $0.50 | $1.50 | 128K |
| Embed v3 | $0.10 | - | - |
| Rerank v3 | $2.00 | - | - |
Rerank is billed using search units: one query plus up to 100 documents. If chunks go past 500 tokens, they count separately [33][35].
Included Limits
Cohere gives you free trial keys for testing, capped at 1,000 calls per month and 20 RPM [37]. Production keys use pay-as-you-go pricing, with up to 500 RPM for standard models. Billing happens at month-end or once your balance hits $250 [33][37].
Some of Cohere's top-end models need sales approval before full production use. That includes Command A+, A Reasoning, and A Vision. Until that approval happens, self-serve access stays at trial-style limits [37].
If your team needs dedicated throughput, Cohere also offers Model Vault. Pricing starts at $2,500 per month for an Embed 4 Small instance [33].
Model Coverage
Cohere fits text-first enterprise workflows, not media generation.
The company centers its lineup around text, retrieval, and enterprise search instead of image or video generation. The main exception is Command A, which supports image inputs and multimodal tasks. It also comes with a 256,000-token context window, the largest in Cohere's lineup [34][36].
Aya Expanse supports 49 languages, which makes it a solid pick for global deployments [37].
Cost-to-Output
If you're building RAG pipelines, Cohere's low input pricing is the big draw. These workflows usually burn through far more input tokens than output tokens, so Command R at $0.15 per 1M input tokens helps keep document-heavy prompts from getting too expensive.
A simple example makes the gap clear: running 100,000 support chatbot interactions on Command R costs about $123 per month, while the same volume on Command R+ comes out to roughly $2,050 per month [39].
For pure classification and summarization at scale, Command R7B is the lowest-cost option in the lineup [34][38].
A practical way to think about it:
- Use Command R7B for high-volume classification and summarization.
- Use Command R for RAG and chatbots.
- Use Command R+ only when you need the extra model strength.
Next, compare Runway's pricing or explore cinematic AI video generation alternatives.
9. Runway

Runway is built around video, so its pricing is tied to seconds generated or edited. It uses a credit system for video and image work. You get credits through a subscription or by buying top-up packs at $0.01 per credit, with a $10 minimum. API credits are billed on their own. The main thing to watch is how the credit burn changes from one model to another.
Unit Pricing
| Model | API Rate (Credits/sec) | USD Cost/sec |
|---|---|---|
| Gen-4.5 (Flagship) | 12 /sec | $0.12 |
| Gen-4 Video | 12 /sec | $0.12 |
| Gen-4 Turbo | 5 /sec | $0.05 |
| Aleph 2.0 (Video Editing) | 28 /sec | $0.28 |
| Act-Two (Animation) | 5 /sec | $0.05 |
| Gen-4 Image (1080p) | 8 /img | $0.08 |
Included Limits
On annual plans [40][43], Runway includes the following monthly credit caps:
| Plan | Monthly Cost (Annual) | Credits/Month | Rollover |
|---|---|---|---|
| Free | $0 | 125 (one-time) | None |
| Standard | $12 | 625 | None |
| Pro | $28 | 2,250 | None |
| Max | $76 | 9,500 | 1 month |
The Free plan includes watermarks and does not allow commercial use. Paid plans remove both limits [40][44]. Standard and Pro credits do not roll over, and unused credits expire within 24 hours of the next billing date [40][43][46]. Only Max gives you one month of rollover [40][43][46].
Model Coverage
Runway covers text-to-video, image-to-video, video editing, text-to-image, image-to-image, plus audio and post-processing tools [42]. That range gives it more reach than a tool that only does generation. But price alone doesn't tell the whole story. Output quality changes what you end up paying in practice.
Cost-to-Output
This is where things get more expensive than they first look. Retries add up fast. Most finished clips take 3 to 5 generations, which pushes Gen-4.5 to about $0.50 to $0.80 per finished second [43][44][47].
A common way to keep spend in check is to use Gen-4 Turbo at $0.05/sec for rough drafts and concept tests, then move to Gen-4.5 at $0.12/sec for final renders [41][45]. That setup makes sense if you don't want to burn premium credits while you're still figuring out motion, framing, or timing.
There's also a hard ceiling on lower-tier plans. Standard's 625 credits cover only about 52 seconds of Gen-4.5 video per month [40][44]. That's enough for a handful of polished clips, but it won't carry a steady production workflow.
Alternatively, you can explore MiniMax Hailuo 2.3 for high-consistency video generation. Next, compare Stability AI's image and video pricing.
10. Stability AI

Stability AI stands out in image and audio workflows, where per-asset pricing often matters more than a monthly plan. It uses a credit system, and 1 credit = $0.01. New users get 25 free credits, which is enough for about 3 flagship generations or 8 SD 3.5 Large images. API access also includes commercial usage rights [48].
Here’s the per-service pricing.
Unit Pricing
| Service | Credits | USD |
|---|---|---|
| Stable Image Ultra | 8 | $0.08 |
| Stable Diffusion 3.5 Large | 6.5 | $0.065 |
| Stable Diffusion 3.5 Large Turbo | 4 | $0.04 |
| Stable Image Core | 3 | $0.03 |
| Stable Diffusion 3.5 Flash | 2.5 | $0.025 |
| SDXL 1.0 | From 0.9 | From $0.009 |
| Replace Background & Relight | 8 | $0.08 |
| Erase / Inpaint / Remove Background | 5 | $0.05 |
| Creative Upscaler (to 4K) | 60 | $0.60 |
| Fast Upscaler | 2 | $0.02 |
| Stable Fast 3D | 10 | $0.10 |
| Stable Audio 3.0 (up to 6 min) | 26 | $0.26 |
Included Limits
API pricing is pay-as-you-go, with custom pricing and bulk discounts for high-volume teams [49].
Model Coverage
Stability AI covers text-to-image, image editing, 3D asset generation, and audio generation [48]. In plain English, it’s built for production work. You can generate images, edit them, turn assets into 3D outputs, and make audio clips without jumping between a bunch of tools.
The editing suite includes outpainting, background replacement, relighting, and style transfer [48]. Stable Fast 3D handles 3D asset generation, while Stable Audio 3.0 supports audio clips up to six minutes [48]. So this is less about chat and more about getting media work done.
That pricing gap shows up most when you’re working at scale, especially with editing and upscaling jobs.
Cost-to-Output
The Creative Upscaler costs 60 credits ($0.60) per image. That’s 30x the price of the Fast Upscaler, which costs 2 credits ($0.02). So if your main goal is simple resolution increases, Fast Upscaler is the lower-cost pick [48].
Stable Image Core comes out to about $30/month for 1,000 images [48]. And if you scale to 10,000 images/month with SD 3.5 Large, the cost lands at about $650 [48].
You can also generate and edit images using other high-performance models. Next, compare Black Forest Labs' image pricing.
11. Black Forest Labs

Black Forest Labs is a handy pricing benchmark for image generation because the bill changes with output size and whether you use reference images. Its system is credit-based, with 1 credit = $0.01. FLUX.2 pricing is tied to megapixels, and reference images are charged on top. One thing to watch: each image and each reference image is rounded up to the next megapixel, based on 1,024 × 1,024 px.
Unit Pricing
The FLUX.2 lineup comes in four tiers: Max, Pro, Klein, and Flex. Each one makes a different tradeoff between image quality, speed, and price.
| Model | 1st MP (Base) | Add'l MP | Ref. Image (per MP) | Generation Mode |
|---|---|---|---|---|
| FLUX.2 [max] | $0.07 | $0.03 | $0.03 | Text-to-Image / Edit |
| FLUX.2 [pro] (Text-to-Image) | $0.03 | $0.015 | $0.015 | Text-to-Image |
| FLUX.2 [pro] (Edit) | $0.045 | $0.015 | $0.015 | Image Editing |
| FLUX.2 [klein] 9B | $0.015 | $0.002 | $0.002 | Text-to-Image / Edit |
| FLUX.2 [klein] 4B | $0.014 | $0.001 | $0.001 | Text-to-Image / Edit |
| FLUX.2 [flex] | $0.05 | $0.05 | $0.05 | Text-to-Image / Edit |
Older FLUX1.1 and FLUX.1 models use flat per-image pricing instead.
| Model | Price per Image | Description |
|---|---|---|
| FLUX1.1 [pro] | $0.04 | Standard high-speed generation |
| FLUX1.1 [pro] Ultra | $0.06 | Ultra-high-resolution |
| FLUX1.1 [pro] Raw | $0.06 | Candid photography aesthetic |
| FLUX.1 Kontext [max] | $0.08 | Maximum quality in-context editing |
| FLUX.1 Kontext [pro] | $0.04 | Commercial-ready in-context editing |
| FLUX.1 Fill [pro] | $0.05 | Targeted image inpainting |
| FLUX.1 [schnell] | $0.003 | Distilled for maximum speed |
Included Limits
API access is pay-as-you-go, but Black Forest Labs also has subscription tiers with monthly image caps [50].
| Plan | Monthly Limit | Key Features |
|---|---|---|
| Builder | 10,000 images/month | Klein models, 10 users, fine-tuning rights |
| Platform | 100,000 images/month | Klein 9B + Dev models, 10 users |
| Professional | 100,000 images/month | Dev models, 3 domains, 10 users |
| Enterprise | Custom | All models, custom volume, API and weights access |
Model Coverage
Black Forest Labs is centered on image generation and editing. FLUX.2 models support output sizes up to 4 MP, and anything above that gets resized automatically [50]. If speed matters most, FLUX.2 [klein] 4B stands out with sub-second inference, which makes it a good fit for near real-time use cases [52].
For editing work, the lineup also has a couple of clear options. FLUX.1 Fill [pro] handles targeted inpainting at $0.05 per image, while FLUX.1 Kontext [pro] is priced at $0.04 per image for commercial-ready in-context editing [51].
Cost-to-Output
A finished 4 MP FLUX.2 [max] image costs about $0.30, once you factor in generation, upscaling, and two retries. Reference images are billed separately at the same per-megapixel rate [50][51]. If you're doing concept art or early-stage prototyping, FLUX.2 [klein] 4B at $0.014 per image is the low-cost way to test ideas before you move to final renders [50].
Next: Kling AI video pricing.
12. Kling AI

Kling AI splits pricing into two lanes: the web app uses credits, while the API charges by the second. On the API side, cost changes based on clip length, resolution, and whether you turn on synchronized audio.
Unit Pricing
For standard silent video, pricing starts at $0.0672/sec at 720p and goes up to $0.0896/sec at 1080p. Kling V3 Omni, which handles text-plus-image inputs and video-to-video workflows, costs $0.1792/sec at 1080p.
| Configuration | Resolution | Price/Sec | Est. 10s Clip Cost |
|---|---|---|---|
| Kling V3 – Silent | 720p | $0.0672 | $0.67 |
| Kling V3 – Silent | 1080p | $0.0896 | $0.90 |
| Kling V3 – With Audio | 1080p | $0.1120 | $1.12 |
| Kling V3 Omni (Ref) | 1080p | $0.1792 | $1.79 |
| Kling V3 – Silent | 4K | $0.4286 | $4.29 |
So yes, Kling sits on the lower-priced side for video APIs.
Included Limits
Kling keeps web app and API pricing separate, which means you need to check both before picking a plan. The API rate is only one piece of the math. Credits and concurrency have a big effect on how much work you can push through.
The free tier comes with 66 credits per day, and those credits reset every 24 hours with no rollover. Paid plans begin at $6.99/month for 660 credits on Standard and go up to $180/month for 26,000 credits on Ultra. If you pay annually for Ultra, the effective rate drops by 34% [54].
For API users, standard concurrency is capped at 10 parallel jobs. Trial-tier accounts get only 3. That gap can matter a lot if you're trying to batch renders instead of waiting on clips one by one.
Model Coverage
Kling V3 and Kling V3 Omni support clips up to 15 seconds, which makes them a fit for cinematic and narrative work. V2.6 caps clip length at 10 seconds and adds synchronized audio. V2.5 Turbo is about 30% cheaper than the Master tier.
Cost-to-Output
A common way to keep spend under control is to draft in 720p silent mode and move up to 1080p or 4K only for final renders. That approach helps because many users need 2–4 generation attempts to get a usable clip, and that pushes up the cost of the finished video [53].
Prepaid Resource Packages can trim the effective unit price by 10% to 30%, depending on bundle size [53].
Next, compare these models by unit price, plan limits, modality coverage, and cost per output.
Pricing Breakdown by Comparison Criteria
The tables below condense the earlier provider-by-provider details into four buying filters: unit price, plan limits, modality coverage, and output cost.
Unit Pricing Across Text, Image, and Video APIs
| Model | Provider | Input ($/1M tokens) | Output ($/1M tokens) | Tier |
|---|---|---|---|---|
| GPT-5 Nano | OpenAI | $0.05 | $0.40 | Budget |
| Gemini 2.5 Pro | $1.25 | $10.00 | Mid-range | |
| GPT-5.5 | OpenAI | $5.00 | $30.00 | Premium |
For image generation, FLUX.1 [schnell] is the low-cost reference at $0.003 per image, while Stable Image Ultra sits at the top end at $0.08 per image. For video, Kling V3 costs $0.0672/sec at 720p on the low end, and Veo 3.1 comes in at $0.40/sec on the high end.
Raw rates matter. But in practice, plan limits often decide what you actually spend.
Included Limits in Subscriptions and Platform Plans
At under about 5 million input tokens per month, $20 chat plans can beat API billing for casual use.
| Provider | Plan | Monthly Price | Included Usage | Key Limits | Team Plan |
|---|---|---|---|---|---|
| OpenAI | ChatGPT Plus | $20 | Capped (dynamic) | Dynamic message caps; no API access | Yes |
| Anthropic | Claude Pro | $20 | Capped (dynamic) | Usage limits vary by demand; no API access | Yes |
| Gemini Advanced | $20 | Capped (dynamic) | Tied to Google One; no API access | Yes (Workspace) |
Reasoning models also add a wrinkle: hidden reasoning tokens are billed at output rates, which can push total cost up by 2x to 7x.[3]
Model Coverage by Modality
Pricing only matters once the model’s modality fits the job.
| Provider | Text | Multimodal Input | Image Gen | Vision | Video Gen | API Access |
|---|---|---|---|---|---|---|
| APIMart | ✓ | ✓ | ✓ | - | ✓ | Unified API |
| OpenAI | ✓ | ✓ | ✓ | ✓ | ✓ | Direct |
| ✓ | ✓ | ✓ | ✓ | ✓ | Direct | |
| Anthropic | ✓ | ✓ | ✗ | ✓ | ✗ | Direct |
| Meta | ✓ | ✓ | ✗ | ✓ | ✗ | Unified/Hosted |
| Mistral AI | ✓ | ✓ | ✗ | ✓ | ✗ | Direct |
| Stability AI | ✗ | ✗ | ✓ | ✗ | ✓ | Direct |
A cheap model is no bargain if it can’t handle the format you need. Text-only vendors, for example, won’t help much if your workflow depends on image or video output.
Cost-to-Output by Common Use Case
These are the costs teams tend to feel once things hit production.
Text workloads (per 1M output tokens):
| Use Case | Model | Output Cost | Tier | Key Tradeoff |
|---|---|---|---|---|
| High-volume chatbot | GPT-5 Nano | $0.40 | Budget | Lower reasoning depth |
| Document extraction | Gemini Flash Lite | $0.30 | Budget | Limited creative writing |
| Code generation | Gemini 2.5 Pro | $10.00 | Mid-range | Surcharge above 200K context [3] |
| Agentic workflows | Claude Sonnet 4.6 | $15.00 | Mid-range | Needs prompt caching for ROI [3] |
| Complex reasoning | Claude Opus 4.8 | $25.00 | Premium | High cost; slower latency |
Video workloads (per 10-second clip):
| Use Case | Model | Output Cost | Tier | Key Tradeoff |
|---|---|---|---|---|
| Short-form video (draft) | Kling V3 | ~$0.67 | Budget | 720p; limited to 15-second clips |
| Short-form video (final) | Sora 2 | $1.00 | Mid-range | Balanced quality and cost |
| Cinematic video | Veo 3.1 | $4.00 | Premium | Highest quality; highest spend |
Here’s the simple version: the price per token or per second is only part of the story. The bigger factor is often how you use the model. A chatbot running all day, a document pipeline, and a video studio can look cheap on paper and expensive fast once output volume kicks in.
A practical rule of thumb: batch processing cuts costs by 50% on OpenAI, Anthropic, and Mistral for workloads that can tolerate a 24-hour turnaround.[3] For video, drafting at lower resolution and upgrading only final renders is the most reliable way to control per-output spend.
Pros and Cons
The table below boils the tradeoffs down to the stuff that usually drives the decision: cost, modality, and workload fit. If you're choosing between providers, this gives you the short version without making you dig back through every pricing section.
| Subject | Pros | Cons | Best For |
|---|---|---|---|
| APIMart | 500+ models under one API; one invoice for text, image, and video | Usage-based pricing means costs rise with output volume | Teams that want unified multi-modal access |
| OpenAI | Clear token billing | Flagship models are expensive | General-purpose text workloads |
| Anthropic | Prompt caching lowers repeated-work costs | Top-tier models carry high output rates | Coding and long-context workflows |
| Google AI | Flash-Lite is cheap | Pro gets costly above 200K tokens | High-volume text and long-context workloads |
| Meta (Llama) | Low-cost if you can self-host | No first-party API means you handle hosting and uptime | Cost-sensitive workloads with self-hosting capability |
| xAI (Grok) | Competitive mid-tier pricing | Smaller model lineup | Real-time web and social-data applications |
| Mistral AI | Low-cost small models and multilingual coverage | Fewer multimodal features | Multilingual text apps |
| Cohere | Embed, Rerank, and Command R7B fit RAG | Command R+ is pricey for its tier | Retrieval-augmented generation and knowledge bases |
| Stability AI | Very low image-generation prices | Image-only scope limits broader workflows | High-volume image generation |
| Kling AI | Low-cost short-form video | Limited to 15-second videos at base pricing | Short-form video generation |
A simple way to read this:
- If you want one API for many model types, APIMart stands out.
- If you care most about plain text usage and straightforward billing, OpenAI or Google AI may be the easier fit.
- If your work leans toward coding, long prompts, or repeated context, Anthropic can make sense.
- If you're keeping costs down and can run things yourself, Meta (Llama) is hard to ignore.
- If your stack is built around RAG, Cohere has tools that line up well with that setup.
For image-heavy use, Stability AI is the low-cost pick. For short video clips, Kling AI keeps entry costs down, though the base plan stays tied to 15-second outputs.
Conclusion
Looking at the pricing breakdown above, the best model isn’t the most expensive one or the cheapest one. It’s the one that fits your workload, modality, and volume.
High-volume, low-complexity tasks should run on the lowest-cost models you can get away with.
As complexity goes up, spend should go up only when the output earns it. Mid-tier models are a good fit for production apps that need steady performance without the top-shelf price tag.
Once you get into premium reasoning or media generation, cost per output starts to matter more than raw token pricing. Premium models make sense when quality has a direct effect on results. And for video, pricing works differently: APIs like WAN 2.7, Sora 2 ($0.08/sec) and Kling V3 ($0.0672/sec at 720p) charge by the second, not by the token.
For teams using text, image, and video models together, APIMart gives access to 500+ models through a single API. That means multimodal work can sit under one API and one invoice.
FAQs
How do I estimate total cost per output?
Estimate the total cost based on how the model is billed.
For text models, pricing is usually split into input tokens and output tokens per 1 million tokens. Output tokens often cost more, so your expected response length has the biggest effect on total spend.
For non-text use cases, image models are often priced per call, while video models are priced per second generated.
A simple way to estimate cost is to:
- use a token counter to measure prompt volume
- check the model’s rate for each billing unit
- apply that rate to your expected usage
That gives you a practical cost estimate before you scale anything up.
When does prompt caching save money?
Prompt caching cuts costs when your app sends the same prompt prefix again and again. That usually means long system instructions, large document sets, or shared conversation history reused across many requests.
Instead of paying the full input token price every time, you pay less for the repeated part. In many cases, that can reduce input costs by 50% to 90%.
This works best when volume is high and the context stays mostly the same. A customer support chatbot is a good example: the bot may reuse the same rules, brand info, and help docs across thousands of chats.
It’s a poor fit when the context changes all the time. If your app keeps rewriting the prompt from scratch on each request, there’s less repeated text to cache, so the savings drop fast.
Should I use subscriptions or API pricing?
For most developers and businesses, API pricing makes more sense. With pay-as-you-go billing, you pay for the tokens you use - no monthly minimums, no surprise fees, and no fixed charges hanging over you when traffic is light. Your costs move with usage, which is often a much better fit than a flat recurring bill.
APIMart gives you one API that connects to 500+ AI models, with clear per-token pricing and automatic volume discounts as your usage goes up.
Related Blog Posts
Choose the model you want in the model marketplace
Try chat, image and video models in the APIMart model marketplace, and experience model capabilities quickly with one unified API.