LLM Pricing Guide — Compare 500+ AI Models

Compare LLM and media API pricing across 500+ models from OpenAI, Anthropic, Google, Meta, xAI, Mistral, and more — by token, image, and video cost.


AI spend can get expensive fast: U.S. companies now spend an average of $85,500 per month on AI. My main takeaway is simple: the cheapest model on paper is not always the lowest-cost option once you factor in output length, context size, retries, tool fees, and plan limits.

If I were choosing from this guide, I’d look at four things first:

Unit price: tokens, images, or video seconds
Limits: RPM, TPM, context windows, and plan caps
Modality: text, image, audio, video, and vision support
Cost per finished output: not just the sticker price

The article compares APIMart, OpenAI, Anthropic, Google AI, Meta, xAI, Mistral, Cohere, Runway, Stability AI, Black Forest Labs, and Kling AI.

A few clear patterns stand out:

Text pricing varies a lot. OpenAI ranges from $0.05 to $30.00 per 1M input tokens.
Output is often much more expensive than input. In some cases, output runs 4x to 6x higher.
Long context can change the bill. OpenAI adds higher pricing after 270,000 tokens, and Google changes rates above 200,000 tokens.
Batch jobs can cut costs. OpenAI, Anthropic, and Mistral list 50% discounts for async processing.
Video costs stack up with retries. A cheap per-second rate can still turn into a much higher cost per finished clip.
Unified billing matters for teams using many formats. APIMart’s pitch is one API and one invoice across 500+ models.

If you want the short version: use budget models for high-volume text, mid-tier models for production apps, premium models only when output quality changes business results, and draft low-res video before paying for final renders.


Quick Comparison

AI Model Pricing Comparison: Cost Per 1M Tokens by Provider (2026)

Provider	Main Strength	Watch Out For	Best Fit
APIMart	One API for 500+ models across text, image, video, and audio	Usage costs still scale with volume	Teams that want one billing setup
OpenAI	Broad model range and clear token pricing	Top models get expensive fast	General text, image, audio, video
Anthropic	Strong for coding and long context	Output rates are high	Agents, coding, long prompts
Google AI	Low-cost Flash options and large context	Higher rates above 200K tokens	High-volume text and multimodal apps
Meta	Very low hosted or self-hosted Llama pricing	Pricing and limits depend on host	Cost-focused teams with hosting options
xAI	Lower spread between input and output pricing	Tool calls add extra fees	Long-response and tool-use apps
Mistral	Low token prices and batch discounts	Some tools cost extra	Utility text, coding, EU-based use
Cohere	Good fit for RAG, embeddings, and rerank	Less suited for media generation	Search, retrieval, knowledge bases
Runway	Video-first platform with clear credit math	Retries can drive up finished cost	Video creation and editing
Stability AI	Low image pricing and editing tools	Narrower scope than text vendors	High-volume image and audio work
Black Forest Labs	Fine-grained image pricing by size	Costs rise with retries and references	Image generation and editing
Kling AI	Lower-cost short video generation	Clip length and concurrency limits	Short-form video

So before I compare prices line by line, I’d start with one question: what am I paying for most: tokens, images, seconds, or retries?

1. APIMart


APIMart uses pay-as-you-go billing with no monthly minimums and no hidden fees. Pricing changes based on modality, so text, image, video, and audio aren’t billed the same way.

Unit Pricing

Pricing varies by modality, as the table below shows.

Modality	Billing Unit	Example Model	APIMart Price
Text	Per 1M tokens	Qwen2.5-VL-72B	$20.00
Image	Per call	GPT Image 2	$0.006
Image	Per call	Wan 2.7 Image	$0.0216
Video	Per second	Sora 2	$0.08
Video	Per second	Kling V3 (720p)	$0.0672

Image generation costs can shift a lot depending on the quality tier. For example, a 1024×1024 GPT-Image-2-Official call costs about $0.00488 at Low quality, $0.04232 at Medium, and $0.16872 at High. That gap adds up fast. If top-end output isn’t required, using a lower tier can cut per-call spend.
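To make the tier gap concrete, here is a minimal Python sketch that multiplies the three per-call prices quoted above by a monthly volume. The 10,000-call volume and the function name are illustrative assumptions, not figures or APIs from APIMart.

```python
# Per-call prices for a 1024x1024 GPT-Image-2-Official call, as quoted above.
PRICE_PER_CALL = {"low": 0.00488, "medium": 0.04232, "high": 0.16872}


def monthly_image_cost(calls: int, tier: str) -> float:
    """Return the spend in USD for `calls` image generations at one quality tier."""
    return calls * PRICE_PER_CALL[tier]


if __name__ == "__main__":
    calls = 10_000  # assumed monthly volume, for illustration only
    for tier in PRICE_PER_CALL:
        print(f"{tier:>6}: ${monthly_image_cost(calls, tier):,.2f}")
```

At that assumed volume, the same workload ranges from tens of dollars at Low to well over a thousand at High, which is why the tier choice matters more than the headline price.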

Included Limits

Default accounts come with RPM and TPM limits. Enterprise accounts can request higher-throughput channels.

Model Coverage

APIMart supports text, image, video, and audio models through one API. That includes models such as GPT-5, Claude, Sora 2, Midjourney, and Kling V3.

Cost-to-Output

The main upside here is consolidated billing across modalities. Instead of juggling separate bills for text, image, and video, you get one setup that makes spend control easier.

Next, the guide compares how major providers structure pricing across text, image, and video models.

2. OpenAI


OpenAI uses a pay-per-token pricing model for text. And the gap between models is huge.

As of June 2026, pricing starts at $0.05 per 1M input tokens for GPT-5 nano and goes all the way up to $30.00 per 1M input tokens for GPT-5.5 Pro ^[3]^[5]. The easiest way to read OpenAI pricing is by model tier, because input, output, and cached-token rates can be very different from one model to the next.

Unit Pricing

Model	Input (per 1M)	Cached Input	Output (per 1M)
GPT-5.5 Pro	$30.00	-	$180.00
GPT-5.5 (Standard)	$5.00	$0.50	$30.00
GPT-5.4	$2.50	$0.25	$15.00
GPT-5.4 mini	$0.75	$0.075	$4.50
GPT-5.4 nano	$0.20	$0.02	$1.25
GPT-5 nano	$0.05	$0.005	$0.40

Across the OpenAI models listed above, output tokens cost roughly 6 to 8 times more than input tokens, which is even steeper than the 4x to 6x range often quoted ^[6]. That matters a lot when your app produces long answers, summaries, or agent-style responses. OpenAI also offers Batch and Flex tiers with a flat 50% discount across all models, so GPT-5.5 input falls from $5.00 to $2.50 per 1M tokens ^[5].
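As a quick check on the output premium and the batch discount, the sketch below derives both from the table's own numbers. The rates are copied from the table; the function names are hypothetical helpers, not part of any OpenAI SDK.

```python
# (input, output) rates in USD per 1M tokens, copied from the table above.
OPENAI_RATES = {
    "GPT-5.5 Pro": (30.00, 180.00),
    "GPT-5.5": (5.00, 30.00),
    "GPT-5.4": (2.50, 15.00),
    "GPT-5.4 mini": (0.75, 4.50),
    "GPT-5.4 nano": (0.20, 1.25),
    "GPT-5 nano": (0.05, 0.40),
}


def output_multiple(model: str) -> float:
    """How many times more an output token costs than an input token."""
    input_rate, output_rate = OPENAI_RATES[model]
    return output_rate / input_rate


def batch_rate(rate: float, discount: float = 0.50) -> float:
    """Apply the flat Batch/Flex discount described in the text to one rate."""
    return rate * (1 - discount)


if __name__ == "__main__":
    for model in OPENAI_RATES:
        print(f"{model}: output is {output_multiple(model):.2f}x input")
    print(f"GPT-5.5 batch input: ${batch_rate(5.00):.2f} per 1M tokens")
```

On these listed rates, the multiple runs from 6x on most models up to 8x on GPT-5 nano, so the cheapest model carries the largest relative output premium.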

Costs can climb again when long-context usage hits surcharge pricing.

Included Limits

OpenAI doubles both input and output rates once total context goes past 270,000 tokens ^[3]^[5]. If you're working with long document review or multi-turn agent loops, rolling summarization is one of the simplest ways to stay under that line.
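A rough sketch of how that surcharge changes a single request is shown below. It assumes two things the guide does not spell out: that "total context" is measured by prompt tokens, and that the doubled rate applies to the whole request instead of only the tokens past the threshold.

```python
LONG_CONTEXT_THRESHOLD = 270_000  # tokens, per the text above
SURCHARGE_MULTIPLIER = 2.0        # "doubles both input and output rates"


def cost_with_surcharge(prompt_tokens: int, output_tokens: int,
                        input_rate: float, output_rate: float) -> float:
    """Request cost in USD; rates are USD per 1M tokens.

    Assumption: the doubled rate covers the entire request once the prompt
    exceeds the threshold.
    """
    over = prompt_tokens > LONG_CONTEXT_THRESHOLD
    multiplier = SURCHARGE_MULTIPLIER if over else 1.0
    base = (prompt_tokens * input_rate + output_tokens * output_rate) / 1_000_000
    return multiplier * base


if __name__ == "__main__":
    # GPT-5.5 rates from the table: $5.00 input, $30.00 output per 1M tokens.
    print(cost_with_surcharge(260_000, 5_000, 5.00, 30.00))
    print(cost_with_surcharge(280_000, 5_000, 5.00, 30.00))
```

Under those assumptions, adding 20,000 prompt tokens across the threshold more than doubles the bill, which is the case for summarizing history before it crosses the line.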

Image, audio, and video models follow the same pay-per-use approach, but each is priced on its own unit.

Model Coverage

OpenAI prices image, audio, and video generation separately. Sora-2 costs $0.10 per second for 720p video, while Sora-2-pro at 1080p costs $0.70 per second at the standard rate ^[5].

For other media:

Image pricing ranges from $2.50 to $8.00 per 1M tokens
Whisper transcription costs $0.006 per minute
TTS (tts-1) costs $0.015 per 1,000 characters ^[3]^[4]

Cost-to-Output

One of the fastest ways to cut spend is simple: send lower-value work to cheaper models. Processing 10,000 support tickets costs about $16 with GPT-4.1, $3.20 with GPT-4.1 mini, and $0.80 with GPT-4.1 nano ^[4].

3. Anthropic


Anthropic splits its Claude API lineup into four pricing tiers: Frontier/Research (Claude Fable 5 / Mythos 5), Flagship (Claude Opus 4.5–4.8), Mid-tier (Claude Sonnet 4.5–4.6), and Budget (Claude Haiku 4.5) ^[9]^[10]. The pattern is pretty clear. As you move up the tiers, you get more reasoning depth and better output, but the bill climbs fast too. For most buyers, the choice comes down to this: Haiku is the low-cost option, Sonnet is the middle ground, and Opus/Fable are built for heavier jobs.

Unit Pricing

Anthropic prices output tokens at 5x the input rate across the current tiered lineup ^[7]^[6]. Prices below are in USD per 1 million tokens ^[13]^[15].

Model	Input (per 1M)	Cache Read (per 1M)	Output (per 1M)
Claude Fable 5 / Mythos 5	$10.00	$1.00	$50.00
Claude Opus 4.8	$5.00	$0.50	$25.00
Claude Sonnet 4.6	$3.00	$0.30	$15.00
Claude Haiku 4.5	$1.00	$0.10	$5.00

Prompt caching can trim costs when you're reusing the same prefix again and again. Cache writes cost 1.25x the base input rate for a 5-minute TTL, or 2x for a 1-hour TTL. Cache reads cost 10% of standard input. In practice, caching starts to make sense after four or more uses of the same prefix ^[3]^[9]^[12].
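The break-even point can be sketched from the multipliers alone. The snippet below measures cost in units of one uncached prefix send and assumes every reuse lands inside the TTL; the helper names are hypothetical. On these numbers the 5-minute cache pays off at the second use and the 1-hour cache at the third, so the four-use rule of thumb above is on the conservative side, presumably to leave room for cache misses.

```python
def cached_cost(uses: int, write_multiplier: float,
                read_multiplier: float = 0.10) -> float:
    """Cost of sending one prefix `uses` times with caching.

    Units: one uncached send of the prefix = 1.0. The first use writes the
    cache; every later use is assumed to be a cache read.
    """
    return write_multiplier + read_multiplier * (uses - 1)


def break_even_uses(write_multiplier: float, read_multiplier: float = 0.10) -> int:
    """Smallest number of uses at which caching is cheaper than not caching."""
    uses = 1
    while cached_cost(uses, write_multiplier, read_multiplier) >= uses:
        uses += 1
    return uses


if __name__ == "__main__":
    print("5-minute TTL:", break_even_uses(1.25))
    print("1-hour TTL:", break_even_uses(2.0))
```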

Included Limits

Most current flagship and mid-tier Anthropic models - including Fable 5, Mythos 5, Opus 4.6–4.8, and Sonnet 4.6 - come with a 1M-token context window at standard pricing ^[9]^[11]. Claude Haiku 4.5 goes up to 200,000 tokens. Rate limits follow a tiered setup, from Tier 1 through Enterprise, with RPM and TPM caps set by plan ^[13]^[15].

Model Coverage

Anthropic models handle text and vision inputs, and Computer Use adds extra token overhead. Some add-ons are billed separately:

Web search costs $10 per 1,000 searches
Managed Agents cost $0.08 per active session-hour, plus token charges

The Batch API cuts token costs by 50% for async jobs with a 24-hour turnaround ^[8]^[9]^[11].

Cost-to-Output

This is where pricing gets practical: which tier stays cost-effective for repeat tasks, long-context work, and agent flows?

A one-hour coding session on Claude Opus 4.8, using 50,000 input tokens with 40,000 as cache reads and 15,000 output tokens, costs about $0.525, including the $0.08 agent session fee ^[9]^[12]. That gives you a decent picture of how Anthropic pricing behaves in actual use, not just on a pricing table.
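The arithmetic behind that figure can be reproduced with a short sketch. Rates come from the Opus 4.8 row of the table and the session fee from the add-on list; the function name and argument layout are illustrative assumptions.

```python
# Claude Opus 4.8 rates in USD per 1M tokens, from the table above.
OPUS_4_8 = {"input": 5.00, "cache_read": 0.50, "output": 25.00}
SESSION_FEE_PER_HOUR = 0.08  # Managed Agents fee per active session-hour


def session_cost(total_input: int, cache_read: int, output: int,
                 hours: float = 1.0) -> float:
    """Cost in USD of one agent session, splitting input into fresh and cached."""
    fresh_input = total_input - cache_read
    token_cost = (
        fresh_input * OPUS_4_8["input"]
        + cache_read * OPUS_4_8["cache_read"]
        + output * OPUS_4_8["output"]
    ) / 1_000_000
    return token_cost + hours * SESSION_FEE_PER_HOUR


if __name__ == "__main__":
    print(f"${session_cost(50_000, 40_000, 15_000):.3f}")
```

The breakdown is $0.05 for fresh input, $0.02 for cache reads, $0.375 for output, and $0.08 for the session, so output dominates even in a cache-heavy session.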

For production jobs like coding assistants and multi-step agents, Claude Sonnet 4.6 tends to offer the best balance between cost and capability ^[6]^[3].

Next, compare Google AI's pricing across Gemini text and multimodal models.

4. Google AI


Google prices its models based on the model you pick and the size of the context window. One pricing rule matters right away: keep prompts under 200,000 tokens if you want to avoid the higher rate ^[14]^[3].

Unit Pricing

Model	Input (per 1M)	Output (per 1M)	Context Window
Gemini 3.1 Pro (≤200K)	$2.00	$12.00	1M–2M
Gemini 3.1 Pro (>200K)	$4.00	$18.00	1M–2M
Gemini 2.5 Pro (≤200K)	$1.25	$10.00	2M
Gemini 3.5 Flash	$1.50	$9.00	1M
Gemini 3 Flash	$0.50	$3.00	1M
Gemini 2.5 Flash	$0.30	$2.50	1M
Gemini 3.1 Flash-Lite	$0.25	$1.50	1M
Gemini 2.5 Flash-Lite	$0.10	$0.40	1M
Gemma 3 4B	$0.04	$0.08	131K

For image generation, Imagen 4 Fast starts at $0.01–$0.02 per image, while Imagen 4 Ultra costs $0.06 per image. Veo 3.1 video is billed by the second. Standard runs at $0.40 per second for 720p and 1080p, and Light runs at $0.05–$0.08 per second ^[17].

Included Limits

Model price is only part of the story. Throughput limits and data settings can shift your total spend in a big way.

Google’s main tradeoff is pretty clear: low model pricing on one side, and throughput limits plus data restrictions on the other. In Google AI Studio, the free tier gives you about 15 RPM, free tokens, and product-improvement data use. The paid tier jumps to about 1,000–2,000 RPM, turns off product-improvement data use, and adds context caching plus Batch API. Enterprise plans add provisioned throughput, volume discounts, and compliance features ^[17]^[18].

Context caching is one of the biggest cost levers here. Writing to the cache is free, and reading from it costs 25% of the standard input rate ^[18].

Model Coverage

Google’s lineup spans text, multimodal, image, and video models. It also supports image input, starting at $0.0025 per image ^[3].

Cost-to-Output

For chatbots, summarization, and classification, Gemini 2.5 Flash or Gemini 3.1 Flash-Lite will usually make the most sense. They’re cheaper and fit many day-to-day workloads well. Save Gemini 3.1 Pro for cases where you need the larger context window, and use rolling summarization to stay under the 200,000-token cutoff ^[3]^[6].
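To see what crossing the 200,000-token cutoff does to a single Gemini 3.1 Pro request, here is a small sketch using the two rate rows from the Unit Pricing table. It assumes the higher rate applies to the whole request once the prompt exceeds the cutoff; the function name is hypothetical.

```python
# Gemini 3.1 Pro rates in USD per 1M tokens, from the Unit Pricing table.
GEMINI_3_1_PRO = {
    "standard": {"input": 2.00, "output": 12.00},  # prompts up to 200K tokens
    "long": {"input": 4.00, "output": 18.00},      # prompts above 200K tokens
}
CUTOFF = 200_000


def gemini_3_1_pro_cost(prompt_tokens: int, output_tokens: int) -> float:
    """Request cost in USD, picking the rate row by prompt size (assumption)."""
    tier = "long" if prompt_tokens > CUTOFF else "standard"
    rates = GEMINI_3_1_PRO[tier]
    return (prompt_tokens * rates["input"]
            + output_tokens * rates["output"]) / 1_000_000


if __name__ == "__main__":
    print(gemini_3_1_pro_cost(200_000, 2_000))
    print(gemini_3_1_pro_cost(250_000, 2_000))
```

Under that assumption, a prompt 25% larger costs well over twice as much, which is the argument for rolling summarization near the cutoff.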

There’s also a simple price angle worth noting. At $2.00 per 1M input tokens, Gemini 3.1 Pro is cheaper than flagship models such as GPT-5.5 ($5.00) and Claude Opus 4.8 ($5.00) for standard-length prompts ^[2].

Next, compare Meta's pricing and model coverage.

5. Meta

Meta works a bit differently from the closed-model providers above. Its Llama models are open weight, so your cost depends on where you host or access them. In practice, that means the exact same model can have very different pricing from one provider to another. For example, Llama 3.3 70B has been listed as low as $0.10 per 1M input tokens. Meta also doesn't publish a first-party API pricing sheet, which is why pricing can swing so much across hosts ^[1]^[19]^[20].

Unit Pricing

Current pricing is centered on Llama 4 Scout and Llama 4 Maverick ^[21]^[22].

Model	Input (per 1M)	Output (per 1M)	Context Window
Llama 4 Scout	$0.08 – $0.17	$0.15 – $0.66	Up to 10,000,000 tokens
Llama 4 Maverick	$0.15 – $0.24	$0.60 – $0.97	1,000,000 tokens
Llama 3.3 70B	$0.10 – $0.72	$0.32 – $0.72	128K – 131K tokens
Llama 3.2 1B Instruct	$0.01 – $0.02	$0.01 – $0.02	60K – 131K tokens
Llama 3.1 405B Instruct	$0.90 – $3.00	$0.90 – $3.00	128K – 131K tokens

That low sticker price looks great on paper. But it only helps if the host's context limits and throughput caps match your workload.

Included Limits

There isn't one standard Meta plan with shared rate limits or a built-in free tier. Hosts set their own throughput limits, context caps, and caching rules. So if you're planning long-context work with Llama 4 Scout, check the host's context ceiling first instead of assuming you'll get the full advertised range.

Model Coverage

Llama 4 Scout and Llama 4 Maverick both support text and vision input, plus tool calling and JSON mode across major providers ^[21]^[22]. Older options still have their place too. Llama 3.2 11B Vision can still handle vision-heavy jobs, while Llama 3.2 1B Instruct is aimed at edge deployments where low latency and lean compute use matter most ^[21]^[22].

Cost-to-Output

If you're running high-volume jobs with long prompts, Scout stands out. A coding task with 40K input and 8K output tokens costs about $0.005 per task, which works out to roughly 200 tasks per $1. That same task on GPT-5.5 costs about $0.44, or only 2.3 tasks per $1 ^[6].
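The tasks-per-dollar comparison can be reproduced as follows. The sketch assumes the Scout figure uses the low end of the listed range ($0.08 input, $0.15 output per 1M tokens); hosts at the high end of the range would come out more expensive. Function names are illustrative.

```python
def task_cost(input_tokens: int, output_tokens: int,
              input_rate: float, output_rate: float) -> float:
    """Cost in USD of one task; rates are USD per 1M tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


def tasks_per_dollar(cost_per_task: float) -> float:
    """How many tasks one dollar buys at a given per-task cost."""
    return 1.0 / cost_per_task


if __name__ == "__main__":
    scout = task_cost(40_000, 8_000, 0.08, 0.15)    # low end of the Scout range
    gpt_5_5 = task_cost(40_000, 8_000, 5.00, 30.00)
    print(f"Scout:   ${scout:.4f} per task, {tasks_per_dollar(scout):.0f} tasks per $1")
    print(f"GPT-5.5: ${gpt_5_5:.2f} per task, {tasks_per_dollar(gpt_5_5):.1f} tasks per $1")
```

At the low-end rates the exact Scout figure is $0.0044 per task, which the text rounds up to about $0.005 and roughly 200 tasks per dollar.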

For customer-facing use or multimodal work, Llama 4 Maverick is usually the better match. It benchmarks above GPT-4o while costing far less on input pricing: $0.15/M input versus $2.50/M input ^[6]. Its smaller gap between input and output pricing also makes it a good fit when you expect longer responses.

Next, compare xAI's pricing and model coverage.

6. xAI


xAI keeps input and output pricing low, which helps when responses get long. Grok 4.3 charges $1.25 per 1 million input tokens and $2.50 per 1 million output tokens. That 2x spread matters when a model writes a lot back to you ^[6]^[24].

Unit Pricing

Model	Input (per 1M)	Cached Input	Output (per 1M)	Context Window
Grok 4.3 (Flagship)	$1.25	$0.20	$2.50	1M tokens
Grok 4.20 (Reasoning)	$1.25	$0.20	$2.50	2M tokens
Grok Build 0.1 (Coding)	$1.00	$0.20	$2.00	256K tokens
Grok 4.1 Fast (Budget)	$0.20	$0.05	$0.50	2M tokens

For image work, Grok Imagine 1.5 Edit costs $0.01875 per call ^[23]. Video generation through the Imagine API runs from $0.08 to $0.25 per second, based on resolution ^[24].

Included Limits

xAI uses pay-as-you-go billing with usage-based rate limits. Enterprise plans can add custom rate limits and dedicated infrastructure ^[25]^[26].

There’s one thing to watch: tool use can add up fast. Search and code execution are billed separately, so a low token price doesn’t always mean a low final bill. Web Search, X Search, and Code Execution each cost $5.00 per 1,000 calls ^[24].

If your jobs aren’t urgent, the Batch API can trim costs by 20% to 50% for tasks processed within 24 hours ^[24].

Model Coverage

xAI covers text, image, and video use cases ^[24]^[16]. Grok 4.20 is built for faster tool use, while Grok Build 0.1 is aimed at coding-heavy work ^[6]^[2].

Cost-to-Output

For a standard coding task with 40K input tokens and 8K output tokens, Grok 4.3 costs about $0.07 per task. That works out to roughly 14 tasks per $1 ^[6].
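Because tool calls are billed on top of tokens, it helps to fold both into one per-task figure. The sketch below uses the Grok 4.3 rates and the $5.00 per 1,000 calls tool fee quoted earlier; the three tool calls per task are an illustrative assumption, and the function name is hypothetical.

```python
# Grok 4.3 rates in USD per 1M tokens, from the table above.
GROK_4_3 = {"input": 1.25, "output": 2.50}
TOOL_FEE_PER_CALL = 5.00 / 1_000  # Web Search, X Search, or Code Execution


def grok_task_cost(input_tokens: int, output_tokens: int,
                   tool_calls: int = 0) -> float:
    """Cost in USD of one task, including any billed tool calls."""
    token_cost = (input_tokens * GROK_4_3["input"]
                  + output_tokens * GROK_4_3["output"]) / 1_000_000
    return token_cost + tool_calls * TOOL_FEE_PER_CALL


if __name__ == "__main__":
    print(f"tokens only:  ${grok_task_cost(40_000, 8_000):.3f}")
    print(f"3 tool calls: ${grok_task_cost(40_000, 8_000, tool_calls=3):.3f}")
```

With three assumed tool calls, the task rises from $0.07 to $0.085, so tool fees add about a fifth to the token-only figure.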

Next, compare Mistral AI's pricing, or use a unified LLM API to access these models through a single integration.

7. Mistral AI


Mistral Large 3 costs $0.50 per 1M input tokens and $1.50 per 1M output tokens. That puts it in the low-priced flagship camp. The headline rates look strong, but the final bill can shift once you factor in tool charges and Mistral's billing tiers.

Here’s the current lineup.

Unit Pricing

Model	Input (per 1M)	Cached Input	Output (per 1M)
Mistral Large 3 (Flagship)	$0.50	$0.05	$1.50
Mistral Medium 3.5 (Balanced)	$1.50	-	$7.50
Mistral Small 4 (Efficient)	$0.10	$0.01	$0.30
Magistral Medium (Reasoning)	$2.00	-	$5.00
Codestral (Coding)	$0.30	$0.03	$0.90
Devstral 2 (Coding)	$0.40	$0.04	$2.00
Pixtral Large (Multimodal)	$2.00	-	$6.00

Mistral also charges separately for OCR at $4.00 per 1,000 pages, embeddings at $0.10 per 1M tokens, and web search plus code execution at $30 per 1,000 calls ^[27].

Included Limits

Mistral uses pay-as-you-go billing, with four rate-limit tiers; the three above the entry tier open up at $20, $100, and $500 in cumulative spend ^[31]. The Team plan comes with a minimum commitment of $50 per month ^[27].

There are two levers here that can cut costs fast:

The Batch API lowers all model prices by 50% for async jobs ^[27]^[31].
Prompt caching can reduce cached token costs by up to 90% when the shared prefix is at least 64 tokens long ^[31].

These pricing rules matter most when you're running high-volume or async workloads.

Model Coverage

Mistral covers text, reasoning, coding, multimodal, and edge use cases. Codestral supports fill-in-the-middle (FIM), which makes it a good match for IDE workflows. The Ministral series - 3B, 8B, and 14B - is aimed at low-cost or on-device deployments ^[30]^[32].

Mistral also offers EU-hosted endpoints and GDPR-friendly data processing at no extra charge ^[29]^[31].

Cost-to-Output

For high-volume utility work like entity extraction, classification, and summarization, Mistral Small 4 is the strongest fit ^[28]^[31]. If you need more reasoning power but still want low token pricing, Mistral Large 3 makes more sense ^[28]^[31].

For reasoning-heavy tasks, Magistral Medium costs $5.00 per 1M output tokens and is 37% cheaper on output than OpenAI's o3 ^[29].

Next, compare Cohere's pricing for enterprise text and retrieval workloads.

8. Cohere


Cohere is built mainly for retrieval and enterprise search. Its pricing reflects that: lower-cost options for text-heavy retrieval work, and higher-priced models for multimodal or more demanding jobs.

Unit Pricing

Cohere splits its lineup into three buckets: low-cost retrieval models, enterprise multimodal models, and separate tools for embeddings and reranking.

Model	Input (per 1M)	Output (per 1M)	Context Window
Command R7B	$0.0375	$0.15	128K
Command R	$0.15	$0.60	128K
Command R+	$2.50	$10.00	128K
Command A (Multimodal)	$2.50	$10.00	256K
Aya Expanse (8B/32B)	$0.50	$1.50	128K
Embed v3	$0.10	-	-
Rerank v3	$2.00 per 1,000 searches	-	-

Rerank is billed in search units: one unit covers one query plus up to 100 documents. Documents longer than 500 tokens are split into chunks, and each chunk counts as a separate document ^[33]^[35].
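One way to read the search-unit rule is sketched below. It assumes a document longer than 500 tokens is split into 500-token chunks that each count as one document, and that every 100 documents (or part thereof) for a query costs one search unit. That interpretation and the function name are assumptions, so check Cohere's billing docs before budgeting on it.

```python
import math

DOCS_PER_UNIT = 100  # one search unit covers a query plus up to 100 documents
CHUNK_TOKENS = 500   # longer documents are assumed to be split at this size


def search_units(doc_token_lengths: list[int]) -> int:
    """Estimate search units billed for one query over the given documents."""
    chunks = sum(max(1, math.ceil(n / CHUNK_TOKENS)) for n in doc_token_lengths)
    return max(1, math.ceil(chunks / DOCS_PER_UNIT))


if __name__ == "__main__":
    print(search_units([300] * 100))    # short documents
    print(search_units([1_200] * 100))  # long documents, 3 chunks each
```

Under this reading, reranking 100 long documents can cost three times as many units as 100 short ones, so chunk size is worth controlling before the rerank call.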

Included Limits

Cohere gives you free trial keys for testing, capped at 1,000 calls per month and 20 RPM ^[37]. Production keys use pay-as-you-go pricing, with up to 500 RPM for standard models. Billing happens at month-end or once your balance hits $250 ^[33]^[37].

Some of Cohere's top-end models need sales approval before full production use. That includes Command A+, A Reasoning, and A Vision. Until that approval happens, self-serve access stays at trial-style limits ^[37].

If your team needs dedicated throughput, Cohere also offers Model Vault. Pricing starts at $2,500 per month for an Embed 4 Small instance ^[33].

Model Coverage

Cohere fits text-first enterprise workflows, not media generation: its lineup centers on text, retrieval, and enterprise search. The main exception is Command A, which supports image inputs and multimodal tasks. It also comes with a 256,000-token context window, the largest in Cohere's lineup ^[34]^[36].

Aya Expanse supports 49 languages, which makes it a solid pick for global deployments ^[37].

Cost-to-Output

If you're building RAG pipelines, Cohere's low input pricing is the big draw. These workflows usually burn through far more input tokens than output tokens, so Command R at $0.15 per 1M input tokens helps keep document-heavy prompts from getting too expensive.

A simple example makes the gap clear: running 100,000 support chatbot interactions on Command R costs about $123 per month, while the same volume on Command R+ comes out to roughly $2,050 per month ^[39].

For pure classification and summarization at scale, Command R7B is the lowest-cost option in the lineup ^[34]^[38].

A practical way to think about it:

Use Command R7B for high-volume classification and summarization.
Use Command R for RAG and chatbots.
Use Command R+ only when you need the extra model strength.

Next, compare Runway's pricing or explore cinematic AI video generation alternatives.

9. Runway


Runway is built around video, so its pricing is tied to seconds generated or edited. It uses a credit system for video and image work. You get credits through a subscription or by buying top-up packs at $0.01 per credit, with a $10 minimum. API credits are billed on their own. The main thing to watch is how the credit burn changes from one model to another.

Unit Pricing

Model	API Rate (Credits/sec)	USD Cost/sec
Gen-4.5 (Flagship)	12 /sec	$0.12
Gen-4 Video	12 /sec	$0.12
Gen-4 Turbo	5 /sec	$0.05
Aleph 2.0 (Video Editing)	28 /sec	$0.28
Act-Two (Animation)	5 /sec	$0.05
Gen-4 Image (1080p)	8 /img	$0.08

Included Limits

On annual plans ^[40]^[43], Runway includes the following monthly credit caps:

Plan	Monthly Cost (Annual)	Credits/Month	Rollover
Free	$0	125 (one-time)	None
Standard	$12	625	None
Pro	$28	2,250	None
Max	$76	9,500	1 month

The Free plan includes watermarks and does not allow commercial use. Paid plans remove both limits ^[40]^[44]. Standard and Pro credits do not roll over, and unused credits expire within 24 hours of the next billing date ^[40]^[43]^[46]. Only Max gives you one month of rollover ^[40]^[43]^[46].

Model Coverage

Runway covers text-to-video, image-to-video, video editing, text-to-image, image-to-image, plus audio and post-processing tools ^[42]. That range gives it more reach than a tool that only does generation. But price alone doesn't tell the whole story. Output quality changes what you end up paying in practice.

Cost-to-Output

This is where things get more expensive than they first look. Retries add up fast. Most finished clips take 3 to 5 generations. At the $0.12/sec API rate, that alone is $0.36 to $0.60 per finished second, and the cited estimates put Gen-4.5 at about $0.50 to $0.80 per finished second ^[43]^[44]^[47].

A common way to keep spend in check is to use Gen-4 Turbo at $0.05/sec for rough drafts and concept tests, then move to Gen-4.5 at $0.12/sec for final renders ^[41]^[45]. That setup makes sense if you don't want to burn premium credits while you're still figuring out motion, framing, or timing.

There's also a hard ceiling on lower-tier plans. Standard's 625 credits cover only about 52 seconds of Gen-4.5 video per month ^[40]^[44]. That's enough for a handful of polished clips, but it won't carry a steady production workflow.
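The credit ceiling and the retry effect can both be worked out from the rate table. This is a minimal sketch with hypothetical helper names; it uses the $0.01 per credit top-up price and treats every attempt as a full-length generation, which is an assumption.

```python
# API credit burn per second of output, from the Unit Pricing table.
CREDITS_PER_SEC = {"Gen-4.5": 12, "Gen-4 Turbo": 5}
USD_PER_CREDIT = 0.01


def seconds_covered(credits: int, model: str) -> float:
    """Seconds of raw output a credit balance buys, ignoring retries."""
    return credits / CREDITS_PER_SEC[model]


def finished_seconds(credits: int, model: str, attempts: int) -> float:
    """Seconds of usable output when each clip takes `attempts` generations."""
    return seconds_covered(credits, model) / attempts


def cost_per_finished_second(model: str, attempts: int) -> float:
    """USD per usable second at the $0.01 per credit rate."""
    return CREDITS_PER_SEC[model] * USD_PER_CREDIT * attempts


if __name__ == "__main__":
    print(f"{seconds_covered(625, 'Gen-4.5'):.1f} raw seconds on Standard")
    print(f"{finished_seconds(625, 'Gen-4.5', 3):.1f} finished seconds at 3 attempts")
    print(f"${cost_per_finished_second('Gen-4.5', 5):.2f} per finished second at 5 attempts")
```

With three attempts per clip, Standard's 625 credits stretch to only about 17 usable seconds of Gen-4.5 output.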

Alternatively, you can explore MiniMax Hailuo 2.3 for high-consistency video generation. Next, compare Stability AI's image and audio pricing.

10. Stability AI


Stability AI stands out in image and audio workflows, where per-asset pricing often matters more than a monthly plan. It uses a credit system, and 1 credit = $0.01. New users get 25 free credits, which is enough for about 3 Stable Image Ultra generations (8 credits each) or 8 Stable Image Core images (3 credits each). API access also includes commercial usage rights ^[48].

Here’s the per-service pricing.

Unit Pricing

Service	Credits	USD
Stable Image Ultra	8	$0.08
Stable Diffusion 3.5 Large	6.5	$0.065
Stable Diffusion 3.5 Large Turbo	4	$0.04
Stable Image Core	3	$0.03
Stable Diffusion 3.5 Flash	2.5	$0.025
SDXL 1.0	From 0.9	From $0.009
Replace Background & Relight	8	$0.08
Erase / Inpaint / Remove Background	5	$0.05
Creative Upscaler (to 4K)	60	$0.60
Fast Upscaler	2	$0.02
Stable Fast 3D	10	$0.10
Stable Audio 3.0 (up to 6 min)	26	$0.26

Included Limits

API pricing is pay-as-you-go, with custom pricing and bulk discounts for high-volume teams ^[49].

Model Coverage

Stability AI covers text-to-image, image editing, 3D asset generation, and audio generation ^[48]. In plain English, it’s built for production work. You can generate images, edit them, turn assets into 3D outputs, and make audio clips without jumping between a bunch of tools.

The editing suite includes outpainting, background replacement, relighting, and style transfer ^[48]. Stable Fast 3D handles 3D asset generation, while Stable Audio 3.0 supports audio clips up to six minutes ^[48]. So this is less about chat and more about getting media work done.

Price differences between services show up most when you’re working at scale, especially with editing and upscaling jobs.

Cost-to-Output

The Creative Upscaler costs 60 credits ($0.60) per image. That’s 30x the price of the Fast Upscaler, which costs 2 credits ($0.02). So if your main goal is simple resolution increases, Fast Upscaler is the lower-cost pick ^[48].

Stable Image Core comes out to about $30/month for 1,000 images ^[48]. And if you scale to 10,000 images/month with SD 3.5 Large, the cost lands at about $650 ^[48].
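These monthly totals follow directly from the credit table. The sketch below copies a few rows from it and converts credits to dollars at 1 credit = $0.01; the function names are illustrative, not part of Stability AI's API.

```python
USD_PER_CREDIT = 0.01

# Credits per generation, copied from the Unit Pricing table.
CREDITS = {
    "Stable Image Ultra": 8,
    "Stable Diffusion 3.5 Large": 6.5,
    "Stable Image Core": 3,
    "Creative Upscaler": 60,
    "Fast Upscaler": 2,
}


def monthly_cost(service: str, images: int) -> float:
    """Spend in USD for `images` calls to one service."""
    return CREDITS[service] * USD_PER_CREDIT * images


def images_from_credits(credits: float, service: str) -> int:
    """Whole generations a credit balance covers for one service."""
    return int(credits // CREDITS[service])


if __name__ == "__main__":
    print(monthly_cost("Stable Image Core", 1_000))
    print(monthly_cost("Stable Diffusion 3.5 Large", 10_000))
    print(images_from_credits(25, "Stable Image Ultra"))
```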

You can also generate and edit images using other high-performance models. Next, compare Black Forest Labs' image pricing.

11. Black Forest Labs


Black Forest Labs is a handy pricing benchmark for image generation because the bill changes with output size and whether you use reference images. Its system is credit-based, with 1 credit = $0.01. FLUX.2 pricing is tied to megapixels, and reference images are charged on top. One thing to watch: each image and each reference image is rounded up to the next megapixel, based on 1,024 × 1,024 px.

Unit Pricing

The FLUX.2 lineup comes in four tiers: Max, Pro, Klein, and Flex. Each one makes a different tradeoff between image quality, speed, and price.

Model	1st MP (Base)	Add'l MP	Ref. Image (per MP)	Generation Mode
FLUX.2 [max]	$0.07	$0.03	$0.03	Text-to-Image / Edit
FLUX.2 [pro] (Text-to-Image)	$0.03	$0.015	$0.015	Text-to-Image
FLUX.2 [pro] (Edit)	$0.045	$0.015	$0.015	Image Editing
FLUX.2 [klein] 9B	$0.015	$0.002	$0.002	Text-to-Image / Edit
FLUX.2 [klein] 4B	$0.014	$0.001	$0.001	Text-to-Image / Edit
FLUX.2 [flex]	$0.05	$0.05	$0.05	Text-to-Image / Edit

Older FLUX1.1 and FLUX.1 models use flat per-image pricing instead.

Model	Price per Image	Description
FLUX1.1 [pro]	$0.04	Standard high-speed generation
FLUX1.1 [pro] Ultra	$0.06	Ultra-high-resolution
FLUX1.1 [pro] Raw	$0.06	Candid photography aesthetic
FLUX.1 Kontext [max]	$0.08	Maximum quality in-context editing
FLUX.1 Kontext [pro]	$0.04	Commercial-ready in-context editing
FLUX.1 Fill [pro]	$0.05	Targeted image inpainting
FLUX.1 [schnell]	$0.003	Distilled for maximum speed

Included Limits

API access is pay-as-you-go, but Black Forest Labs also has subscription tiers with monthly image caps ^[50].

Plan	Monthly Limit	Key Features
Builder	10,000 images/month	Klein models, 10 users, fine-tuning rights
Platform	100,000 images/month	Klein 9B + Dev models, 10 users
Professional	100,000 images/month	Dev models, 3 domains, 10 users
Enterprise	Custom	All models, custom volume, API and weights access

Model Coverage

Black Forest Labs is centered on image generation and editing. FLUX.2 models support output sizes up to 4 MP, and anything above that gets resized automatically ^[50]. If speed matters most, FLUX.2 [klein] 4B stands out with sub-second inference, which makes it a good fit for near real-time use cases ^[52].

For editing work, the lineup also has a couple of clear options. FLUX.1 Fill [pro] handles targeted inpainting at $0.05 per image, while FLUX.1 Kontext [pro] is priced at $0.04 per image for commercial-ready in-context editing ^[51].

Cost-to-Output

A finished 4 MP FLUX.2 [max] image costs about $0.30, once you factor in generation, upscaling, and two retries. Reference images are billed separately at the same per-megapixel rate ^[50]^[51]. If you're doing concept art or early-stage prototyping, FLUX.2 [klein] 4B at $0.014 per image is the low-cost way to test ideas before you move to final renders ^[50].
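The megapixel rule can be sketched as a small calculator. Rates are taken from three rows of the FLUX.2 table, and rounding follows the "rounded up to the next megapixel, based on 1,024 × 1,024 px" rule; the dictionary keys and function names are hypothetical. On these rates a single 4 MP FLUX.2 [max] generation comes to $0.16 before retries. How retries and upscaling combine into the roughly $0.30 figure above is not spelled out in the source, so the sketch does not try to reproduce it.

```python
import math

MP_PIXELS = 1024 * 1024

# (first MP, each additional MP, reference image per MP) in USD.
FLUX2_RATES = {
    "max": (0.07, 0.03, 0.03),
    "pro_text_to_image": (0.03, 0.015, 0.015),
    "klein_4b": (0.014, 0.001, 0.001),
}


def billed_megapixels(width: int, height: int) -> int:
    """Round an image's area up to whole megapixels, with a 1 MP minimum."""
    return max(1, math.ceil(width * height / MP_PIXELS))


def flux2_cost(model: str, width: int, height: int, references=()) -> float:
    """Cost in USD of one generation; `references` holds (width, height) pairs."""
    first, extra, ref_rate = FLUX2_RATES[model]
    cost = first + extra * (billed_megapixels(width, height) - 1)
    for ref_w, ref_h in references:
        cost += ref_rate * billed_megapixels(ref_w, ref_h)
    return cost


if __name__ == "__main__":
    print(f"${flux2_cost('max', 2048, 2048):.2f}")
    print(f"${flux2_cost('max', 2048, 2048, references=[(1024, 1024)]):.2f}")
```

Note that an image just one pixel over a megapixel boundary is billed for the next full megapixel, so snapping output sizes to multiples of 1,024 avoids paying for unused area.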

Next: Kling AI video pricing.

12. Kling AI


Kling AI splits pricing into two lanes: the web app uses credits, while the API charges by the second. On the API side, cost changes based on clip length, resolution, and whether you turn on synchronized audio.

Unit Pricing

For standard silent video, pricing starts at $0.0672/sec at 720p and rises to $0.0896/sec at 1080p and $0.4286/sec at 4K. Kling V3 Omni, which handles text-plus-image inputs and video-to-video workflows, costs $0.1792/sec at 1080p.

Configuration	Resolution	Price/Sec	Est. 10s Clip Cost
Kling V3 – Silent	720p	$0.0672	$0.67
Kling V3 – Silent	1080p	$0.0896	$0.90
Kling V3 – With Audio	1080p	$0.1120	$1.12
Kling V3 Omni (Ref)	1080p	$0.1792	$1.79
Kling V3 – Silent	4K	$0.4286	$4.29

So yes, Kling sits on the lower-priced side for video APIs.

Included Limits

Kling keeps web app and API pricing separate, which means you need to check both before picking a plan. The API rate is only one piece of the math. Credits and concurrency have a big effect on how much work you can push through.

The free tier comes with 66 credits per day, and those credits reset every 24 hours with no rollover. Paid plans begin at $6.99/month for 660 credits on Standard and go up to $180/month for 26,000 credits on Ultra. If you pay annually for Ultra, the effective rate drops by 34% ^[54].

For API users, standard concurrency is capped at 10 parallel jobs. Trial-tier accounts get only 3. That gap can matter a lot if you're trying to batch renders instead of waiting on clips one by one.

Model Coverage

Kling V3 and Kling V3 Omni support clips up to 15 seconds, which makes them a fit for cinematic and narrative work. V2.6 caps clip length at 10 seconds and adds synchronized audio. V2.5 Turbo is about 30% cheaper than the Master tier.

Cost-to-Output

A common way to keep spend under control is to draft in 720p silent mode and move up to 1080p or 4K only for final renders. That approach helps because many users need 2–4 generation attempts to get a usable clip, and that pushes up the cost of the finished video ^[53].
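The saving from drafting at 720p can be estimated with the per-second rates from the table. The sketch compares running every attempt at the final resolution against drafting at 720p and rendering the final once; the three-attempt scenario and the function names are illustrative assumptions.

```python
# Kling V3 silent rates in USD per second, from the Unit Pricing table.
RATE_720P = 0.0672
RATE_1080P = 0.0896
RATE_4K = 0.4286


def clip_cost(seconds: float, rate: float, attempts: int = 1) -> float:
    """Cost in USD of generating a clip `attempts` times at one rate."""
    return seconds * rate * attempts


def draft_then_final(seconds: float, draft_attempts: int, final_rate: float) -> float:
    """Cost of drafting at 720p, then rendering the final clip once."""
    return clip_cost(seconds, RATE_720P, draft_attempts) + clip_cost(seconds, final_rate)


if __name__ == "__main__":
    all_1080p = clip_cost(10, RATE_1080P, attempts=3)
    drafted = draft_then_final(10, draft_attempts=2, final_rate=RATE_1080P)
    print(f"3 attempts at 1080p: ${all_1080p:.2f}")
    print(f"2 drafts + 1 final:  ${drafted:.2f}")
```

The gap is modest at 1080p and much larger when the final render is 4K, where each avoided full-resolution retry saves over four dollars on a 10-second clip.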

Prepaid Resource Packages can trim the effective unit price by 10% to 30%, depending on bundle size ^[53].

Next, compare these models by unit price, plan limits, modality coverage, and cost per output.

Pricing Breakdown by Comparison Criteria

The tables below condense the earlier provider-by-provider details into four buying filters: unit price, plan limits, modality coverage, and output cost.

Unit Pricing Across Text, Image, and Video APIs

Model	Provider	Input ($/1M tokens)	Output ($/1M tokens)	Tier
GPT-5 Nano	OpenAI	$0.05	$0.40	Budget
Gemini 2.5 Pro	Google	$1.25	$10.00	Mid-range
GPT-5.5	OpenAI	$5.00	$30.00	Premium

For image generation, FLUX.1 [schnell] is the low-cost reference at $0.003 per image, while Stable Image Ultra sits at the top end at $0.08 per image. For video, Kling V3 costs $0.0672/sec at 720p on the low end, and Veo 3.1 comes in at $0.40/sec on the high end.

Raw rates matter. But in practice, plan limits often decide what you actually spend.

Included Limits in Subscriptions and Platform Plans

Below roughly 5 million input tokens per month, a $20 chat plan can beat API billing for casual use.

Provider	Plan	Monthly Price	Included Usage	Key Limits	Team Plan
OpenAI	ChatGPT Plus	$20	Capped (dynamic)	Dynamic message caps; no API access	Yes
Anthropic	Claude Pro	$20	Capped (dynamic)	Usage limits vary by demand; no API access	Yes
Google	Gemini Advanced	$20	Capped (dynamic)	Tied to Google One; no API access	Yes (Workspace)

Reasoning models also add a wrinkle: hidden reasoning tokens are billed at output rates, which can push total cost up by 2x to 7x.^[3]
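A short Python sketch shows how hidden reasoning tokens move the bill. The rates are the GPT-5.5 prices from the unit pricing table in this guide. The token counts are hypothetical, and the function is an illustration rather than any provider's billing logic.

```python
def request_cost(input_tokens, visible_output_tokens, reasoning_tokens,
                 input_rate, output_rate):
    """Dollar cost of one request; rates are in $ per 1M tokens.

    Reasoning tokens are added to the billed output, as described above.
    """
    billed_output = visible_output_tokens + reasoning_tokens
    return (input_tokens * input_rate + billed_output * output_rate) / 1_000_000


# GPT-5.5 rates from this guide; token counts below are made up for illustration
INPUT_RATE, OUTPUT_RATE = 5.00, 30.00

base = request_cost(1_000, 500, 0, INPUT_RATE, OUTPUT_RATE)
with_reasoning = request_cost(1_000, 500, 2_000, INPUT_RATE, OUTPUT_RATE)

print(f"no reasoning tokens:    ${base:.4f}")
print(f"2,000 reasoning tokens: ${with_reasoning:.4f}")
print(f"multiplier:             {with_reasoning / base:.1f}x")
```

With these sample numbers the request costs $0.02 without reasoning and $0.08 with it, a 4x increase that sits inside the 2x to 7x range cited above.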

Model Coverage by Modality

Pricing only matters once the model’s modality fits the job.

Provider	Text	Multimodal Input	Image Gen	Vision	Video Gen	API Access
APIMart	✓	✓	✓	-	✓	Unified API
OpenAI	✓	✓	✓	✓	✓	Direct
Google	✓	✓	✓	✓	✓	Direct
Anthropic	✓	✓	✗	✓	✗	Direct
Meta	✓	✓	✗	✓	✗	Unified/Hosted
Mistral AI	✓	✓	✗	✓	✗	Direct
Stability AI	✗	✗	✓	✗	✓	Direct

A cheap model is no bargain if it can’t handle the format you need. Text-only vendors, for example, won’t help much if your workflow depends on image or video output.

Cost-to-Output by Common Use Case

These are the costs teams tend to feel once things hit production.

Text workloads (per 1M output tokens):

Use Case	Model	Output Cost	Tier	Key Tradeoff
High-volume chatbot	GPT-5 Nano	$0.40	Budget	Lower reasoning depth
Document extraction	Gemini Flash Lite	$0.30	Budget	Limited creative writing
Code generation	Gemini 2.5 Pro	$10.00	Mid-range	Surcharge above 200K context ^[3]
Agentic workflows	Claude Sonnet 4.6	$15.00	Mid-range	Needs prompt caching for ROI ^[3]
Complex reasoning	Claude Opus 4.8	$25.00	Premium	High cost; slower latency

Video workloads (per 10-second clip):

Use Case	Model	Output Cost	Tier	Key Tradeoff
Short-form video (draft)	Kling V3	~$0.67	Budget	720p; limited to 15-second clips
Short-form video (final)	Sora 2	$1.00	Mid-range	Balanced quality and cost
Cinematic video	Veo 3.1	$4.00	Premium	Highest quality; highest spend

Here’s the simple version: the price per token or per second is only part of the story. The bigger factor is often how you use the model. A chatbot running all day, a document pipeline, or a video studio can look cheap on paper and get expensive fast once output volume kicks in.

A practical rule of thumb: batch processing cuts costs by 50% on OpenAI, Anthropic, and Mistral for workloads that can tolerate a 24-hour turnaround.^[3] For video, drafting at lower resolution and upgrading only final renders is the most reliable way to control per-output spend.
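As a rough sketch of the batch rule of thumb, the Python below estimates monthly output spend when part of a workload moves to async processing. The 50% discount is the figure cited above. The monthly volume and batch share are hypothetical, and the function name is illustrative.

```python
def monthly_output_cost(output_tokens_per_month, output_rate,
                        batch_share=0.0, batch_discount=0.50):
    """Monthly output spend in dollars; output_rate is $ per 1M tokens.

    batch_share is the fraction of tokens that can wait for async processing.
    """
    if not 0.0 <= batch_share <= 1.0:
        raise ValueError("batch_share must be between 0 and 1")
    full_price = output_tokens_per_month * output_rate / 1_000_000
    return full_price * (1.0 - batch_share * batch_discount)


# Claude Sonnet 4.6 output rate from this guide; 200M tokens/month is hypothetical
SONNET_OUTPUT_RATE = 15.00
VOLUME = 200_000_000

realtime_only = monthly_output_cost(VOLUME, SONNET_OUTPUT_RATE)
mostly_batched = monthly_output_cost(VOLUME, SONNET_OUTPUT_RATE, batch_share=0.6)

print(f"all real-time: ${realtime_only:,.2f}")
print(f"60% batched:   ${mostly_batched:,.2f}")
```

In this example, moving 60% of the volume to batch takes the bill from $3,000 to $2,100 per month, so the saving scales with how much of the workload can tolerate the delay.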

Pros and Cons

The table below boils the tradeoffs down to the stuff that usually drives the decision: cost, modality, and workload fit. If you're choosing between providers, this gives you the short version without making you dig back through every pricing section.

Subject	Pros	Cons	Best For
APIMart	500+ models under one API; one invoice for text, image, and video	Usage-based pricing means costs rise with output volume	Teams that want unified multi-modal access
OpenAI	Clear token billing	Flagship models are expensive	General-purpose text workloads
Anthropic	Prompt caching lowers repeated-work costs	Top-tier models carry high output rates	Coding and long-context workflows
Google AI	Flash-Lite is cheap	Pro gets costly above 200K tokens	High-volume text and long-context workloads
Meta (Llama)	Low-cost if you can self-host	No first-party API means you handle hosting and uptime	Cost-sensitive workloads with self-hosting capability
xAI (Grok)	Competitive mid-tier pricing	Smaller model lineup	Real-time web and social-data applications
Mistral AI	Low-cost small models and multilingual coverage	Fewer multimodal features	Multilingual text apps
Cohere	Embed, Rerank, and Command R7B fit RAG	Command R+ is pricey for its tier	Retrieval-augmented generation and knowledge bases
Stability AI	Very low image-generation prices	No text models; scope limited to image and video	High-volume image generation
Kling AI	Low-cost short-form video	Limited to 15-second videos at base pricing	Short-form video generation

A simple way to read this:

If you want one API for many model types, APIMart stands out.
If you care most about plain text usage and straightforward billing, OpenAI or Google AI may be the easier fit.
If your work leans toward coding, long prompts, or repeated context, Anthropic can make sense.
If you're keeping costs down and can run things yourself, Meta (Llama) is hard to ignore.
If your stack is built around RAG, Cohere has tools that line up well with that setup.

For image-heavy use, Stability AI is the low-cost pick. For short video clips, Kling AI keeps entry costs down, though the base plan stays tied to 15-second outputs.

Conclusion

Looking at the pricing breakdown above, the best model isn’t the most expensive one or the cheapest one. It’s the one that fits your workload, modality, and volume.

High-volume, low-complexity tasks should run on the lowest-cost models you can get away with.

As complexity goes up, spend should go up only when the output earns it. Mid-tier models are a good fit for production apps that need steady performance without the top-shelf price tag.

Once you get into premium reasoning or media generation, cost per output starts to matter more than raw token pricing. Premium models make sense when quality has a direct effect on results. And for video, pricing works differently: APIs such as WAN 2.7, Sora 2 ($0.10/sec), and Kling V3 ($0.0672/sec at 720p) charge by the second, not by the token.

For teams using text, image, and video models together, APIMart gives access to 500+ models through a single API, so multimodal work can sit on one invoice.

FAQs

How do I estimate total cost per output?

Estimate the total cost based on how the model is billed.

For text models, pricing is usually split into input tokens and output tokens per 1 million tokens. Output tokens often cost more, so your expected response length has the biggest effect on total spend.

For non-text use cases, image models are often priced per call, while video models are priced per second generated.

A simple way to estimate cost is to:

use a token counter to measure prompt volume
check the model’s rate for each billing unit
apply that rate to your expected usage

That gives you a practical cost estimate before you scale anything up.
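The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a billing tool: `rough_token_count` uses the common rule of thumb of about four characters per token for English text, so a real tokenizer will give different counts. The rates are the ones quoted in this guide, and the workload sizes are hypothetical.

```python
def rough_token_count(text, chars_per_token=4):
    """Very rough token estimate; a real tokenizer will give different counts."""
    return max(1, len(text) // chars_per_token)


def estimate_cost(units, rate, per=1):
    """Cost in dollars of `units` billing units at `rate` dollars per `per` units."""
    return units * rate / per


# Step 1: measure prompt volume (sample prompt, repeated to simulate a long input)
prompt = "Summarize the attached support ticket in three bullet points. " * 20
input_tokens = rough_token_count(prompt)
expected_output_tokens = 300  # hypothetical response length

# Steps 2 and 3: look up the rate for each billing unit and apply it
# Text: Gemini 2.5 Pro at $1.25 input / $10.00 output per 1M tokens
text_cost = (estimate_cost(input_tokens, 1.25, per=1_000_000)
             + estimate_cost(expected_output_tokens, 10.00, per=1_000_000))
# Image: 100 FLUX.1 [schnell] images at $0.003 each
image_cost = estimate_cost(100, 0.003)
# Video: one 10-second Veo 3.1 clip at $0.40 per second
video_cost = estimate_cost(10, 0.40)

print(f"text request: ${text_cost:.6f}")
print(f"100 images:   ${image_cost:.2f}")
print(f"video clip:   ${video_cost:.2f}")
```

Multiply the per-request figure by your expected request volume to get a monthly estimate, and swap in a provider tokenizer once you need tighter numbers.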

When does prompt caching save money?

Prompt caching cuts costs when your app sends the same prompt prefix again and again. That usually means long system instructions, large document sets, or shared conversation history reused across many requests.

Instead of paying the full input token price every time, you pay less for the repeated part. In many cases, that can reduce input costs by 50% to 90%.

This works best when volume is high and the context stays mostly the same. A customer support chatbot is a good example: the bot may reuse the same rules, brand info, and help docs across thousands of chats.

It’s a poor fit when the context changes all the time. If your app keeps rewriting the prompt from scratch on each request, there’s less repeated text to cache, so the savings drop fast.
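Here is a hedged Python sketch of that tradeoff. It makes two simplifying assumptions: every request gets a cache hit on the shared prefix, and there is no surcharge for writing to the cache. The input rate is the Gemini 2.5 Pro price from this guide, used only as a sample number. The 90% discount is the top of the range cited above, not a quote for any specific provider.

```python
def input_cost_with_cache(prefix_tokens, fresh_tokens, requests,
                          input_rate, cached_discount):
    """Total input cost in dollars when a shared prefix is billed at a discount.

    Simplifications: every request is a cache hit; no cache-write surcharge.
    input_rate is $ per 1M tokens; cached_discount is a fraction (0.9 = 90% off).
    """
    prefix_cost = prefix_tokens * input_rate * (1.0 - cached_discount)
    fresh_cost = fresh_tokens * input_rate
    return requests * (prefix_cost + fresh_cost) / 1_000_000


RATE = 1.25        # $ per 1M input tokens (sample rate from this guide)
REQUESTS = 10_000  # hypothetical monthly request count

# Support bot: 8,000-token shared prefix, 200 fresh tokens per request
uncached = input_cost_with_cache(8_000, 200, REQUESTS, RATE, cached_discount=0.0)
cached = input_cost_with_cache(8_000, 200, REQUESTS, RATE, cached_discount=0.9)
print(f"stable context:   ${uncached:.2f} -> ${cached:.2f}")

# Poor fit: only 200 tokens repeat, 8,000 change on every request
uncached_poor = input_cost_with_cache(200, 8_000, REQUESTS, RATE, cached_discount=0.0)
cached_poor = input_cost_with_cache(200, 8_000, REQUESTS, RATE, cached_discount=0.9)
print(f"changing context: ${uncached_poor:.2f} -> ${cached_poor:.2f}")
```

With a stable prefix the input bill drops from $102.50 to $12.50 in this example. When most of the prompt changes on each request, the same discount saves only a couple of dollars.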

Should I use subscriptions or API pricing?

For most developers and businesses, API pricing makes more sense. With pay-as-you-go billing, you pay only for the tokens you use: no monthly minimums, no surprise fees, and no fixed charges hanging over you when traffic is light. Your costs move with usage, which is often a much better fit than a flat recurring bill. The main exception is light, casual use, where a $20 chat plan can cost less than API billing.

APIMart gives you one API that connects to 500+ AI models, with clear per-token pricing and automatic volume discounts as your usage goes up.

