LIVE

DEEPSEEK-V4-FL $0.20 ▼ 26.3%
DEEPSEEK-V4-PR $0.87 ▲ 4.6%
QWEN3.6-FLASH $1.13 ▼ 24.9%
NEMOTRON-3-SUP $0.45 ▼ 23.9%
LLAMA-4-MAVERI $0.60 ▼ 11.9%
LLAMA-4-SCOUT $0.30 ▲ 5.8%
GEMINI-3.1-FLA $1.50 ▼ 10.0%
GEMINI-2.5-FLA $0.40 ▼ 0.8%
MINIMAX-01 $1.10 ▲ 8.0%
MIMO-V2.5 $0.28 ▼ 5.8%
MIMO-V2.5-PRO $0.87 ▲ 2.6%
MINIMAX-M3 $1.20 ▼ 6.1%
QWEN3.5-PLUS-2 $1.80 ▼ 6.0%
NOVA-2-LITE-V1 $2.50 ▼ 31.4%
GEMINI-2.5-FLA $2.50 ▼ 9.5%
GROK-4.3 $2.50 ▼ 8.7%
QWEN3.6-PLUS $1.95 ▼ 31.6%
NEMOTRON-3-ULT $2.50 ▼ 25.9%
QWEN3.7-PLUS $1.60 ▼ 21.4%
MINIMAX-M1 $2.20 ▲ 11.9%
PALMYRA-X5 $6.00 ▼ 26.9%
QWEN3.7-MAX $3.75 ▼ 3.1%
GEMINI-3.5-FLA $9.00 ▼ 8.3%
GEMINI-2.5-PRO $10.00 ▲ 7.2%
GPT-5.4-NANO $1.25 ▲ 10.0%
NOVA-LITE-V1 $0.24 ▼ 28.9%
KIMI-K2.5 $1.90 ▲ 9.8%
MINISTRAL-14B- $0.20 ▼ 17.7%

Pricing · HotON Desk · Jun 1, 2026 · 5 days ago · 1 min read

Batch and cached-prompt discounts widen the gap to real-time pricing

Deeper discounts for batched and cached workloads are reshaping cost planning, rewarding teams that can tolerate latency or reuse context.
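To make the planning arithmetic concrete, here is a minimal sketch of a blended-price calculation. The function name, its parameters, and the choice to stack the batch discount multiplicatively on top of the cache discount are all assumptions for illustration; they are not taken from any provider's price list, and real billing rules may combine discounts differently.

```python
def effective_price(list_price: float,
                    cached_fraction: float,
                    cache_discount: float,
                    batch_discount: float = 0.0) -> float:
    """Blended price per million input tokens (hypothetical model).

    list_price      -- real-time list price per million tokens
    cached_fraction -- share of input tokens served from a prompt cache (0..1)
    cache_discount  -- fractional discount on cached tokens (0..1)
    batch_discount  -- fractional discount on the whole request when batched (0..1)
    """
    for name, value in (("cached_fraction", cached_fraction),
                        ("cache_discount", cache_discount),
                        ("batch_discount", batch_discount)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")

    # Uncached tokens pay the full list price.
    uncached_cost = (1.0 - cached_fraction) * list_price
    # Cached tokens pay the list price minus the cache discount.
    cached_cost = cached_fraction * list_price * (1.0 - cache_discount)
    # Assumption: the batch discount applies to the combined total.
    return (uncached_cost + cached_cost) * (1.0 - batch_discount)
```

With placeholder inputs, a call such as `effective_price(1.00, 0.5, 0.5, 0.5)` models a workload where half the prompt is reused and the job can wait for batch processing. The gap between that result and the list price is the saving a latency-tolerant team would be planning around.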

Why it matters

Token prices set the cost floor for every AI product, and with it a ceiling on margins. When a provider changes its pricing, the effect ripples through competitors' prices, routing choices and the cost of every downstream feature.

Summaries are aggregated for information only — follow the source link for the full story. Demo entries are illustrative.
