AI media model pricing (image · audio · video)

Pricing for AI image, audio (TTS and transcription) and video models, charged in native units such as per second, per megapixel or per 1K characters, as sourced from each provider. These prices are kept separate from our $/1M-token text-model catalog.

35 media models tracked

Image · 3

Model · Provider · Price
Juggernaut-Lightning-Flux · RunDiffusion · $0.0017/megapixel
Juggernaut-pro-flux · RunDiffusion · $0.0049/megapixel
ideogram-3.0 · Ideogram · $0.06/megapixel
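
As a rough illustration of per-megapixel pricing, the sketch below estimates the cost of a single image from its pixel dimensions. It assumes one megapixel means exactly 1,000,000 pixels and that the provider bills the exact pixel count with no rounding or minimum charge; neither assumption is confirmed by the listing. The function name `image_cost_usd` is hypothetical.

```python
def image_cost_usd(width_px: int, height_px: int, usd_per_megapixel: float) -> float:
    """Estimate the cost of one image billed per megapixel.

    Assumption: 1 megapixel = 1,000,000 pixels, billed without rounding.
    Providers may round up or apply minimums; check their documentation.
    """
    megapixels = (width_px * height_px) / 1_000_000
    return megapixels * usd_per_megapixel


if __name__ == "__main__":
    # Listed per-megapixel rates from the table above.
    rates = {
        "Juggernaut-Lightning-Flux": 0.0017,
        "Juggernaut-pro-flux": 0.0049,
        "ideogram-3.0": 0.06,
    }
    for model, rate in rates.items():
        cost = image_cost_usd(1024, 1024, rate)
        print(f"{model}: ${cost:.6f} per 1024x1024 image")
```

Under these assumptions a 1024x1024 image is slightly more than one megapixel, so its estimated cost is slightly above the listed per-megapixel rate.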

Audio / Speech · 32

Model · Provider · Price
inworld-stt-1 · Inworld AI · $0.0001/sec
tts-v4-turbo · Resemble AI · $0.0005/sec
tts-v1 · Resemble AI · $0.0005/sec
tts-v4-multilingual · Resemble AI · $0.0005/sec
tts-v4 · Resemble AI · $0.0005/sec
tts-v3 · Resemble AI · $0.0005/sec
tts-v2 · Resemble AI · $0.0005/sec
sts-legacy · Resemble AI · $0.0005/sec
sts-v1 · Resemble AI · $0.0005/sec
sts-v2 · Resemble AI · $0.0005/sec
tts-legacy · Resemble AI · $0.0005/sec
whisper-large-v3-turbo · Groq · $0.0007/min
speech-to-text · Resemble AI · $0.001/sec
whisper-large-v3 · Groq · $0.0019/min
gpt-4o-mini-transcribe · OpenAI · $0.003/min
gpt-4o-transcribe-diarize · OpenAI · $0.006/min
whisper-1 · OpenAI · $0.006/min
gpt-4o-transcribe · OpenAI · $0.006/min
pixverse-sound-effect · PixVerse · $0.01/sec
gpt-realtime-whisper · OpenAI · $0.017/min
gpt-realtime-translate · OpenAI · $0.034/min
eleven_v3 · ElevenLabs · $0.1/1K chars
scribe_v2 · ElevenLabs · $0.22/hr
scribe_v2_realtime · ElevenLabs · $0.39/hr
Zonos-v0.1-transformer · Zyphra · $7/1M chars
Zonos-v0.1-hybrid · Zyphra · $7/1M chars
tts-1 · OpenAI · $15/1M chars
inworld-tts-1.5-mini · Inworld AI · $25/1M chars
tts-1-hd · OpenAI · $30/1M chars
inworld-tts-2 · Inworld AI · $35/1M chars
inworld-tts-1.5-max · Inworld AI · $35/1M chars
step-tts-2 · StepFun · $38.8889/1M chars
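
Because the audio prices above mix several native units (per second, per minute, per hour, per 1K characters, per 1M characters), comparing them directly can be misleading. The following is a minimal sketch of how one might normalize them to two common bases, $/hour of audio and $/1M characters. The helper names and unit labels are hypothetical, and the sketch deliberately does not convert between time-based and character-based pricing, since that would need an assumed speaking rate the listing does not give.

```python
# Conversion factors for the native units that appear in the list above.
SECONDS_PER_UNIT = {"sec": 1, "min": 60, "hr": 3600}
CHARS_PER_UNIT = {"1K chars": 1_000, "1M chars": 1_000_000}


def usd_per_hour(price: float, unit: str) -> float:
    """Convert a time-based price (per sec, min or hr) to USD per hour of audio."""
    return price * 3600 / SECONDS_PER_UNIT[unit]


def usd_per_million_chars(price: float, unit: str) -> float:
    """Convert a character-based price (per 1K or 1M chars) to USD per 1M characters."""
    return price * 1_000_000 / CHARS_PER_UNIT[unit]


if __name__ == "__main__":
    # A few entries copied from the list above: (model, price, unit).
    time_priced = [
        ("inworld-stt-1", 0.0001, "sec"),
        ("whisper-1", 0.006, "min"),
        ("scribe_v2", 0.22, "hr"),
    ]
    for model, price, unit in time_priced:
        print(f"{model}: ${usd_per_hour(price, unit):.2f}/hr")

    char_priced = [
        ("eleven_v3", 0.1, "1K chars"),
        ("tts-1", 15, "1M chars"),
    ]
    for model, price, unit in char_priced:
        print(f"{model}: ${usd_per_million_chars(price, unit):.2f}/1M chars")
```

Normalized this way, a price that looks small in its native unit (for example $0.1/1K chars) can turn out to be the highest in its group once expressed per 1M characters.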

Native-unit prices are sourced from provider catalogs. Media models on some platforms (e.g. AWS Bedrock, Azure OpenAI) are priced by region or deployment and are not listed. Not financial advice.