AI media model pricing (image · audio · video)
Pricing for AI image, audio and speech (text-to-speech, speech-to-text, speech-to-speech) and video models, charged in native units such as per second, per megapixel or per 1K characters, as published by each provider. Kept separate from our $/1M-token text-model catalog.
35 media models tracked
Image · 3
Juggernaut-Lightning-Flux · RunDiffusion · $0.0017/megapixel
Juggernaut-pro-flux · RunDiffusion · $0.0049/megapixel
ideogram-3.0 · Ideogram · $0.06/megapixel
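
As a rough illustration of per-megapixel billing, the sketch below estimates the cost of a single image from its dimensions. The prices are copied from the image listing above. The function name `image_cost_usd`, the definition of a megapixel as 1,000,000 pixels, and the absence of rounding or minimum charges are assumptions for illustration, not provider rules.

```python
# Prices copied from the image listing above (USD per megapixel).
IMAGE_PRICES_PER_MP = {
    "Juggernaut-Lightning-Flux": 0.0017,
    "Juggernaut-pro-flux": 0.0049,
    "ideogram-3.0": 0.06,
}


def image_cost_usd(model: str, width: int, height: int) -> float:
    """Estimate the cost of one image.

    Assumes 1 megapixel = 1,000,000 pixels and no rounding or minimum
    charge; providers may bill differently.
    """
    megapixels = (width * height) / 1_000_000
    return IMAGE_PRICES_PER_MP[model] * megapixels


for name in IMAGE_PRICES_PER_MP:
    cost = image_cost_usd(name, 1024, 1024)
    print(f"{name}: ${cost:.6f} per 1024x1024 image")
```

Under these assumptions a 1024x1024 image is about 1.05 megapixels, so its cost is slightly above the listed per-megapixel price.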
Audio / Speech · 32
inworld-stt-1 · Inworld AI · $0.0001/sec
tts-v4-turbo · Resemble AI · $0.0005/sec
tts-v1 · Resemble AI · $0.0005/sec
tts-v4-multilingual · Resemble AI · $0.0005/sec
tts-v4 · Resemble AI · $0.0005/sec
tts-v3 · Resemble AI · $0.0005/sec
tts-v2 · Resemble AI · $0.0005/sec
sts-legacy · Resemble AI · $0.0005/sec
sts-v1 · Resemble AI · $0.0005/sec
sts-v2 · Resemble AI · $0.0005/sec
tts-legacy · Resemble AI · $0.0005/sec
whisper-large-v3-turbo · Groq · $0.0007/min
speech-to-text · Resemble AI · $0.001/sec
whisper-large-v3 · Groq · $0.0019/min
gpt-4o-mini-transcribe · OpenAI · $0.003/min
gpt-4o-transcribe-diarize · OpenAI · $0.006/min
whisper-1 · OpenAI · $0.006/min
gpt-4o-transcribe · OpenAI · $0.006/min
pixverse-sound-effect · PixVerse · $0.01/sec
gpt-realtime-whisper · OpenAI · $0.017/min
gpt-realtime-translate · OpenAI · $0.034/min
eleven_v3 · ElevenLabs · $0.1/1K chars
scribe_v2 · ElevenLabs · $0.22/hr
scribe_v2_realtime · ElevenLabs · $0.39/hr
Zonos-v0.1-transformer · Zyphra · $7/1M chars
Zonos-v0.1-hybrid · Zyphra · $7/1M chars
tts-1 · OpenAI · $15/1M chars
inworld-tts-1.5-mini · Inworld AI · $25/1M chars
tts-1-hd · OpenAI · $30/1M chars
inworld-tts-2 · Inworld AI · $35/1M chars
inworld-tts-1.5-max · Inworld AI · $35/1M chars
step-tts-2 · StepFun · $38.8889/1M chars
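
Because the audio prices above mix per-second, per-minute, per-hour, per-1K-character and per-1M-character units, comparing them directly can mislead. The following is a minimal sketch of one way to normalize them: time-based prices to $/hr and character-based prices to $/1M chars. The helper name `normalize` and the lookup tables are hypothetical, and the sketch assumes prices are written exactly as in the listing. It does not attempt to compare time-based against character-based pricing, which would need an assumed speaking rate.

```python
import re

# Multipliers that convert a listed unit to the comparison unit.
TIME_TO_HOUR = {"sec": 3600, "min": 60, "hr": 1}
CHARS_TO_MILLION = {"1K chars": 1000, "1M chars": 1}

# Matches strings such as "$0.0005/sec" or "$7/1M chars".
PRICE_RE = re.compile(r"^\$(?P<amount>[0-9.]+)/(?P<unit>.+)$")


def normalize(price: str) -> tuple[float, str]:
    """Convert a listed price to ($ per hour) or ($ per 1M chars)."""
    match = PRICE_RE.match(price.strip())
    if match is None:
        raise ValueError(f"unrecognized price format: {price!r}")
    amount = float(match.group("amount"))
    unit = match.group("unit")
    if unit in TIME_TO_HOUR:
        return amount * TIME_TO_HOUR[unit], "hr"
    if unit in CHARS_TO_MILLION:
        return amount * CHARS_TO_MILLION[unit], "1M chars"
    raise ValueError(f"unsupported unit: {unit!r}")


# A few entries copied from the listing above.
samples = {
    "speech-to-text": "$0.001/sec",
    "whisper-large-v3": "$0.0019/min",
    "scribe_v2": "$0.22/hr",
    "eleven_v3": "$0.1/1K chars",
    "tts-1": "$15/1M chars",
}
for model, listed in samples.items():
    value, unit = normalize(listed)
    print(f"{model}: {listed} -> ${value:.4f}/{unit}")
```

Normalized this way, a $0.001/sec price works out to $3.60/hr while $0.0019/min works out to $0.114/hr, and $0.1/1K chars is $100/1M chars.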
Native-unit prices sourced from provider catalogs. Some media models (e.g. AWS Bedrock, Azure OpenAI) price by region/deployment and are not listed. Not financial advice.