Infrastructure · MarkTechPost · Jun 9, 2026

Xiaomi MiMo and TileRT Push a 1-Trillion-Parameter Model Past 1000 Tokens Per Second on Commodity GPUs

Xiaomi's MiMo team, working with TileRT, released MiMo-V2.5-Pro-UltraSpeed, a serving mode for the MiMo-V2.5-Pro model. The mode decodes more than 1000 tokens per second on a 1-trillion-parameter model using a single 8-GPU commodity node.…
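To put the headline figure in concrete terms, here is a minimal back-of-the-envelope sketch. It assumes the 1000 tokens-per-second figure applies to a single decode stream, which the summary does not state, and it treats the number as a floor. The function names and the 2000-token response length are illustrative, not taken from the source.

```python
# Figure taken from the summary; "over 1000" is treated as a lower bound.
TOKENS_PER_SECOND = 1000


def ms_per_token(tokens_per_second: float) -> float:
    """Average time budget per decoded token, in milliseconds."""
    return 1000.0 / tokens_per_second


def seconds_to_decode(num_tokens: int, tokens_per_second: float) -> float:
    """Time to decode num_tokens at a steady decode rate, ignoring prefill."""
    return num_tokens / tokens_per_second


# Assumption: a hypothetical 2000-token response, decoded at the stated rate.
print(ms_per_token(TOKENS_PER_SECOND), "ms per token")
print(seconds_to_decode(2000, TOKENS_PER_SECOND), "s for 2000 tokens")
```

At that rate each token has a budget of about one millisecond, so a 2000-token response would decode in roughly two seconds, excluding prompt processing.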

Why it matters

Compute supply, energy and data-center capacity decide how cheaply AI can run. Infrastructure shifts show up in inference costs weeks later.

Models mentioned in this story (input / output price, $ per 1M tokens)

MIMO-V2.5 (Xiaomi): $0.40 / $2.00
MIMO-V2.5-PRO (Xiaomi): $1.00 / $3.00
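The per-million-token prices listed above translate into a per-request cost as sketched below. This is an illustration only: the prices are described as live and may change, the summary does not say whether the UltraSpeed serving mode is billed at these rates, and the `request_cost` helper and the token counts are hypothetical.

```python
# Prices as listed above: (input, output) in US dollars per 1M tokens.
PRICES = {
    "MIMO-V2.5": (0.40, 2.00),
    "MIMO-V2.5-PRO": (1.00, 3.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in dollars of one request at the listed per-1M-token prices."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


# Assumption: a hypothetical request with 10,000 input and 2,000 output tokens.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 10_000, 2_000):.4f}")
```

Output tokens cost more than input tokens on both models, so long generations dominate the bill even when the prompt is several times larger.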



Read original (MarkTechPost) →

Summaries are aggregated for information only — follow the source link for the full story. Demo entries are illustrative.