Xiaomi's MiMo team, with TileRT, released MiMo-V2.5-Pro-UltraSpeed, a serving mode for the MiMo-V2.5-Pro model. It decodes more than 1,000 tokens per second on a 1-trillion-parameter model using a single 8-GPU commodity node.…
Compute supply, energy and data-center capacity decide how cheaply AI can run. Infrastructure shifts show up in inference costs weeks later.