Chinese Models Desk
Chinese Models Desk

Xiaomi's MiMo-V2.5-Pro and MiMo Code: The Consumer Giant That Quietly Built a Frontier AI Stack

Xiaomi — better known for smartphones and smart home devices — has shipped a 1.02-trillion-parameter open-weight model and a full agentic coding harness that outperforms Claude Code on SWE-bench Pro, all under an MIT license and at a fraction of Western API prices. Here is why the global developer community should be paying attention.

ShareWhatsAppXFacebook

# Xiaomi's MiMo-V2.5-Pro and MiMo Code: The Consumer Giant That Quietly Built a Frontier AI Stack

When most observers scan China's AI landscape, they look to the usual suspects: DeepSeek for its price-war aggression, Alibaba's Qwen team for open-weight breadth, Zhipu AI (now Z.ai) for coding specialists. Xiaomi — the company that made its name selling affordable smartphones and rice cookers — rarely appears in that conversation. That oversight is becoming harder to justify.

Over the past two months, Xiaomi has shipped two interconnected releases that together constitute a serious, end-to-end agentic AI stack: MiMo-V2.5-Pro, a 1.02-trillion-parameter open-weight Mixture-of-Experts model released in April 2026, and MiMo Code V0.1.0, an open-source terminal-native coding agent released on June 10, 2026. Both carry MIT licenses. Both are priced aggressively. And together, they tell a story about how China's AI ambitions have quietly expanded beyond the dedicated AI labs into the consumer electronics giants that already have the hardware distribution, the developer communities, and the capital to compete at the frontier.

The Model: A Trillion Parameters, MIT-Licensed, and Cheaper Than You Think

MiMo-V2.5-Pro is not a research curiosity. The official model page describes a production-grade system built for long-horizon agentic tasks — the kind of sustained, multi-step autonomous work that most models still struggle with. The architecture is a Mixture-of-Experts design with 1.02 trillion total parameters and 42 billion active parameters per forward pass, giving it the reasoning depth of a much larger dense model while keeping inference costs manageable.

The technical choices are deliberate and worth unpacking:

  • Hybrid attention architecture: MiMo-V2.5-Pro interleaves Sliding Window Attention (SWA) and Global Attention (GA) at a 6:1 ratio, dramatically reducing KV-cache storage requirements for long-context tasks — a critical engineering decision for a model targeting 1M-token contexts.
  • Multi-Token Prediction (MTP): Three layers of dense FFN-based MTP modules triple output throughput during inference and accelerate reinforcement learning rollouts, according to the Hugging Face model card.
  • Training scale: Pre-trained on 27 trillion tokens using FP8 mixed precision, with post-training that combines supervised fine-tuning, domain-specialized reinforcement learning, and Multi-Teacher On-Policy Distillation (MOPD) to unify math, safety, and agentic tool-use capabilities.
  • Context window: A native 1-million-token context window, extended from a 32K native sequence length.

On benchmarks, the model posts numbers that would have been considered frontier-class twelve months ago. According to Artificial Analysis and Xiaomi's own technical documentation:

  • SWE-bench Pro: 57.2% — competitive with the best coding-specialist models available
  • SWE-bench Verified: 78.9%
  • Terminal-Bench 2.0: 68.4%
  • MiMo Coding Bench: 73.7%
"The model is specifically trained to maintain coherence over thousands of sequential tool calls, demonstrating 'harness awareness' in autonomous agent environments," states the official technical documentation. Xiaomi reports that MiMo-V2.5-Pro has autonomously built a full video editor and a compiler written in Rust — tasks that require sustained planning across hundreds of interdependent steps.

These are self-reported figures, and independent verification on some benchmarks is still pending. But the numbers are consistent with what third-party aggregators like OpenRouter and community testing on r/LocalLLaMA have observed: a model that punches well above its price point.

Pricing That Reframes the Competitive Landscape

The pricing structure is where Xiaomi's strategy becomes most legible. As of July 2026, following the deprecation of the older V2 series on June 30:

  • MiMo-V2.5-Pro: $0.435/M input tokens, $0.870/M output tokens (cache-miss); cached inputs drop to just $0.0036/M tokens
  • MiMo-V2.5 (the smaller sibling): $0.140/M input, $0.280/M output
  • MiMo-V2.5-Pro-UltraSpeed (FP4-quantized, up to 1,000 tokens/s): $1.305/M input, $2.610/M output

For context, The Decoder's coverage notes that MiMo-V2.5-Pro achieves comparable results to Claude Opus 4.6 and GPT-5.4 on agentic coding tasks while consuming 40–60% fewer tokens — meaning the effective cost advantage is larger than the per-token price differential alone suggests.

This combination of high token efficiency and low per-token cost creates a compelling economic proposition for developers building AI-powered services at scale. At $0.435/M input tokens, MiMo-V2.5-Pro costs roughly one-third of what comparable Western frontier models charge for similar agentic workloads.

The Agent: MiMo Code and the "AI Amnesia" Problem

The model alone would be notable. What makes Xiaomi's play genuinely interesting is the MiMo Code harness released alongside it.

MiMo Code V0.1.0, released June 10, 2026, is an open-source, terminal-native agentic coding assistant built as a fork of the OpenCode project. Its central design problem is one that every developer who has used AI coding tools has encountered: AI amnesia — the degradation of context and coherence in long-running sessions as the context window fills up.

The Persistent Memory Architecture

MiMo Code's solution is a four-layer persistent memory system built on SQLite FTS5 full-text search:

  • Session memory: Tracks the current task state and recent tool calls
  • Project memory: Stores architectural decisions, file structures, and conventions specific to the codebase
  • Global memory: Retains cross-project preferences and developer-specific patterns
  • Task progress checkpoints: An independent "checkpoint-writer" subagent saves state at 20%, 45%, and 70% of the context budget, ensuring high-quality state preservation before context exhaustion

The MiMo Code documentation describes a "Compose" mode that manages the full development lifecycle — planning, design, coding, testing, and review — from a single goal prompt. A "Dynamic Workflow" mechanism converts natural language instructions into deterministic JavaScript executed in an isolated sandbox, enabling parallel subagent tasks and reliable recovery from interruptions.

Benchmark Claims and Independent Scrutiny

Xiaomi reports that MiMo Code, paired with MiMo-V2.5-Pro, achieves:

  • SWE-bench Pro: 62% (vs. 57% for Claude Code paired with Claude Sonnet 4.6)
  • Terminal Bench 2: 73% (vs. 68% for Claude Code)

Xiaomi attributes approximately five percentage points of these gains to the agent harness itself rather than the underlying model — a meaningful claim about the value of the memory architecture independent of raw model capability. In internal double-blind A/B testing involving 576 developers, MiMo Code demonstrated a win rate exceeding 65% against Claude Code for tasks involving more than 200 execution steps.

As VentureBeat noted, these figures are self-reported and have not been independently verified. TechTimes flagged the same caveat. The community reception on GitHub has been positive — the repository accumulated significant stars within days of release — but independent replication of the benchmark numbers is still in progress.

Where This Fits in China's Open-Weight Strategy

Xiaomi's entry into frontier AI is not an isolated event. It reflects a broader structural shift in China's AI ecosystem that has been building since the original DeepSeek moment in early 2025.

The first phase of China's AI surge was dominated by dedicated AI labs — DeepSeek, Zhipu AI, Moonshot AI — competing on raw model performance and open-weight releases. The second phase, now underway, is characterized by the entry of large consumer and enterprise technology companies that bring different advantages: massive existing user bases, hardware distribution networks, and the capital to sustain long-term ecosystem plays.

Xiaomi fits this pattern precisely. The company already ships AI features across its smartphone and smart home product lines, giving it a direct channel to deploy MiMo capabilities at consumer scale. The QbitAI coverage of the MiMo-V2.5-Pro release (in Chinese) noted that Xiaomi's AI team has been quietly scaling its research capacity since 2024, with the MiMo series representing the public face of a much larger internal investment.

The MIT license on both the model weights and the MiMo Code harness is a deliberate strategic choice. Unlike earlier Chinese open-weight releases that carried restrictive commercial licenses or required separate approval for enterprise use, MIT removes all friction. Any developer, anywhere, can build on MiMo-V2.5-Pro without legal review. This is the same playbook that made Qwen and DeepSeek dominant in the global open-source community — and Xiaomi is executing it with a model that is genuinely competitive at the frontier.

The Hardware Independence Dimension

One element of the MiMo story that has received less attention in Western coverage is the hardware compatibility picture. According to MarkTechPost's coverage of the April release, MiMo-V2.5-Pro is supported by both SGLang and vLLM inference engines, and Xiaomi has confirmed compatibility with Huawei Ascend NPUs alongside NVIDIA GPUs.

This matters for the same reason it matters for DeepSeek-V4 and Qwen3.7-Max: Chinese enterprises deploying frontier AI at scale need confidence that their infrastructure is not dependent on hardware that could be subject to export controls. A trillion-parameter model that runs efficiently on domestic Ascend silicon is not just a technical achievement — it is a strategic asset for the broader Chinese AI ecosystem.

Practical Takeaways for Developers

For developers evaluating MiMo-V2.5-Pro and MiMo Code, the practical picture is as follows:

  • Model access: Weights are available on Hugging Face under MIT license. The API is available via Xiaomi's developer platform and through third-party providers including OpenRouter, AtlasCloud, and Novita.
  • MiMo Code installation: Available via `npm` or `curl`-pipe-to-bash from the GitHub repository. The tool supports bring-your-own-model (BYOM) configurations, allowing the harness to be pointed at locally hosted models via Ollama or any OpenAI-compatible endpoint — which also addresses data sovereignty concerns for organizations subject to strict compliance frameworks.
  • Pricing: At $0.435/M input tokens for MiMo-V2.5-Pro, the model is priced comparably to DeepSeek-V4-Pro and significantly below Western frontier models for equivalent agentic workloads. The UltraSpeed variant at $1.305/M input offers up to 1,000 tokens/s for latency-sensitive applications.
  • Data considerations: The default "MiMo Auto" free tier routes code through Xiaomi's cloud infrastructure, which is subject to Chinese law. Organizations with strict data residency requirements should use the BYOM configuration.

The Tosea AI guide provides a practical walkthrough of deployment options for teams evaluating the model for production use.

The Bigger Picture

Xiaomi's MiMo stack is a reminder that China's AI ecosystem is wider and deeper than the handful of dedicated AI labs that dominate Western coverage. The consumer electronics giant has shipped a trillion-parameter open-weight model, a production-grade agentic coding harness, and a pricing structure that makes both accessible to any developer with an API key — all within a two-month window.

The benchmark numbers need independent verification. The MiMo Code harness is at v0.1.0, which means rough edges are expected. But the strategic intent is clear: Xiaomi is not building a research project. It is building a developer platform, and it has the distribution, the capital, and now the models to make that platform matter.

For the global developer community, the practical question is simple: at $0.435/M input tokens, with MIT-licensed weights available for local deployment, and a coding harness that claims to outperform Claude Code on long-horizon tasks — is MiMo-V2.5-Pro worth evaluating? The answer, increasingly, is yes.

#Xiaomi#MiMo#MiMo-V2.5-Pro#MiMo Code#Open-Weight#China AI#Agentic AI#Coding Models#MoE#MIT License
Wei Lian
Wei Lian

🇨🇳 China Desk Lead · Beijing, China

Reads the Mandarin sources first — DeepSeek, Qwen, Zhipu, and the rest.

Comments

Open discussion — no account needed. Be respectful.

0/4000
Loading comments…