LongCat-2.0: Meituan's Trillion-Parameter Bet on Domestic Chips

Meituan — better known as China's food-delivery giant — has shipped LongCat-2.0, a 1.6-trillion-parameter MoE model trained entirely on domestic Chinese ASICs, with a 1M-token context window, MIT license, and API pricing that undercuts Western frontier models. It's the most consequential proof yet that China's AI stack is going end-to-end independent of Nvidia.

Sophia Chen 🇨🇦 · China Desk Correspondent · Jul 3, 2026 · 11 min read

Executive Summary

The most consequential Chinese AI-lab release of the week did not come from DeepSeek, Alibaba's Qwen, or Moonshot. It came from Meituan — the company most Westerners still think of as a food-delivery giant. On June 30, 2026, Meituan's LongCat team shipped LongCat-2.0, a 1.6-trillion-parameter Mixture-of-Experts (MoE) model with a 1-million-token context window, an MIT license, and API pricing that deliberately undercuts Western frontier models. You can inspect it now on its Hugging Face model card↗ and GitHub repository↗.

The benchmark story is strong but not the headline. LongCat-2.0 reportedly scores 59.5 on SWE-bench Pro, edging out GPT-5.5's 58.6, alongside 88.9 on GPQA-diamond and 70.8 on Terminal-Bench 2.1. The bigger story is strategic and physical: Reuters↗ reports Meituan trained the model entirely on a cluster of 50,000 domestic Chinese ASICs — no Nvidia — using Huawei's collective-communication library to coordinate the chips. This is the first trillion-parameter model claimed to be trained end-to-end on a domestic Chinese superpod.

For global developers and buyers, LongCat-2.0 is a shortlist candidate for agentic coding, long-context document work, and internal copilots — provided you understand the licensing nuance (weights were "coming soon" at launch even as the API went live). For anyone tracking AI supply chains, it is a proof point that China's model builders are now attempting to own the full stack, from silicon to inference.

Bottom line for decision-makers: LongCat-2.0 combines frontier-adjacent coding performance, a 1M-token context window, aggressive pricing, and domestic-chip training in one package. Evaluate it now via API for coding and agent workloads; treat the hardware-independence angle as the more durable strategic signal.

Key Findings

1. LongCat-2.0 is a 1.6T-parameter MoE model that activates only ~48B parameters per token. The sparse design (33B–56B active) buys frontier-scale capability without frontier-scale inference cost. *So what:* teams can plan for large-model quality at a compute footprint far smaller than a dense equivalent — making a real pilot financially plausible, not just a research curiosity.

2. Meituan claims LongCat-2.0 beats GPT-5.5 on SWE-bench Pro (59.5 vs 58.6). The margin is narrow, and the model reportedly still trails Claude Opus 4.8 on broader agentic and reasoning benchmarks. *So what:* treat it as a near-frontier coding specialist, not a universal GPT-5.5/Opus replacement — deploy it where coding and terminal tasks dominate.

3. The model was trained on 50,000 domestic Chinese ASICs, with no Nvidia hardware. According to Reuters and Meituan's launch note↗, pretraining spanned over 35 trillion tokens "without rollbacks or irrecoverable loss spikes." *So what:* China's domestic accelerator ecosystem is now mature enough for very large *training* runs, not just inference — reducing the West's assumed leverage over Chinese frontier development.

4. Pricing is aggressive, and context-cache hits are free. Standard rates are $0.75/M input and $2.95/M output tokens, with a promo tier of $0.30/$1.20. *So what:* for iterative agents that repeatedly re-read the same codebase, free cache reads can slash real-world bills — a structural cost advantage over metered frontier APIs.

5. The model ran anonymously as "Owl Alpha" on OpenRouter for ~2 months before the reveal. It reportedly handled over 10 trillion tokens monthly and ranked top-three globally by call volume. *So what:* this is battle-tested by real developer traffic, not just a polished demo — lowering the risk of a "benchmarks-only" disappointment in production.

6. Licensing is MIT, but weights were "coming soon" at launch even as the API went live. The repository has since been populated with files. *So what:* if your plan depends on self-hosting the weights today, verify availability before committing — "API-ready now, self-hostable later" is the safest working assumption.

Detailed Analysis

Theme 1: What LongCat-2.0 actually is — scale engineered for agents

LongCat-2.0 is best understood as an agentic coding and workflow model that happens to be enormous. The architecture pairs massive total capacity with disciplined sparsity, and it leans on a purpose-built attention scheme — LongCat Sparse Attention (LSA) — that the team says reduces computational complexity from quadratic to linear, enabling the native 1M-token window. An N-gram embedding module is used to sharpen local token relationships.
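
To make the quadratic-versus-linear claim concrete, the sketch below counts query-key pairs under two simple cost models. It is not an implementation of LSA, whose internals are not described here. It assumes a generic sparse scheme in which each token attends to a fixed budget of keys, and the `BUDGET` value is a hypothetical placeholder rather than a published figure.

```python
def dense_attention_pairs(n_tokens: int) -> int:
    """Query-key pairs scored by full attention: every token attends to every token."""
    return n_tokens * n_tokens


def sparse_attention_pairs(n_tokens: int, budget_per_token: int) -> int:
    """Query-key pairs when each token attends to at most a fixed budget of keys."""
    return n_tokens * min(budget_per_token, n_tokens)


if __name__ == "__main__":
    BUDGET = 4096  # hypothetical per-token key budget, NOT a published LSA parameter
    for n in (8_000, 128_000, 1_000_000):
        dense = dense_attention_pairs(n)
        sparse = sparse_attention_pairs(n, BUDGET)
        # The ratio grows linearly with n, which is why dense attention
        # becomes the bottleneck at very long context lengths.
        print(f"{n:>9,} tokens: dense={dense:.2e}  sparse={sparse:.2e}  ratio={dense / sparse:,.0f}x")
```

Under this toy model the gap between the two curves widens in proportion to the sequence length, which is the practical reason a linear-cost scheme matters most at the 1M-token end of the range.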

| Specification | LongCat-2.0 | LongCat-Flash (prior family) |
|---|---|---|
| Total parameters | 1.6 trillion | 560 billion |
| Activated per token | ~48B (33B–56B) | ~27B average |
| Context window | 1M tokens | Not specified in sources |
| Primary focus | Agentic coding, workflows | Agentic tasks, reasoning |
| License | MIT | MIT |
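
The sparsity figures in the table can be turned into a quick ratio. The sketch below uses only the reported parameter counts. Treating per-token compute as roughly proportional to activated parameters is a simplifying assumption, not a measured result.

```python
TOTAL_PARAMS = 1.6e12          # reported total parameters
ACTIVE_TYPICAL = 48e9          # reported typical activation per token
ACTIVE_RANGE = (33e9, 56e9)    # reported activation range per token


def active_fraction(active: float, total: float = TOTAL_PARAMS) -> float:
    """Share of the model's parameters used for a single token."""
    return active / total


if __name__ == "__main__":
    low, high = ACTIVE_RANGE
    print(f"typical active share: {active_fraction(ACTIVE_TYPICAL):.1%}")
    print(f"range: {active_fraction(low):.1%} to {active_fraction(high):.1%}")
    # Assumption: per-token compute scales with active parameters, so this
    # ratio approximates the compute saving versus a dense 1.6T model.
    print(f"dense-to-active ratio: {TOTAL_PARAMS / ACTIVE_TYPICAL:.1f}x")
```

The ratio describes compute per token only. A self-hosted MoE deployment still has to keep all experts available, so memory requirements track the total parameter count rather than the active share.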

The 1M context window is the most immediately actionable feature. It supports repository-scale coding, long enterprise documents, and long-horizon agent loops without fragmenting the task. This is more useful to a working developer than any abstract "reasoning" claim, because it changes what you can put into a single prompt: an entire monorepo, a full product manual, or a sprawling internal wiki.
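
Whether a given repository actually fits in one prompt is easy to estimate before running anything. The sketch below walks a directory and applies a rough four-characters-per-token heuristic. That ratio, the file-extension list, and the output reserve are all illustrative assumptions, since the model's real tokenizer may count quite differently.

```python
import os

CONTEXT_WINDOW_TOKENS = 1_000_000  # reported LongCat-2.0 context window
CHARS_PER_TOKEN = 4                # rough heuristic (assumption); real tokenizers vary


def estimate_repo_tokens(root: str, extensions=(".py", ".md", ".ts", ".go", ".java")) -> int:
    """Approximate the token count of source files under root."""
    total_chars = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune hidden directories such as .git in place so os.walk skips them.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in filenames:
            if name.endswith(extensions):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="ignore") as fh:
                    total_chars += len(fh.read())
    return total_chars // CHARS_PER_TOKEN


def fits_in_context(root: str, reserve_for_output: int = 50_000) -> bool:
    """True if the estimate plus a (hypothetical) output reserve fits in the window."""
    return estimate_repo_tokens(root) + reserve_for_output <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    tokens = estimate_repo_tokens(".")
    print(f"estimated tokens: {tokens:,}  fits: {fits_in_context('.')}")
```

For a real evaluation, the estimate should be replaced with counts from the provider's own tokenizer once that is available.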

Meituan is clearly not shipping a one-off. Its GitHub org already spans LongCat-Flash-Chat↗, LongCat-Next↗ (a native multimodal model), LongCat-Flash-Thinking (reasoning), LongCat-Flash-Prover (Lean4 formal proving), plus LongCat-Video and LongCat-Image. This is a full model family sharing infrastructure and evaluation logic.

Bottom line for this theme: LongCat-2.0 is a coding-and-agent workhorse, not a chat toy — and it sits atop a broader, reusable model ecosystem that suggests sustained investment rather than a single splash.

Theme 2: Benchmark reality — near-frontier, with honest caveats

The performance claims are strong on coding and mixed on breadth. Below are the figures Meituan reports across its official channels and the launch note.

| Benchmark | LongCat-2.0 score | Note from sources |
|---|---|---|
| SWE-bench Pro | 59.5 | Reported to edge GPT-5.5 (58.6) |
| Terminal-Bench 2.1 | 70.8 | Agentic/terminal tasks |
| FORTE (workflow simulator) | 73.2 | Corporate workflow simulation |
| GPQA-diamond | 88.9 | Reasoning/QA |
| IFEval | 90.0 | Per project technical blog |
| IMO-AnswerBench | 81.8 | Per project technical blog |

Two caveats matter. First, these are vendor-reported numbers; independent verification is still thin, and the Hugging Face discussion thread↗ notes that some benchmarks (IFEval, IMO-AnswerBench) could not be formally registered on the Hub's evaluation registry at documentation time. Second, sources indicate LongCat-2.0 trails premium frontier models like Claude Opus 4.8 on broader agentic and reasoning benchmarks. The SWE-bench Pro "win" over GPT-5.5 is real per Meituan, but it is a narrow margin, not a blowout.

The mitigating factor is the OpenRouter track record. Running as "Owl Alpha" for roughly two months, the model reportedly processed over 10 trillion tokens monthly and cracked the top three globally by call volume. Real developer usage is a more credible signal than any single leaderboard.

Bottom line for this theme: LongCat-2.0 is genuinely competitive on coding and strong on reasoning QA, but position it as a near-frontier specialist — verify the numbers on your own workloads before betting a roadmap on them.

Theme 3: The domestic-chip story — the real headline

If benchmarks interest developers, the hardware story should interest buyers and strategists more. Meituan describes LongCat-2.0 as the first trillion-parameter model trained and deployed end-to-end on a domestic ASIC superpod: a cluster of over 50,000 Chinese accelerators coordinated via Huawei's Collective Communication Library (HCCL). Per the launch note, pretraining spanned over 35 trillion tokens "without rollbacks or irrecoverable loss spikes."

This matters in three concrete ways:

1. It reduces exposure to Nvidia supply constraints and Western export-control risk.
2. It gives Chinese enterprises a cleaner procurement story for data-sensitive workloads.
3. It signals the domestic accelerator ecosystem can now sustain very large *training* runs, historically the hardest part to localize.
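
For a sense of the scale of the reported run, the sketch below applies the common rule of thumb of about 6 FLOPs per active parameter per training token. This is a back-of-envelope approximation, not a figure from Meituan. The run duration is not given in the sources, so `HYPOTHETICAL_DAYS` is a placeholder chosen purely for illustration.

```python
def training_flops(active_params: float, tokens: float) -> float:
    """Rule-of-thumb training compute: about 6 FLOPs per active parameter per token."""
    return 6.0 * active_params * tokens


def sustained_flops_per_chip(total_flops: float, chips: int, days: float) -> float:
    """Average FLOP/s each chip must sustain to finish the run in the given time."""
    seconds = days * 24 * 60 * 60
    return total_flops / chips / seconds


if __name__ == "__main__":
    ACTIVE_PARAMS = 48e9      # reported typical activation per token
    TOKENS = 35e12            # reported pretraining tokens (stated as "over 35 trillion")
    CHIPS = 50_000            # reported cluster size
    HYPOTHETICAL_DAYS = 90    # assumption: the sources do not state the run length

    total = training_flops(ACTIVE_PARAMS, TOKENS)
    per_chip = sustained_flops_per_chip(total, CHIPS, HYPOTHETICAL_DAYS)
    print(f"estimated training compute: {total:.3e} FLOPs")
    print(f"sustained per chip over {HYPOTHETICAL_DAYS} days: {per_chip:.3e} FLOP/s")
```

The estimate ignores attention cost, routing overhead, and any restarts, so it is best read as an order-of-magnitude floor rather than an accounting of the actual run.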

For years, "China AI" headlines described models. LongCat-2.0 reframes the competition as one about *systems*: kernels, communication layers, inference stacks, and industrial training recipes. A frontier model trained on domestic silicon is a benchmark for the entire ecosystem, not just the lab.

Bottom line for this theme: This is a hardware story disguised as a model story. If Chinese labs can train and serve frontier systems on domestic accelerators, cloud choice and export resilience become negotiable — a structural shift worth monitoring regardless of which model wins on benchmarks.

Theme 4: Economics and access — priced to change default behavior

Meituan is behaving like a product company, not a pure research lab. The pricing is designed to shape developer defaults.

| Tier | Input ($/M tokens) | Output ($/M tokens) | Cache hits |
|---|---|---|---|
| Standard | 0.75 | 2.95 | Free |
| Promotional | 0.30 | 1.20 | Free |

Free context-cache hits are the sleeper feature. Agents that repeatedly revise code, re-read files, or iterate over the same prompt state can accumulate enormous input costs on metered APIs. Making cache reads free directly targets that pain point — a structural, not cosmetic, advantage.
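
A small simulation shows how much the cache policy can dominate an agent's bill. The sketch below uses the listed prices, but the workload is an illustrative assumption: each step resends the same fixed prefix plus a little new input, and history growth is ignored. The "metered" tier is a hypothetical comparison that bills cached reads at the full input rate, not any specific vendor's pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    input_per_m: float                # $ per million fresh input tokens
    output_per_m: float               # $ per million output tokens
    cached_input_per_m: float = 0.0   # $ per million cache-hit tokens (free by default)


STANDARD = Tier(0.75, 2.95)
PROMO = Tier(0.30, 1.20)
# Hypothetical baseline: same prices, but cache hits billed like fresh input.
METERED = Tier(0.75, 2.95, cached_input_per_m=0.75)


def agent_loop_cost(tier: Tier, steps: int, prefix_tokens: int,
                    new_input_per_step: int, output_per_step: int) -> float:
    """Dollar cost of a loop that resends one fixed prefix on every step."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    fresh_input = prefix_tokens + steps * new_input_per_step  # prefix is paid once
    cached_input = (steps - 1) * prefix_tokens                # later steps hit the cache
    output = steps * output_per_step
    return (fresh_input * tier.input_per_m
            + cached_input * tier.cached_input_per_m
            + output * tier.output_per_m) / 1_000_000


if __name__ == "__main__":
    workload = dict(steps=50, prefix_tokens=200_000, new_input_per_step=2_000, output_per_step=1_000)
    for name, tier in (("standard", STANDARD), ("promo", PROMO), ("metered", METERED)):
        print(f"{name:>8}: ${agent_loop_cost(tier, **workload):.4f}")
```

With a large, stable prefix the cached reads account for most of the tokens, so the result is far more sensitive to the cache price than to the headline input rate.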

The licensing posture reinforces the openness narrative: LongCat-2.0 ships under the permissive MIT license↗, allowing commercial use, modification, and redistribution. The one operational wrinkle is timing — at launch, both GitHub and Hugging Face listed weights as "coming soon" while the API was live, though the repo has since been populated with files.

The practical decision this week is simple but important: do you need the model now, or the weights now? Those are not the same thing.

Bottom line for this theme: LongCat-2.0 attacks the three things developers complain about most — price, context limits, and lock-in — simultaneously. Even a slightly weaker model can win serious mindshare on that combination.

How to try it — a practical checklist

1. Read the official model card↗ and launch note↗ for intended use cases and performance claims.
2. Check the GitHub repository↗ for inference instructions and weight-availability updates.
3. Review the license↗ before any commercial deployment.
4. Inspect the HF discussion thread↗ and the VitaBench 2.0 dataset↗ to understand how the team frames long-horizon agent evaluation.

Recommendations

1. Engineering teams building coding agents or copilots should run a bake-off this quarter. Pit LongCat-2.0 (via API) against your incumbent on your own SWE-bench-style tasks and repository-scale prompts. Trigger: if your workload is code-heavy and cost-sensitive, start now — the promo pricing and free cache reads make the evaluation nearly free.

2. Cost-conscious buyers should model total cost with cache behavior included. Do not compare headline per-token rates alone; simulate an iterative agent loop where cache hits are free. Trigger: if input tokens dominate your bill (repeated file reads), LongCat-2.0's economics may be decisive.

3. Teams needing self-hosting or air-gapped deployment must confirm weight availability first. The MIT license is permissive, but "coming soon" at launch is a real gap. Trigger: do not commit a self-hosting roadmap until you have verified downloadable weights on GitHub/Hugging Face.

4. Strategy and procurement leaders should treat the domestic-chip claim as a supply-chain signal. Factor Chinese hardware-independent training into scenario planning around export controls and vendor concentration. Trigger: if your organization's AI strategy assumes Chinese labs are Nvidia-bottlenecked, revisit that assumption now.

5. Risk-averse adopters should keep a frontier fallback for broad reasoning. LongCat-2.0 reportedly trails Claude Opus 4.8 on broader agentic/reasoning tasks. Trigger: for general-purpose reasoning beyond coding, route to a stronger model or run a hybrid stack.
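
The bake-off in recommendation 1 can be sketched as a small harness that runs the same tasks through each model and compares pass rates. Everything model-specific here is hypothetical: `candidate_stub` and `incumbent_stub` stand in for real API clients, and the two toy tasks stand in for your own repository-scale tests.

```python
from typing import Callable, Dict, List, Tuple

Task = Tuple[str, Callable[[str], bool]]  # (prompt, checker for the model's answer)
Model = Callable[[str], str]              # takes a prompt, returns the model's answer


def bake_off(models: Dict[str, Model], tasks: List[Task]) -> Dict[str, float]:
    """Return each model's pass rate over the shared task list."""
    if not tasks:
        raise ValueError("at least one task is required")
    results: Dict[str, float] = {}
    for name, solve in models.items():
        passed = sum(1 for prompt, check in tasks if check(solve(prompt)))
        results[name] = passed / len(tasks)
    return results


# Hypothetical stand-ins: replace with real calls to each provider's API.
def candidate_stub(prompt: str) -> str:
    return {"reverse 'abc'": "cba", "add 2 and 3": "5"}.get(prompt, "")


def incumbent_stub(prompt: str) -> str:
    return {"reverse 'abc'": "cba"}.get(prompt, "")


if __name__ == "__main__":
    tasks: List[Task] = [
        ("reverse 'abc'", lambda out: out == "cba"),
        ("add 2 and 3", lambda out: out.strip() == "5"),
    ]
    scores = bake_off({"candidate": candidate_stub, "incumbent": incumbent_stub}, tasks)
    for name, rate in scores.items():
        print(f"{name}: {rate:.0%}")
```

Keeping the checker as a plain function makes it easy to plug in a real test runner, and the same task list can later be reused to decide which requests to route to a fallback model.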

Caveats & Limitations

Most performance figures are vendor-reported. SWE-bench Pro, Terminal-Bench, FORTE, GPQA-diamond, IFEval, and IMO-AnswerBench scores come from Meituan's own materials. Independent third-party verification remains limited, and some benchmarks could not be formally registered on Hugging Face's evaluation registry at documentation time.

The GPT-5.5 comparison is narrow and one-dimensional. The reported 59.5 vs 58.6 edge is on a single coding benchmark. Sources explicitly note LongCat-2.0 trails premium models like Claude Opus 4.8 on broader tasks.

Weight availability is a moving target. At launch, weights were listed as "coming soon" even as the API went live; the repository has since been populated with files. Confirm current status directly before planning self-hosting.

The domestic-chip claim originates with Meituan. The 50,000-ASIC, 35-trillion-token, no-rollback training narrative is reported by Meituan and relayed by Reuters and VentureBeat. Those sources describe the hardware only as domestic Chinese ASICs coordinated via Huawei's HCCL, without further chip-level detail.

Pricing tiers include a promotional rate. The $0.30/$1.20 promo is described as limited-time; standard pricing is $0.75/$2.95. Budget on standard rates for durable planning.

This report covers one release. Other Chinese labs (ByteDance's Seed 2.1, SenseTime's SenseNova line, Tencent Hunyuan, MiniMax, Xiaomi MiMo) were active in the same period, but none had a new July 2–3 launch of comparable significance. They are mentioned for context only and are not analyzed here.


Links & Resources

Hugging Face model card (huggingface.co)

GitHub repository (github.com)

Reuters (reuters.com)

Launch note (longcatai.org)

LongCat-Flash-Chat (github.com)

LongCat-Next (github.com)

Hugging Face discussion thread (huggingface.co)

MIT license (huggingface.co)

VitaBench 2.0 dataset (huggingface.co)



