DeepSeekMath‑V2: Open‑Source Math LLM That Reaches IMO Gold
Updated on November 30, 2025 · 14 minutes read
DeepSeekMath‑V2 is a 685B‑parameter large language model specialized in competition‑level mathematics and natural‑language theorem proving. It’s built on the DeepSeek‑V3.2‑Exp‑Base architecture and trained with reinforcement learning to both generate proofs and verify them, reaching gold‑level performance on IMO 2025 and CMO 2024, plus 118/120 on Putnam 2024.
Is DeepSeekMath‑V2 open source?
Yes. The model weights and code are released under the Apache‑2.0 license on GitHub and Hugging Face:
• GitHub repo: https://github.com/deepseek-ai/DeepSeek-Math-V2/tree/main
• Hugging Face model: https://huggingface.co/deepseek-ai/DeepSeek-Math-V2
This allows researchers and companies to download, run, and fine‑tune the model, subject to the license terms.
What does “self‑verifiable” mean?
In DeepSeekMath‑V2, “self‑verifiable” means the model is trained not only to write proofs, but also to:
1. Critique its own reasoning step by step.
2. Score its own proofs for correctness and rigor.
3. Refine proofs until it can no longer find issues, guided by a learned verifier and meta‑verifier.
Instead of being rewarded just for a correct final answer, the model is rewarded for producing proofs that pass strict internal verification.
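The generate–critique–refine loop described above can be sketched in a few lines. This is a minimal illustration, not DeepSeek's implementation: the `generate`, `critique`, and `refine` callables are hypothetical stand‑ins for the model's actual policy and verifier.

```python
def refine_until_verified(generate, critique, refine, max_rounds=4):
    """Toy sketch of self-verified proof generation: draft a proof,
    then repeatedly critique and refine it until the critic finds
    no remaining issues or the round budget runs out."""
    proof = generate()
    for _ in range(max_rounds):
        issues = critique(proof)
        if not issues:          # critic is satisfied: proof passes
            return proof, True
        proof = refine(proof, issues)
    return proof, False          # budget exhausted, still flagged

# Toy demo: a "proof" is a list of steps; the critic flags bad steps.
drafts = [["expand the square", "unjustified step"],
          ["expand the square", "justify each inequality"]]
gen = lambda: drafts[0]
crit = lambda p: [s for s in p if s == "unjustified step"]
ref = lambda p, issues: drafts[1]   # pretend refinement fixes the flaw
final, ok = refine_until_verified(gen, crit, ref)
```

The key design point is the stopping condition: the reward signal comes from passing verification, not from matching a final answer.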
How is it different from other math LLMs?
Most math LLMs are optimized to get the final answer right. DeepSeekMath‑V2 is optimized to get the entire proof right:
• It uses a dedicated proof verifier trained on expert‑scored proofs.
• A meta‑verifier checks that the verifier’s critiques are themselves accurate.
• The proof generator is trained to honestly evaluate its own solutions, not just guess.
This leads to more rigorous, competition‑grade proofs instead of brittle answer‑only reasoning.
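One way to picture this training signal is a reward that combines the verifier's rigor score with an "honesty" term comparing the model's self‑assessment to the verifier's, gated by the meta‑verifier. This is a hypothetical toy formula for intuition only; the paper's actual reward is more involved.

```python
def proof_reward(self_score, verifier_score, meta_ok):
    """Toy reward combining rigor and honesty (all scores in [0, 1]).

    - verifier_score: how rigorous the verifier judges the proof.
    - honesty: 1 minus the gap between the model's self-score and
      the verifier's score, so overclaiming is penalized.
    - meta_ok: meta-verifier's judgment that the critique itself is
      sound; an unreliable critique yields no reward at all.
    """
    if not meta_ok:
        return 0.0
    honesty = 1.0 - abs(self_score - verifier_score)
    return 0.5 * verifier_score + 0.5 * honesty
```

Under a scheme like this, a model that claims a flawed proof is perfect scores worse than one that flags its own gaps.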
Can I run DeepSeekMath‑V2 locally?
In principle yes, but it’s a very large model (685B parameters), so you need serious compute: multi‑GPU clusters or specialized inference infrastructure. DeepSeek recommends using their experimental V3.2 serving stack for long‑context reasoning:
• DeepSeek‑V3.2‑Exp serving repo: https://github.com/deepseek-ai/DeepSeek-V3.2-Exp
For most teams, it’s more practical to:
• Use quantized or distilled variants if they appear in the ecosystem, or
• Access the model via hosted providers that serve DeepSeek weights.
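To see why local inference is demanding, a rough weights‑only memory estimate helps. The arithmetic below is back‑of‑envelope (it ignores KV cache, activations, and serving overhead) and the byte counts per parameter are generic precision assumptions, not official requirements.

```python
def weight_memory_gb(params_billion, bytes_per_param):
    """Approximate GPU memory for the weights alone:
    (params_billion * 1e9 params) * bytes_per_param / 1e9 bytes/GB
    simplifies to params_billion * bytes_per_param."""
    return params_billion * bytes_per_param

bf16 = weight_memory_gb(685, 2)    # ~1370 GB at 16-bit precision
fp8  = weight_memory_gb(685, 1)    # ~685 GB at 8-bit precision
int4 = weight_memory_gb(685, 0.5)  # ~342.5 GB at 4-bit quantization
```

Even at aggressive 4‑bit quantization, the weights alone exceed the memory of any single GPU, which is why multi‑GPU serving is required.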
How does it perform on benchmarks?
According to the paper, DeepSeekMath‑V2:
• Scores gold‑level on IMO 2025 and CMO 2024.
• Scores 118/120 on Putnam 2024, beating the top human score of 90.
• Achieves 99.0% on IMO‑ProofBench Basic and 61.9% on Advanced under heavy‑compute settings.
• Outperforms or matches frontier proprietary models on a wide range of CNML‑level and IMO‑level theorem‑proving tasks.
Is it a formal theorem prover?
No. DeepSeekMath‑V2 works in natural language, not fully formal proof languages. It produces human‑style proofs that can be read, critiqued, and used to guide formal systems such as Lean or Isabelle. The DeepSeek team also builds formal provers (e.g., DeepSeek‑Prover‑V2) that can use strong informal reasoning as a high‑level guide for formal proof search.
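To make the informal/formal distinction concrete, here is a toy Lean 4 example, unrelated to any DeepSeek tooling: the one‑sentence informal proof "swap the two components of the conjunction" corresponds directly to a machine‑checked proof term that a formal prover must produce.

```lean
-- Informal proof: "Given p and q, conclude q and p by swapping
-- the two components." The formal proof term mirrors that exactly.
theorem and_swap (p q : Prop) : p ∧ q → q ∧ p :=
  fun ⟨hp, hq⟩ => ⟨hq, hp⟩
```

A natural‑language prover stops at the sentence; a formal prover must emit the term and have Lean's kernel check it.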
Who should care about DeepSeekMath‑V2?
• Math Olympiad students & coaches looking for AI‑generated solution sketches and proof critiques.
• Researchers working on automated theorem proving and math‑capable AI.
• Tool builders & startups building math tutors, research assistants, or theorem‑proving copilots.
• AI engineers interested in self‑verification, meta‑reasoning, and RL‑trained reasoning models.
If you’re interested in getting hands‑on with LLMs and applied AI, you can also explore training paths and learning resources at Code Labs Academy.