On April 24, 2026, Chinese artificial intelligence startup DeepSeek announced the preview release of its V4 model family, introducing significant technical advancements in reasoning and autonomous agentic capabilities. The launch includes two primary versions: the flagship DeepSeek-V4-Pro and the efficiency-focused DeepSeek-V4-Flash. Both models are released under an open-source MIT license, with weights made available on public repositories including Hugging Face.
The DeepSeek-V4-Pro model uses a Mixture-of-Experts architecture with 1.6 trillion total parameters, of which 49 billion are activated per inference; the V4-Flash variant has 284 billion total parameters with 13 billion active. A central technical highlight of the V4 series is native support for a 1-million-token context window, a substantial increase from the 128,000-token limit of the previous V3 generation. To manage the computational demands of so large a context, DeepSeek introduced a hybrid attention mechanism combining Compressed Sparse Attention and Heavily Compressed Attention, which reportedly cuts KV-cache memory usage by 90 percent relative to prior models.
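The figures above imply some simple arithmetic worth making explicit. The sketch below computes the MoE active-parameter fraction from the published totals, and a naive KV-cache estimate at the 1-million-token context; the architectural hyperparameters used for the cache estimate (layer count, KV heads, head dimension) are illustrative assumptions, not published DeepSeek-V4 values.

```python
# Back-of-envelope arithmetic for the V4 figures. Layer/head/dim values
# below are hypothetical placeholders, NOT disclosed V4 hyperparameters.

def active_fraction(total_params: float, active_params: float) -> float:
    """Fraction of parameters activated per token in an MoE model."""
    return active_params / total_params

def kv_cache_bytes(context_len: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_val: int = 2) -> int:
    """Naive uncompressed KV-cache size: K and V tensors per layer, fp16."""
    return 2 * context_len * n_layers * n_kv_heads * head_dim * bytes_per_val

# V4-Pro: 49B of 1.6T parameters active per inference -> ~3.1% density.
print(f"{active_fraction(1.6e12, 49e9):.1%}")

# Dense-cache baseline at the 1M-token context (hypothetical dimensions),
# then the claimed 90 percent reduction applied on top:
naive = kv_cache_bytes(1_000_000, n_layers=60, n_kv_heads=8, head_dim=128)
compressed = naive * 0.10
print(f"{naive / 2**30:.0f} GiB -> {compressed / 2**30:.1f} GiB")
```

Even under these modest assumptions, an uncompressed fp16 cache at 1 million tokens runs to hundreds of gigabytes, which is why a 90 percent reduction is a prerequisite for serving such contexts at all.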
A critical development in the V4 launch is its deep integration with domestic Chinese hardware. DeepSeek confirmed that the V4 series was optimized for day-zero compatibility with Huawei's Ascend AI chip platform, including the Ascend 950 series and Ascend supernode clusters. Huawei issued a concurrent statement confirming that its full product line now supports V4 inference workloads. This represents a strategic pivot for DeepSeek, which had previously relied heavily on Nvidia hardware for its V3 and R1 models. The company noted that custom kernels and the Muon software module allowed for high efficiency on domestic silicon, with V4-Pro requiring only 27 percent of V3.2's single-token inference FLOPs.
In terms of performance, DeepSeek claims that V4-Pro-Max, the model's highest reasoning mode, outperforms OpenAI's GPT-5.2 and Google's Gemini 3.0-Pro on standard reasoning and mathematical benchmarks. The company specifically highlighted the model's agentic intelligence, which enables it to execute multi-step workflows, interact with external tools, and perform complex coding tasks autonomously. Internal evaluations placed V4-Pro on par with Anthropic's Claude Opus 4.5 in agentic task completion.
The release arrived amid heightened geopolitical tension, shortly after new allegations from U.S. officials regarding industrial-scale intellectual-property distillation. Despite these challenges, DeepSeek has positioned V4 as a cost-effective alternative for global developers, with API pricing for the Flash version set at approximately $0.14 per million input tokens. The company also introduced a cache-hit rate of $0.028 per million tokens to facilitate long-running agentic applications.
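The two quoted rates can be combined into a simple per-request cost model. The sketch below uses only the rates stated above; the token counts are invented for illustration, and output-token pricing (not quoted in the announcement) is deliberately left out.

```python
# Per-request input-cost sketch for the quoted V4-Flash pricing.
# Rates are from the announcement; token counts are hypothetical.

INPUT_RATE = 0.14 / 1_000_000       # dollars per fresh input token
CACHE_HIT_RATE = 0.028 / 1_000_000  # dollars per cached input token

def input_cost(fresh_tokens: int, cached_tokens: int = 0) -> float:
    """Input-side cost of one request, split into fresh vs. cache-hit tokens."""
    return fresh_tokens * INPUT_RATE + cached_tokens * CACHE_HIT_RATE

# A long-running agent that re-sends an 800k-token context each turn,
# with most of it served from cache on later turns:
first_turn = input_cost(800_000)
later_turn = input_cost(50_000, cached_tokens=750_000)
print(f"first turn: ${first_turn:.3f}, later turns: ${later_turn:.3f}")
```

The 5x gap between the fresh and cached rates is what makes repeatedly re-sending a near-full context economical, which is presumably the point of tying the cache-hit price to agentic workloads.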