DeepSeek V4 Deep Dive: DSA Sparse Attention, 1M Context, vs GPT-5/Claude/Gemini
V4-Pro and V4-Flash were released on 2026-04-24: a 1.6T-parameter MoE with 49B activated parameters, priced at $3.48/M output tokens. A full breakdown of the new DSA sparse attention architecture, the benchmark results (LiveCodeBench 93.5, SWE-bench 80.6, Codeforces 3206), the pricing strategy, Huawei Ascend integration, and an engineering selection guide.
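Before the deep dive, here is a minimal single-head sketch of the generic top-k idea behind sparse attention: each query attends only to a small budget of its highest-scoring keys instead of the full sequence. Everything in this sketch is an illustrative assumption, not DeepSeek's DSA implementation: the PyTorch framing, the top_k=64 budget, computing full scores before selection, and the omission of causal masking are all simplifications; a production scheme would use a cheap indexer to pick keys without materializing the full score matrix.

```python
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, top_k=64):
    """Toy top-k sparse attention (single head, no causal mask).

    q, k, v: (seq_len, d) tensors. Each query keeps only its top_k
    highest-scoring keys; the rest are masked out before softmax.
    Illustrative sketch only, not DeepSeek's DSA.
    """
    # Full scaled dot-product scores: (seq, seq). A real sparse scheme
    # avoids this O(seq^2) step via a lightweight selection index.
    scores = q @ k.T / k.shape[-1] ** 0.5
    top_k = min(top_k, k.shape[0])
    # Indices of the top_k keys per query row.
    idx = scores.topk(top_k, dim=-1).indices
    # Additive mask: 0 where a key is kept, -inf elsewhere.
    mask = torch.full_like(scores, float("-inf"))
    mask.scatter_(-1, idx, 0.0)
    # Softmax renormalizes over only the kept keys.
    probs = F.softmax(scores + mask, dim=-1)
    return probs @ v

seq, d = 1024, 128
q, k, v = (torch.randn(seq, d) for _ in range(3))
out = topk_sparse_attention(q, k, v, top_k=64)
print(out.shape)  # torch.Size([1024, 128])
```

The appeal at 1M-token context is that the attention compute and KV reads per query scale with the fixed budget (here 64 keys) rather than with the full sequence length, which is the property the rest of this article examines in DSA.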