DeepSeek V4 Deep Dive: DSA Sparse Attention, 1M Context, vs GPT-5/Claude/Gemini
V4-Pro and V4-Flash were released on 2026-04-24: a 1.6T-parameter MoE with 49B activated parameters, priced at $3.48/M output tokens. A full breakdown of the new DSA sparse attention architecture, the benchmark results (LiveCodeBench 93.5, SWE-bench 80.6, Codeforces 3206), the pricing strategy, Huawei Ascend integration, and an engineering selection guide.
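Before the deep dive, here is a minimal single-head sketch of the generic top-k idea behind sparse attention: each query attends only to a small budget of its highest-scoring keys instead of the full sequence. Everything in this sketch is an illustrative assumption, not DeepSeek's DSA implementation: the PyTorch framing, the top_k=64 budget, computing full scores before selection, and the omission of causal masking are all simplifications; a production scheme would use a cheap indexer to pick keys without materializing the full score matrix.

```python
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, top_k=64):
    """Toy top-k sparse attention (single head, no causal mask).

    q, k, v: (seq_len, d) tensors. Each query keeps only its top_k
    highest-scoring keys; the rest are masked out before softmax.
    Illustrative sketch only, not DeepSeek's DSA.
    """
    # Full scaled dot-product scores: (seq, seq). A real sparse scheme
    # avoids this O(seq^2) step via a lightweight selection index.
    scores = q @ k.T / k.shape[-1] ** 0.5
    top_k = min(top_k, k.shape[0])
    # Indices of the top_k keys per query row.
    idx = scores.topk(top_k, dim=-1).indices
    # Additive mask: 0 where a key is kept, -inf elsewhere.
    mask = torch.full_like(scores, float("-inf"))
    mask.scatter_(-1, idx, 0.0)
    # Softmax renormalizes over only the kept keys.
    probs = F.softmax(scores + mask, dim=-1)
    return probs @ v

seq, d = 1024, 128
q, k, v = (torch.randn(seq, d) for _ in range(3))
out = topk_sparse_attention(q, k, v, top_k=64)
print(out.shape)  # torch.Size([1024, 128])
```

The appeal at 1M-token context is that the attention compute and KV reads per query scale with the fixed budget (here 64 keys) rather than with the full sequence length, which is the property the rest of this article examines in DSA.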