
mHC: Manifold-Constrained Hyper-Connections

https://huggingface.co/papers/2512.24880

https://arxiv.org/abs/2512.24880

DeepSeek


Recent studies, exemplified by Hyper-Connections (HC), have extended the ubiquitous residual-connection paradigm of the past decade by widening the residual stream and diversifying connectivity patterns. While this diversification yields substantial performance gains, it fundamentally compromises the identity-mapping property intrinsic to residual connections, causing severe training instability, limiting scalability, and incurring notable memory-access overhead. To address these challenges, we propose Manifold-Constrained Hyper-Connections (mHC), a general framework that projects the residual-connection space of HC onto a specific manifold to restore the identity-mapping property, combined with rigorous infrastructure optimization to ensure efficiency. Experiments demonstrate that mHC is effective for training at scale, offering tangible performance improvements and superior scalability. We anticipate that mHC, as a flexible and practical extension of HC, will contribute to a deeper understanding of topological architecture design and suggest promising directions for the evolution of foundation models.
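The core idea — constraining HC's residual mixing to a manifold on which the identity mapping survives — can be pictured with a minimal sketch. Here the manifold is taken to be the set of row-stochastic matrices (projected via a row-wise softmax); this particular choice, and the names below, are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def project_row_stochastic(H):
    """Map an unconstrained mixing matrix onto the manifold of
    row-stochastic matrices via a numerically stable row softmax."""
    e = np.exp(H - H.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def hyper_connection(streams, H, layer_out):
    """Widened residual update: mix the n parallel residual streams
    with the projected matrix, then add the layer output."""
    return project_row_stochastic(H) @ streams + layer_out

rng = np.random.default_rng(0)
n, d = 4, 8                                 # n residual streams of width d
H = rng.normal(size=(n, n))                 # unconstrained learnable mixing
x = np.tile(rng.normal(size=d), (n, 1))     # identical streams

# With a zero layer output, the constrained mixing leaves identical
# streams unchanged: the identity-mapping property that unconstrained
# HC mixing would break.
y = hyper_connection(x, H, layer_out=np.zeros((n, d)))
```

Because every row of the projected matrix sums to one, a constant stack of streams is a fixed point of the mixing step, so gradients can flow through the residual path undistorted regardless of what `H` learns.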



DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models

https://huggingface.co/papers/2512.02556

https://arxiv.org/abs/2512.02556

DeepSeek


We introduce DeepSeek-V3.2, a model that harmonizes high computational efficiency with superior reasoning and agent performance. The key technical breakthroughs of DeepSeek-V3.2 are as follows: (1) DeepSeek Sparse Attention (DSA): We introduce DSA, an efficient attention mechanism that substantially reduces computational complexity while preserving model performance in long-context scenarios. (2) Scalable Reinforcement Learning Framework: By implementing a robust reinforcement learning protocol and scaling post-training compute, DeepSeek-V3.2 performs comparably to GPT-5. Notably, our high-compute variant, DeepSeek-V3.2-Speciale, surpasses GPT-5 and exhibits reasoning proficiency on par with Gemini-3.0-Pro, achieving gold-medal performance in both the 2025 International Mathematical Olympiad (IMO) and the International Olympiad in Informatics (IOI). (3) Large-Scale Agentic Task Synthesis Pipeline: To integrate reasoning into tool-use scenarios, we developed a novel synthesis pipeline that systematically generates training data at scale. This methodology facilitates scalable agentic post-training, yielding substantial improvements in generalization and instruction-following robustness within complex, interactive environments.
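DSA's actual indexing mechanism is described in the paper; as a generic illustration of how sparse attention reduces cost in long-context settings, the sketch below restricts each query to its top-k highest-scoring keys. The top-k selection rule is an assumption for illustration only, not DSA itself.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def topk_sparse_attention(Q, K, V, k):
    """Each query attends only to its k highest-scoring keys, so after
    selection the per-query attention cost is O(k) instead of O(T)."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])        # (T, T) logits
    kth = np.sort(scores, axis=-1)[:, -k][:, None]  # k-th largest per row
    masked = np.where(scores >= kth, scores, -np.inf)
    return softmax(masked) @ V

rng = np.random.default_rng(0)
T, d = 6, 4
Q, K, V = (rng.normal(size=(T, d)) for _ in range(3))

dense = softmax(Q @ K.T / np.sqrt(d)) @ V
sparse = topk_sparse_attention(Q, K, V, k=T)  # k = T recovers dense attention
```

Setting `k = T` masks nothing, so the sparse output matches dense attention exactly; shrinking `k` trades fidelity for compute, which is the lever an efficient attention mechanism like DSA exploits at long context lengths.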



DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI

https://huggingface.co/papers/2512.16676

https://arxiv.org/abs/2512.16676

https://github.com/OpenDCAI/DataFlow

Peking University


The rapidly growing demand for high-quality data in Large Language Models (LLMs) has intensified the need for scalable, reliable, and semantically rich data preparation pipelines. However, current practices remain dominated by ad-hoc scripts and loosely specified workflows, which lack principled abstractions, hinder reproducibility, and offer limited support for model-in-the-loop data generation. To address these challenges, we present DataFlow, a unified and extensible LLM-driven data preparation framework. DataFlow is designed with system-level abstractions that enable modular, reusable, and composable data transformations, and provides a PyTorch-style pipeline construction API for building debuggable and optimizable dataflows. The framework consists of nearly 200 reusable operators and six domain-general pipelines spanning text, mathematical reasoning, code, Text-to-SQL, agentic RAG, and large-scale knowledge extraction. To further improve usability, we introduce DataFlow-Agent, which automatically translates natural-language specifications into executable pipelines via operator synthesis, pipeline planning, and iterative verification. Across six representative use cases, DataFlow consistently improves downstream LLM performance. Our math, code, and text pipelines outperform curated human datasets and specialized synthetic baselines, achieving up to +3% execution accuracy in Text-to-SQL over SynSQL, +7% average improvements on code benchmarks, and 1–3 point gains on MATH, GSM8K, and AIME. Moreover, a unified 10K-sample dataset produced by DataFlow enables base models to surpass counterparts trained on 1M Infinity-Instruct data. These results demonstrate that DataFlow provides a practical and high-performance substrate for reliable, reproducible, and scalable LLM data preparation, and establishes a system-level foundation for future data-centric AI development.
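The "PyTorch-style pipeline construction API" can be pictured as composable operators chained like `nn.Sequential`. The `Operator`, `Pipeline`, and operator names below are hypothetical illustrations of that style, not DataFlow's actual API.

```python
class Operator:
    """Base class: a reusable, composable data transformation over records."""
    def __call__(self, records):
        raise NotImplementedError

class Normalize(Operator):
    """Collapse runs of whitespace in each record's text field."""
    def __call__(self, records):
        return [{**r, "text": " ".join(r["text"].split())} for r in records]

class FilterShort(Operator):
    """Drop records whose text is shorter than min_len characters."""
    def __init__(self, min_len):
        self.min_len = min_len
    def __call__(self, records):
        return [r for r in records if len(r["text"]) >= self.min_len]

class Pipeline:
    """nn.Sequential-style container: applies operators in order, so each
    stage can be inspected, swapped, or debugged independently."""
    def __init__(self, *ops):
        self.ops = ops
    def __call__(self, records):
        for op in self.ops:
            records = op(records)
        return records

pipe = Pipeline(Normalize(), FilterShort(min_len=10))
data = [{"text": "  short "}, {"text": "a  longer   example record"}]
out = pipe(data)
```

Making each stage a plain callable over a list of records is what gives the framework its modularity: new operators slot into any pipeline position, and intermediate outputs can be materialized between stages for debugging.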
