KAN: Kolmogorov-Arnold Networks
Inspired by the Kolmogorov-Arnold representation theorem, we propose Kolmogorov-Arnold Networks (KANs) as promising alternatives to Multi-Layer Perceptrons (MLPs). While MLPs have fixed activation functions on nodes ("neurons"), KANs have learnable activation functions on edges ("weights"). KANs have no linear weights at all: every weight parameter is replaced by a univariate function parametrized as a spline. We show that this seemingly simple change makes KANs outperform MLPs in accuracy and interpretability on small-scale AI + Science tasks. For accuracy, smaller KANs can achieve comparable or better accuracy than larger MLPs on function-fitting tasks. Theoretically and empirically, KANs possess faster neural scaling laws than MLPs. For interpretability, KANs can be intuitively visualized and can easily interact with human users. Through two examples in mathematics and physics, KANs are shown to be useful "collaborators" that help scientists (re)discover mathematical and physical laws. In summary, KANs are promising alternatives to MLPs, opening opportunities to further improve today's deep learning models, which rely heavily on MLPs.
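The architectural change described above can be illustrated with a minimal sketch of a KAN layer's forward pass. This is not the paper's implementation (which uses higher-degree B-splines with a residual base function); here each edge carries a degree-1 spline, i.e. linear interpolation over fixed grid knots, and all names are illustrative:

```python
# Minimal sketch of a KAN layer forward pass (illustrative only).
# Each edge (i, j) carries a learnable univariate function; here it is a
# degree-1 spline: linear interpolation of learnable values at fixed knots.

def spline_eval(coeffs, grid, x):
    """Evaluate a piecewise-linear spline with values `coeffs` at knots `grid`."""
    if x <= grid[0]:          # clamp below the grid
        return coeffs[0]
    if x >= grid[-1]:         # clamp above the grid
        return coeffs[-1]
    for k in range(len(grid) - 1):
        if grid[k] <= x <= grid[k + 1]:
            t = (x - grid[k]) / (grid[k + 1] - grid[k])
            return (1 - t) * coeffs[k] + t * coeffs[k + 1]

def kan_layer(x, edge_coeffs, grid):
    """x: list of inputs; edge_coeffs[j][i]: spline values for edge i -> j.
    Output j is the SUM of the edge activations -- there are no linear weights."""
    return [sum(spline_eval(edge_coeffs[j][i], grid, xi)
                for i, xi in enumerate(x))
            for j in range(len(edge_coeffs))]

# With identity splines (values equal to the knots) on both edges,
# the single output reduces to x1 + x2.
grid = [-1, 0, 1]
out = kan_layer([0.5, -0.5], [[grid, grid]], grid)
```

Training would adjust `edge_coeffs` by gradient descent, which is what makes the activations "learnable"; the grid itself can also be refined, as in the paper.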
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework
Large Language Models (LLMs) fine-tuned via Reinforcement Learning from Human Feedback (RLHF) and Reinforcement Learning with Verifiable Rewards (RLVR) significantly improve the alignment of AI behavior with human values and further raise the upper bound of AI capabilities, particularly on reasoning-intensive, long-context Chain-of-Thought (CoT) tasks. However, existing frameworks commonly face challenges such as inference bottlenecks and complexity barriers, which restrict their accessibility to newcomers. To bridge this gap, we introduce OpenRLHF, a user-friendly, scalable, and easy-to-learn open-source RLHF framework built upon Ray, vLLM, DeepSpeed, and HuggingFace Transformers, featuring a simplified design, clear code structure, and comprehensive documentation to ease entry for researchers and practitioners. Experimental results show that OpenRLHF achieves superior training efficiency, with speedups ranging from 1.22× to 1.68× across different model sizes compared to state-of-the-art frameworks. Additionally, it requires significantly fewer lines of code to implement.
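The abstract does not spell out the training objective, but RLHF frameworks of this kind typically optimize PPO's clipped surrogate loss over policy log-probabilities. A minimal, framework-independent sketch of that objective (function name and default clip range are illustrative, not OpenRLHF's API):

```python
# Framework-independent sketch of PPO's clipped surrogate objective,
# the loss at the heart of most RLHF training loops (illustrative only).
import math

def ppo_clip_loss(logp_new, logp_old, advantage, eps=0.2):
    """Per-token PPO loss. logp_new/logp_old: log-probs of the sampled token
    under the current and behavior policies; advantage: estimated advantage."""
    ratio = math.exp(logp_new - logp_old)          # importance ratio pi_new / pi_old
    unclipped = ratio * advantage
    clipped = max(min(ratio, 1 + eps), 1 - eps) * advantage
    # PPO maximizes min(unclipped, clipped); negate to express it as a loss.
    return -min(unclipped, clipped)
```

The clipping keeps each update close to the behavior policy, which is why the rollout (inference) and training stages can be split across engines such as vLLM and DeepSpeed, as OpenRLHF does.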
Extending Llama-3's Context Ten-Fold Overnight
We extend the context length of Llama-3-8B-Instruct from 8K to 80K via QLoRA fine-tuning. The entire training cycle is highly efficient, taking 8 hours on a single machine with 8xA800 (80G) GPUs. The resulting model exhibits superior performance across a broad range of evaluation tasks, such as needle-in-a-haystack (NIH) retrieval, topic retrieval, and long-context language understanding; meanwhile, it also preserves well the original capability over short contexts. The dramatic context extension is mainly attributed to merely 3.5K synthetic training samples generated by GPT-4, which indicates LLMs' inherent (yet largely underestimated) potential to extend their original context length. In fact, the context length could be extended far beyond 80K with more computational resources.
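The abstract credits the extension to synthetic training data and does not detail positional-encoding changes, but long-context fine-tuning recipes of this kind commonly also enlarge the RoPE base so that the rotary wavelengths span the new window. A minimal sketch of that effect (the base values below are illustrative assumptions, not figures from the paper):

```python
# Hedged sketch: how enlarging the RoPE base stretches rotary wavelengths,
# a common ingredient of long-context fine-tuning recipes (illustrative only).
import math

def rope_wavelengths(base, dim):
    """Wavelength (in tokens) of each rotary frequency pair:
    theta_i = base^(-2i/dim), so lambda_i = 2*pi / theta_i."""
    return [2 * math.pi * base ** (2 * i / dim) for i in range(dim // 2)]

short = rope_wavelengths(base=500_000, dim=128)      # hypothetical original base
long_ = rope_wavelengths(base=200_000_000, dim=128)  # hypothetical enlarged base
# The slowest-rotating dimensions gain much longer wavelengths, so positions
# far beyond the original window remain distinguishable.
```

QLoRA then makes the adaptation cheap: the base model is quantized to 4 bits and only low-rank adapters are trained, which is consistent with the 8-hour, single-machine budget reported above.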