Depth Anything V2
This work presents Depth Anything V2. Without pursuing fancy techniques, we aim to reveal crucial findings that pave the way towards building a powerful monocular depth estimation model. Notably, compared with V1, this version produces much finer and more robust depth predictions through three key practices: 1) replacing all labeled real images with synthetic images, 2) scaling up the capacity of our teacher model, and 3) teaching student models via the bridge of large-scale pseudo-labeled real images. Compared with the latest models built on Stable Diffusion, our models are significantly more efficient and more accurate. We offer models of different scales to support a wide range of application scenarios. Benefiting from their strong generalization capability, we fine-tune them with metric depth labels to obtain metric depth models. In addition to our models, considering the limited diversity and frequent noise in current test sets, we construct a versatile evaluation benchmark with precise annotations and diverse scenes to facilitate future research.
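The three-practice recipe above (train a large teacher on precisely labeled synthetic images, then teach students through pseudo-labels on large-scale unlabeled real images) can be illustrated with a toy distillation sketch. The linear "model", the training loop, and all names here are illustrative stand-ins, not the authors' actual pipeline:

```python
# Toy sketch of teacher-student pseudo-label distillation (hypothetical,
# not the authors' code): depth is modeled as d = w * x + b per "image" x.

def train(model_params, images, labels, lr=0.1, steps=200):
    """Fit a toy linear depth model by gradient descent on squared error."""
    w, b = model_params
    for _ in range(steps):
        gw = gb = 0.0
        for x, y in zip(images, labels):
            err = (w * x + b) - y
            gw += 2 * err * x / len(images)
            gb += 2 * err / len(images)
        w -= lr * gw
        b -= lr * gb
    return w, b

# 1) Synthetic "images" with exact depth labels (ground truth: d = 2x + 1).
synthetic_imgs = [0.0, 0.5, 1.0, 1.5, 2.0]
synthetic_depth = [2 * x + 1 for x in synthetic_imgs]

# 2) Train the (large-capacity) teacher on synthetic data only.
teacher = train((0.0, 0.0), synthetic_imgs, synthetic_depth)

# 3) Pseudo-label a larger pool of unlabeled real images with the teacher,
#    then train a student on those pseudo-labels.
real_imgs = [x / 10 for x in range(30)]
pseudo_depth = [teacher[0] * x + teacher[1] for x in real_imgs]
student = train((0.0, 0.0), real_imgs, pseudo_depth)

print(round(student[0], 2), round(student[1], 2))  # student recovers ~(2, 1)
```

The point of the sketch is the data flow: the student never sees a real ground-truth label, only the teacher's predictions on real images.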
Instruction Pre-Training: Language Models are Supervised Multitask Learners
Microsoft
Unsupervised multitask pre-training has been the critical method behind the recent success of language models (LMs). However, supervised multitask learning still holds significant promise, as scaling it in the post-training stage tends to yield better generalization. In this paper, we explore supervised multitask pre-training by proposing Instruction Pre-Training, a framework that scalably augments massive raw corpora with instruction-response pairs to pre-train LMs. The instruction-response pairs are generated by an efficient instruction synthesizer built on open-source models. In our experiments, we synthesize 200M instruction-response pairs covering 40+ task categories to verify the effectiveness of Instruction Pre-Training. In pre-training from scratch, Instruction Pre-Training not only consistently enhances pre-trained base models but also benefits more from further instruction tuning. In continual pre-training, Instruction Pre-Training enables Llama3-8B to be comparable to or even outperform Llama3-70B.
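The corpus-augmentation step the abstract describes, where a synthesizer turns raw text into instruction-response pairs that are attached to the text before pre-training, can be sketched as follows. The templated stand-in synthesizer and the Q/A concatenation format are assumptions for illustration, not the paper's actual synthesizer or formatting:

```python
# Hypothetical sketch of Instruction Pre-Training's data augmentation:
# raw document -> synthesized instruction-response pairs -> combined
# pre-training example. The synthesizer here is a trivial stand-in for
# the open-source-LM-based synthesizer described in the abstract.

def toy_synthesizer(raw_text):
    """Stand-in synthesizer: emits one templated pair per document."""
    return [("Summarize the following text.", raw_text[:40])]

def augment_corpus(raw_docs, synthesizer):
    """Append synthesized instruction-response pairs to each raw document."""
    augmented = []
    for doc in raw_docs:
        pairs = synthesizer(doc)
        qa = "\n".join(f"Q: {q}\nA: {a}" for q, a in pairs)
        augmented.append(f"{doc}\n{qa}")
    return augmented

corpus = ["Monocular depth estimation predicts distance from one image."]
print(augment_corpus(corpus, toy_synthesizer)[0])
```

The key design point is that augmentation happens at pre-training scale, so the synthesizer must be cheap enough to run over hundreds of millions of documents.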
ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools
Tsinghua University
We introduce ChatGLM, an evolving family of large language models that we have been developing over time. This report primarily focuses on the GLM-4 language series, which includes GLM-4, GLM-4-Air, and GLM-4-9B. They represent our most capable models, trained with all the insights and lessons gained from the preceding three generations of ChatGLM. To date, the GLM-4 models are pre-trained on ten trillion tokens, mostly in Chinese and English, along with a small corpus covering 24 other languages, and aligned primarily for Chinese and English usage. The high-quality alignment is achieved via a multi-stage post-training process, which involves supervised fine-tuning and learning from human feedback. Evaluations show that GLM-4: 1) closely rivals or outperforms GPT-4 on general metrics such as MMLU, GSM8K, MATH, BBH, GPQA, and HumanEval, 2) gets close to GPT-4-Turbo in instruction following as measured by IFEval, 3) matches GPT-4 Turbo (128K) and Claude 3 on long-context tasks, and 4) outperforms GPT-4 in Chinese alignment as measured by AlignBench. The GLM-4 All Tools model is further aligned to understand user intent and autonomously decide when and which tool(s) to use, including a web browser, Python interpreter, text-to-image model, and user-defined functions, to effectively complete complex tasks. In practical applications, it matches and even surpasses GPT-4 All Tools in tasks like accessing online information via web browsing and solving math problems using the Python interpreter. Over the course of development, we have open-sourced a series of models, including ChatGLM-6B (three generations), GLM-4-9B (128K, 1M), GLM-4V-9B, WebGLM, and CodeGeeX, attracting over 10 million downloads on Hugging Face in the year 2023 alone.
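The "All Tools" behavior described above, deciding when and which tool to use for a request, can be caricatured with a keyword router. In the real system the aligned model itself makes this decision; every name and the routing heuristic below are hypothetical illustrations:

```python
# Hypothetical sketch of tool dispatch in an "All Tools"-style system:
# pick a tool for a request, then run it. A real system would let the
# aligned LM choose the tool; this keyword router is a toy stand-in.

def route_tool(user_request):
    """Pick a tool for the request (toy heuristic, not a learned policy)."""
    text = user_request.lower()
    if any(k in text for k in ("latest", "news", "today")):
        return "web_browser"
    if any(k in text for k in ("compute", "solve", "integral", "sum")):
        return "python_interpreter"
    return "direct_answer"

# Toy tool implementations keyed by name.
TOOLS = {
    "python_interpreter": lambda req: str(eval(req.split(":", 1)[1])),
    "web_browser": lambda req: "[browse results for: " + req + "]",
    "direct_answer": lambda req: "[model answers directly]",
}

request = "compute: 21 * 2"
tool = route_tool(request)
print(tool, TOOLS[tool](request))  # python_interpreter 42
```

The division of labor is the point: intent understanding selects the tool, and the tool (not the model's parameters) produces the grounded result, such as an exact arithmetic answer.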