Depth Anything V2
This work presents Depth Anything V2. Without pursuing fancy techniques, we aim to reveal crucial findings that pave the way towards building a powerful monocular depth estimation model. Notably, compared with V1, this version produces much finer and more robust depth predictions through three key practices: 1) replacing all labeled real images with synthetic images, 2) scaling up the capacity of our teacher model, and 3) teaching student models via the bridge of large-scale pseudo-labeled real images. Compared with the latest models built on Stable Diffusion, our models are significantly more efficient and more accurate. We offer models of different scales to support a wide range of application scenarios. Benefiting from their strong generalization capability, we fine-tune them with metric depth labels to obtain metric depth models. In addition to our models, considering the limited diversity and frequent noise in current test sets, we construct a versatile evaluation benchmark with precise annotations and diverse scenes to facilitate future research.
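The three-practice recipe above (train a large teacher on precisely labeled synthetic images, then teach students through pseudo-labels on large-scale unlabeled real images) can be illustrated with a toy distillation sketch. The linear "model", the training loop, and all names here are illustrative stand-ins, not the authors' actual pipeline:

```python
# Toy sketch of teacher-student pseudo-label distillation (hypothetical,
# not the authors' code): depth is modeled as d = w * x + b per "image" x.

def train(model_params, images, labels, lr=0.1, steps=200):
    """Fit a toy linear depth model by gradient descent on squared error."""
    w, b = model_params
    for _ in range(steps):
        gw = gb = 0.0
        for x, y in zip(images, labels):
            err = (w * x + b) - y
            gw += 2 * err * x / len(images)
            gb += 2 * err / len(images)
        w -= lr * gw
        b -= lr * gb
    return w, b

# 1) Synthetic "images" with exact depth labels (ground truth: d = 2x + 1).
synthetic_imgs = [0.0, 0.5, 1.0, 1.5, 2.0]
synthetic_depth = [2 * x + 1 for x in synthetic_imgs]

# 2) Train the (large-capacity) teacher on synthetic data only.
teacher = train((0.0, 0.0), synthetic_imgs, synthetic_depth)

# 3) Pseudo-label a larger pool of unlabeled real images with the teacher,
#    then train a student on those pseudo-labels.
real_imgs = [x / 10 for x in range(30)]
pseudo_depth = [teacher[0] * x + teacher[1] for x in real_imgs]
student = train((0.0, 0.0), real_imgs, pseudo_depth)

print(round(student[0], 2), round(student[1], 2))  # student recovers ~(2, 1)
```

The point of the sketch is the data flow: the student never sees a real ground-truth label, only the teacher's predictions on real images.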
Instruction Pre-Training: Language Models are Supervised Multitask Learners
Microsoft
Unsupervised multitask pre-training has been the critical method behind the recent success of language models (LMs). However, supervised multitask learning still holds significant promise, as scaling it in the post-training stage tends to yield better generalization. In this paper, we explore supervised multitask pre-training by proposing Instruction Pre-Training, a framework that scalably augments massive raw corpora with instruction-response pairs to pre-train LMs. The instruction-response pairs are generated by an efficient instruction synthesizer built on open-source models. In our experiments, we synthesize 200M instruction-response pairs covering 40+ task categories to verify the effectiveness of Instruction Pre-Training. In pre-training from scratch, Instruction Pre-Training not only consistently enhances pre-trained base models but also benefits more from further instruction tuning. In continual pre-training, Instruction Pre-Training enables Llama3-8B to be comparable to or even outperform Llama3-70B.
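The corpus-augmentation step the abstract describes, where a synthesizer turns raw text into instruction-response pairs that are attached to the text before pre-training, can be sketched as follows. The templated stand-in synthesizer and the Q/A concatenation format are assumptions for illustration, not the paper's actual synthesizer or formatting:

```python
# Hypothetical sketch of Instruction Pre-Training's data augmentation:
# raw document -> synthesized instruction-response pairs -> combined
# pre-training example. The synthesizer here is a trivial stand-in for
# the open-source-LM-based synthesizer described in the abstract.

def toy_synthesizer(raw_text):
    """Stand-in synthesizer: emits one templated pair per document."""
    return [("Summarize the following text.", raw_text[:40])]

def augment_corpus(raw_docs, synthesizer):
    """Append synthesized instruction-response pairs to each raw document."""
    augmented = []
    for doc in raw_docs:
        pairs = synthesizer(doc)
        qa = "\n".join(f"Q: {q}\nA: {a}" for q, a in pairs)
        augmented.append(f"{doc}\n{qa}")
    return augmented

corpus = ["Monocular depth estimation predicts distance from one image."]
print(augment_corpus(corpus, toy_synthesizer)[0])
```

The key design point is that augmentation happens at pre-training scale, so the synthesizer must be cheap enough to run over hundreds of millions of documents.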
ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools
Tsinghua University
We introduce ChatGLM, an evolving family of large language models that we have been developing over time. This report primarily focuses on the GLM-4 language series, which includes GLM-4, GLM-4-Air, and GLM-4-9B. They represent our most capable models, trained with all the insights and lessons gained from the preceding three generations of ChatGLM. To date, the GLM-4 models are pre-trained on ten trillion tokens, mostly in Chinese and English, along with a small corpus covering 24 other languages, and aligned primarily for Chinese and English usage. The high-quality alignment is achieved via a multi-stage post-training process, which involves supervised fine-tuning and learning from human feedback. Evaluations show that GLM-4: 1) closely rivals or outperforms GPT-4 on general metrics such as MMLU, GSM8K, MATH, BBH, GPQA, and HumanEval, 2) gets close to GPT-4-Turbo in instruction following as measured by IFEval, 3) matches GPT-4 Turbo (128K) and Claude 3 on long-context tasks, and 4) outperforms GPT-4 in Chinese alignment as measured by AlignBench. The GLM-4 All Tools model is further aligned to understand user intent and autonomously decide when and which tool(s) to use, including a web browser, Python interpreter, text-to-image model, and user-defined functions, to effectively complete complex tasks. In practical applications, it matches and even surpasses GPT-4 All Tools in tasks like accessing online information via web browsing and solving math problems using the Python interpreter. Over the course of development, we have open-sourced a series of models, including ChatGLM-6B (three generations), GLM-4-9B (128K, 1M), GLM-4V-9B, WebGLM, and CodeGeeX, attracting over 10 million downloads on Hugging Face in the year 2023 alone.
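The "All Tools" behavior described above, deciding when and which tool to use for a request, can be caricatured with a keyword router. In the real system the aligned model itself makes this decision; every name and the routing heuristic below are hypothetical illustrations:

```python
# Hypothetical sketch of tool dispatch in an "All Tools"-style system:
# pick a tool for a request, then run it. A real system would let the
# aligned LM choose the tool; this keyword router is a toy stand-in.

def route_tool(user_request):
    """Pick a tool for the request (toy heuristic, not a learned policy)."""
    text = user_request.lower()
    if any(k in text for k in ("latest", "news", "today")):
        return "web_browser"
    if any(k in text for k in ("compute", "solve", "integral", "sum")):
        return "python_interpreter"
    return "direct_answer"

# Toy tool implementations keyed by name.
TOOLS = {
    "python_interpreter": lambda req: str(eval(req.split(":", 1)[1])),
    "web_browser": lambda req: "[browse results for: " + req + "]",
    "direct_answer": lambda req: "[model answers directly]",
}

request = "compute: 21 * 2"
tool = route_tool(request)
print(tool, TOOLS[tool](request))  # python_interpreter 42
```

The division of labor is the point: intent understanding selects the tool, and the tool (not the model's parameters) produces the grounded result, such as an exact arithmetic answer.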