
Llama 2: Open Foundation and Fine-Tuned Chat Models

https://huggingface.co/papers/2307.09288

https://arxiv.org/abs/2307.09288

https://github.com/meta-llama/llama

Meta


In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama 2-Chat, are optimized for dialogue use cases. Our models outperform open-source chat models on most benchmarks we tested, and based on our human evaluations for helpfulness and safety, may be a suitable substitute for closed-source models. We provide a detailed description of our approach to fine-tuning and safety improvements of Llama 2-Chat in order to enable the community to build on our work and contribute to the responsible development of LLMs.
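As a concrete illustration of the dialogue optimization, the sketch below builds the single-turn prompt template used by the reference Llama 2-Chat implementation (the `[INST]` / `<<SYS>>` markers from the meta-llama/llama repository). Exact whitespace and tokenizer-level encoding can differ between releases, so treat this as an illustrative sketch rather than the canonical implementation; the helper name `format_turn` is invented for the example.

```python
from typing import Optional

# Special markers from the Llama 2-Chat prompt format
# (see generation.py in the meta-llama/llama repository).
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"

def format_turn(user_msg: str, system_msg: Optional[str] = None) -> str:
    """Build a single-turn Llama 2-Chat prompt string."""
    content = user_msg.strip()
    if system_msg is not None:
        # The system prompt is wrapped in <<SYS>> tags and prepended
        # to the first user message inside the [INST] block.
        content = B_SYS + system_msg.strip() + E_SYS + content
    return f"{B_INST} {content} {E_INST}"

prompt = format_turn("What is the capital of France?",
                     "You are a helpful assistant.")
```

The resulting string is what gets tokenized and fed to the model for the first turn; multi-turn dialogue repeats the `[INST] … [/INST]` wrapping around each user message with the model's replies in between.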



ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs

https://huggingface.co/papers/2307.16789

https://arxiv.org/abs/2307.16789

https://github.com/openbmb/toolbench

Tsinghua University


Despite the advancements of open-source large language models (LLMs), e.g., LLaMA, they remain significantly limited in tool-use capabilities, i.e., using external tools (APIs) to fulfill human instructions. The reason is that current instruction tuning largely focuses on basic language tasks but ignores the tool-use domain. This is in contrast to the excellent tool-use capabilities of state-of-the-art (SOTA) closed-source LLMs, e.g., ChatGPT. To bridge this gap, we introduce ToolLLM, a general tool-use framework encompassing data construction, model training, and evaluation. We first present ToolBench, an instruction-tuning dataset for tool use, which is constructed automatically using ChatGPT. Specifically, the construction can be divided into three stages: (i) API collection: we collect 16,464 real-world RESTful APIs spanning 49 categories from RapidAPI Hub; (ii) instruction generation: we prompt ChatGPT to generate diverse instructions involving these APIs, covering both single-tool and multi-tool scenarios; (iii) solution path annotation: we use ChatGPT to search for a valid solution path (chain of API calls) for each instruction. To enhance the reasoning capabilities of LLMs, we develop a novel depth-first search-based decision tree algorithm. It enables LLMs to evaluate multiple reasoning traces and expand the search space. Moreover, to evaluate the tool-use capabilities of LLMs, we develop an automatic evaluator: ToolEval. Based on ToolBench, we fine-tune LLaMA to obtain an LLM ToolLLaMA, and equip it with a neural API retriever to recommend appropriate APIs for each instruction. Experiments show that ToolLLaMA demonstrates a remarkable ability to execute complex instructions and generalize to unseen APIs, and exhibits comparable performance to ChatGPT. Our ToolLLaMA also demonstrates strong zero-shot generalization ability in an out-of-distribution tool-use dataset: APIBench.
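The depth-first search over chains of API calls can be sketched as below. This is a toy illustration only: in the paper, branch expansion and pruning are driven by ChatGPT, whereas here both are plain callables supplied by the caller, and all names (`dfs_solution_path`, `expand`, `is_solution`) are invented for the example.

```python
def dfs_solution_path(state, expand, is_solution, max_depth=5):
    """Return the first chain of API calls (a list) that solves the task,
    or None if no chain within max_depth works."""
    if is_solution(state):
        return []
    if max_depth == 0:
        return None
    for call, next_state in expand(state):
        sub = dfs_solution_path(next_state, expand, is_solution, max_depth - 1)
        if sub is not None:
            return [call] + sub  # prepend this call to the solution path
    return None  # backtrack: no candidate in this branch led to a solution

# Toy task: reach the value 3 from 0 using "inc" (+1) and "dbl" (*2) "APIs".
path = dfs_solution_path(
    0,
    expand=lambda s: [("inc", s + 1), ("dbl", s * 2)],
    is_solution=lambda s: s == 3,
    max_depth=3,
)
```

The key property mirrored here is backtracking: when a branch dead-ends, the search abandons it and tries a sibling, so multiple reasoning traces are evaluated instead of committing to a single greedy chain of calls.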



SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis

https://huggingface.co/papers/2307.01952

https://arxiv.org/abs/2307.01952

https://github.com/stability-ai/generative-models

Stability AI


We present SDXL, a latent diffusion model for text-to-image synthesis. Compared to previous versions of Stable Diffusion, SDXL leverages a UNet backbone three times larger; the increase in model parameters is mainly due to more attention blocks and a larger cross-attention context, as SDXL uses a second text encoder. We design multiple novel conditioning schemes and train SDXL on multiple aspect ratios. We also introduce a refinement model which is used to improve the visual fidelity of samples generated by SDXL using a post-hoc image-to-image technique. We demonstrate that SDXL shows drastically improved performance compared to previous versions of Stable Diffusion and achieves results competitive with those of black-box state-of-the-art image generators. In the spirit of promoting open research and fostering transparency in large model training and evaluation, we provide access to code and model weights.
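One of the conditioning schemes described in the paper conditions the model on the original image resolution via Fourier (sinusoidal) feature embeddings of the height and width, which are concatenated and added to the timestep embedding. A minimal sketch of such an embedding follows; the dimensionality and helper names are chosen arbitrarily for illustration and are not SDXL's actual values.

```python
import math

def sinusoidal_embedding(value, dim, max_period=10000.0):
    """Transformer-style sinusoidal features for one scalar (dim must be even)."""
    half = dim // 2
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    return ([math.sin(value * f) for f in freqs] +
            [math.cos(value * f) for f in freqs])

def size_conditioning(height, width, dim_per_coord=256):
    """Embed (h, w) separately and concatenate. In SDXL-style
    micro-conditioning, a vector like this is projected and added to the
    timestep embedding inside the UNet."""
    return (sinusoidal_embedding(height, dim_per_coord) +
            sinusoidal_embedding(width, dim_per_coord))

cond = size_conditioning(512, 512)  # 512-dimensional conditioning vector
```

Because the conditioning signal carries the original training-image resolution, the model can be steered at sampling time by simply passing the desired apparent resolution, without any architectural change.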
