GAIA: a benchmark for General AI Assistants

https://huggingface.co/papers/2311.12983

https://arxiv.org/abs/2311.12983

Meta


We introduce GAIA, a benchmark for General AI Assistants that, if solved, would represent a milestone in AI research. GAIA proposes real-world questions that require a set of fundamental abilities such as reasoning, multi-modality handling, web browsing, and general tool-use proficiency. GAIA questions are conceptually simple for humans yet challenging for most advanced AIs: we show that human respondents obtain 92% vs. 15% for GPT-4 equipped with plugins. This notable performance disparity contrasts with the recent trend of LLMs outperforming humans on tasks requiring professional skills in, e.g., law or chemistry. GAIA's philosophy departs from the current trend in AI benchmarks of targeting tasks that are ever more difficult for humans. We posit that the advent of Artificial General Intelligence (AGI) hinges on a system's capability to exhibit the same robustness as the average human on such questions. Using GAIA's methodology, we devise 466 questions and their answers. We release our questions while retaining the answers to 300 of them to power a leaderboard, accessible online.

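Because GAIA questions have short, unambiguous answers, scoring reduces to a quasi-exact match between the model's answer and the reference. A minimal sketch of such a scorer is below; the function names (`normalize_answer`, `exact_match`, `score`) and the exact normalization rules are illustrative assumptions, not the official leaderboard implementation:

```python
import re

def normalize_answer(ans: str) -> str:
    """Lowercase, trim, and strip punctuation/articles so that
    superficial formatting differences do not count as errors."""
    ans = ans.strip().lower()
    ans = re.sub(r"[.,;:!?'\"]", "", ans)      # drop punctuation
    ans = re.sub(r"\b(a|an|the)\b", " ", ans)  # drop English articles
    return re.sub(r"\s+", " ", ans).strip()    # collapse whitespace

def exact_match(prediction: str, reference: str) -> bool:
    """Quasi-exact match between a model answer and the reference."""
    return normalize_answer(prediction) == normalize_answer(reference)

def score(predictions, references):
    """Fraction of questions answered correctly."""
    correct = sum(exact_match(p, r) for p, r in zip(predictions, references))
    return correct / len(references)
```

For example, `score(["The Eiffel Tower.", "42"], ["eiffel tower", "41"])` returns `0.5`: the first answer matches after normalization, the second does not.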


LCM-LoRA: A Universal Stable-Diffusion Acceleration Module

https://huggingface.co/papers/2311.05556

https://arxiv.org/abs/2311.05556

https://github.com/luosiallen/latent-consistency-model



Latent Consistency Models (LCMs) have achieved impressive performance in accelerating text-to-image generative tasks, producing high-quality images with minimal inference steps. LCMs are distilled from pre-trained latent diffusion models (LDMs), requiring only ~32 A100 GPU training hours. This report further extends LCMs' potential in two aspects. First, by applying LoRA distillation to Stable-Diffusion models including SD-V1.5, SSD-1B, and SDXL, we expand LCM's scope to larger models with significantly less memory consumption, achieving superior image generation quality. Second, we identify the LoRA parameters obtained through LCM distillation as a universal Stable-Diffusion acceleration module, named LCM-LoRA. LCM-LoRA can be directly plugged into various Stable-Diffusion fine-tuned models or LoRAs without training, thus representing a universally applicable accelerator for diverse image generation tasks. Compared with previous numerical PF-ODE solvers such as DDIM and DPM-Solver, LCM-LoRA can be viewed as a plug-in neural PF-ODE solver that possesses strong generalization abilities.

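The "plug-in" property comes from the fact that LoRA updates are additive: the acceleration parameters can be linearly combined with a style LoRA on top of the same base weights. A toy numeric sketch of that linear combination is below; the function `merge_loras` and the random matrices standing in for real weight deltas are illustrative assumptions (in practice one would merge actual low-rank `B @ A` products into a diffusion model's layers):

```python
import numpy as np

def merge_loras(base_weight, lora_deltas, scales):
    """Linearly combine several LoRA weight deltas into a base weight.
    Each delta is a full-size update (e.g. a materialized B @ A product),
    scaled by its own coefficient before being added."""
    merged = base_weight.copy()
    for delta, scale in zip(lora_deltas, scales):
        merged += scale * delta
    return merged

# Toy example: a 4x4 "base" layer plus a style LoRA and an acceleration LoRA.
rng = np.random.default_rng(0)
base = rng.standard_normal((4, 4))
style_delta = rng.standard_normal((4, 4))  # stands in for a fine-tuned style LoRA
lcm_delta = rng.standard_normal((4, 4))    # stands in for the LCM-LoRA parameters

merged = merge_loras(base, [style_delta, lcm_delta], scales=[0.8, 1.0])
```

Because the combination is a plain weighted sum, the acceleration delta composes with any style delta trained against the same base model, which is what makes LCM-LoRA reusable across fine-tunes without retraining.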


GPT4All: An Ecosystem of Open Source Compressed Language Models

https://huggingface.co/papers/2311.04931

https://arxiv.org/abs/2311.04931

https://github.com/nomic-ai/gpt4all



Large language models (LLMs) have recently achieved human-level performance on a range of professional and academic benchmarks. The accessibility of these models has lagged behind their performance. State-of-the-art LLMs require costly infrastructure; are only accessible via rate-limited, geo-locked, and censored web interfaces; and lack publicly available code and technical reports. In this paper, we tell the story of GPT4All, a popular open source repository that aims to democratize access to LLMs. We outline the technical details of the original GPT4All model family, as well as the evolution of the GPT4All project from a single model into a fully fledged open source ecosystem. It is our hope that this paper acts both as a technical overview of the original GPT4All models and as a case study of the subsequent growth of the GPT4All open source ecosystem.
