

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

Abstract

Large pre-trained language models have been shown to store factual knowledge in their parameters, and achieve state-of-the-art results when fine-tuned on downstream NLP tasks. However, their ability to access and precisely manipulate knowledge is still limited, and hence on knowledge-intensive tasks, their performance lags behind task-specific architectures. Additionally, providing provenance for their decisions and updating their world knowledge remain open research problems. Pre-trained models with a differentiable access mechanism to explicit non-parametric memory have so far been only investigated for extractive downstream tasks. We explore a general-purpose fine-tuning recipe for retrieval-augmented generation (RAG) — models which combine pre-trained parametric and non-parametric memory for language generation. We introduce RAG models where the parametric memory is a pre-trained seq2seq model and the non-parametric memory is a dense vector index of Wikipedia, accessed with a pre-trained neural retriever. We compare two RAG formulations, one which conditions on the same retrieved passages across the whole generated sequence, and another which can use different passages per token. We fine-tune and evaluate our models on a wide range of knowledge-intensive NLP tasks and set the state of the art on three open domain QA tasks, outperforming parametric seq2seq models and task-specific retrieve-and-extract architectures. For language generation tasks, we find that RAG models generate more specific, diverse and factual language than a state-of-the-art parametric-only seq2seq baseline.


Introduction

Pre-trained neural language models have been shown to learn a substantial amount of in-depth knowledge from data. They can do so without any access to an external memory, as a parameterized implicit knowledge base. While this development is exciting, such models do have downsides: They cannot easily expand or revise their memory, can't straightforwardly provide insight into their predictions, and may produce "hallucinations". Hybrid models that combine parametric memory with non-parametric (i.e., retrieval-based) memories can address some of these issues because knowledge can be directly revised and expanded, and accessed knowledge can be inspected and interpreted. REALM and ORQA, two recently introduced models that combine masked language models with a differentiable retriever, have shown promising results, but have only explored open-domain extractive question answering. Here, we bring hybrid parametric and non-parametric memory to the "workhorse of NLP," i.e. sequence-to-sequence (seq2seq) models.


We endow pre-trained, parametric-memory generation models with a non-parametric memory through a general-purpose fine-tuning approach which we refer to as retrieval-augmented generation (RAG). We build RAG models where the parametric memory is a pre-trained seq2seq transformer, and the non-parametric memory is a dense vector index of Wikipedia, accessed with a pre-trained neural retriever. We combine these components in a probabilistic model trained end-to-end (Fig. 1). The retriever (Dense Passage Retriever, henceforth DPR) provides latent documents conditioned on the input, and the seq2seq model (BART) then conditions on these latent documents together with the input to generate the output. We marginalize the latent documents with a top-K approximation, either on a per-output basis (assuming the same document is responsible for all tokens) or a per-token basis (where different documents are responsible for different tokens). Like T5 or BART, RAG can be fine-tuned on any seq2seq task, whereby both the generator and retriever are jointly learned.


There has been extensive previous work proposing architectures to enrich systems with non-parametric memory which are trained from scratch for specific tasks, e.g. memory networks, stack-augmented networks and memory layers. In contrast, we explore a setting where both parametric and non-parametric memory components are pre-trained and pre-loaded with extensive knowledge. Crucially, by using pre-trained access mechanisms, the ability to access knowledge is present without additional training.


Our results highlight the benefits of combining parametric and non-parametric memory with generation for knowledge-intensive tasks—tasks that humans could not reasonably be expected to perform without access to an external knowledge source. Our RAG models achieve state-of-the-art results on open Natural Questions, WebQuestions and CuratedTrec and strongly outperform recent approaches that use specialised pre-training objectives on TriviaQA. Despite these being extractive tasks, we find that unconstrained generation outperforms previous extractive approaches. For knowledge-intensive generation, we experiment with MS-MARCO and Jeopardy question generation, and we find that our models generate responses that are more factual, specific, and diverse than a BART baseline. For FEVER fact verification, we achieve results within 4.3% of state-of-the-art pipeline models which use strong retrieval supervision. Finally, we demonstrate that the non-parametric memory can be replaced to update the models' knowledge as the world changes.


Methods

We explore RAG models, which use the input sequence x to retrieve text documents z and use them as additional context when generating the target sequence y. As shown in Figure 1, our models leverage two components: (i) a retriever p_η(z | x) with parameters η that returns (top-K truncated) distributions over text passages given a query x, and (ii) a generator p_θ(y_i | x, z, y_{1:i-1}) parametrized by θ that generates the current token based on a context of the previous i−1 tokens y_{1:i-1}, the original input x and a retrieved passage z.


To train the retriever and generator end-to-end, we treat the retrieved document as a latent variable. We propose two models that marginalize over the latent documents in different ways to produce a distribution over generated text. In one approach, RAG-Sequence, the model uses the same document to predict each target token. The second approach, RAG-Token, can predict each target token based on a different document. In the following, we formally introduce both models and then describe the pη and pθ components, as well as the training and decoding procedure.


2.1 Models

RAG-Sequence Model The RAG-Sequence model uses the same retrieved document to generate the complete sequence. Technically, it treats the retrieved document as a single latent variable that is marginalized to get the seq2seq probability p(y|x) via a top-K approximation. Concretely, the top K documents are retrieved using the retriever, and the generator produces the output sequence probability for each document, which are then marginalized,


p_{\text{RAG-Sequence}}(y \mid x) \approx \sum_{z \in \text{top-}k(p(\cdot \mid x))} p_\eta(z \mid x)\, p_\theta(y \mid x, z) = \sum_{z \in \text{top-}k(p(\cdot \mid x))} p_\eta(z \mid x) \prod_i^N p_\theta(y_i \mid x, z, y_{1:i-1})
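As a minimal numerical sketch of this marginalization (the function name, the toy retriever prior, and the per-token generator probabilities are all illustrative assumptions, not values from the model):

```python
import numpy as np

def rag_sequence_prob(p_eta, p_theta_tokens):
    """p_eta: shape (K,), the retriever prior p_eta(z|x) over top-K documents.
    p_theta_tokens: shape (K, N), per-token generator probabilities
    p_theta(y_i | x, z, y_{1:i-1}) under each document z.
    Returns the marginal sequence probability p(y|x)."""
    # For each document, the sequence probability is the product over tokens...
    per_doc_seq_prob = np.prod(p_theta_tokens, axis=1)   # shape (K,)
    # ...then marginalize over documents, weighted by the retriever prior.
    return float(np.sum(p_eta * per_doc_seq_prob))

# Toy example: K=2 retrieved documents, N=3 target tokens.
p_eta = np.array([0.7, 0.3])
p_theta_tokens = np.array([[0.9, 0.8, 0.7],   # document 1 supports the answer
                           [0.2, 0.3, 0.1]])  # document 2 does not
p_seq = rag_sequence_prob(p_eta, p_theta_tokens)  # 0.7*0.504 + 0.3*0.006
```

Note that the whole sequence is scored under one document at a time; a document that supports the answer contributes its full sequence probability.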

RAG-Token Model In the RAG-Token model we can draw a different latent document for each target token and marginalize accordingly. This allows the generator to choose content from several documents when producing an answer. Concretely, the top K documents are retrieved using the retriever, and the generator then produces a distribution for the next output token for each document, before marginalizing and repeating the process for the following output token. Formally, we define:


p_{\text{RAG-Token}}(y \mid x) \approx \prod_i^N \sum_{z \in \text{top-}k(p(\cdot \mid x))} p_\eta(z \mid x)\, p_\theta(y_i \mid x, z, y_{1:i-1})
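The contrast with RAG-Sequence can be sketched with the same kind of toy arrays (again, function name and probabilities are illustrative assumptions): here the sum over documents happens *inside* the product over tokens.

```python
import numpy as np

def rag_token_prob(p_eta, p_theta_tokens):
    """p_eta: shape (K,); p_theta_tokens: shape (K, N).
    Marginalize over documents per token, then take the product over tokens."""
    per_token_marginal = p_eta @ p_theta_tokens   # shape (N,), one marginal per token
    return float(np.prod(per_token_marginal))

# Same toy numbers as a RAG-Sequence sketch would use: K=2 documents, N=3 tokens.
p_eta = np.array([0.7, 0.3])
p_theta_tokens = np.array([[0.9, 0.8, 0.7],
                           [0.2, 0.3, 0.1]])
p_tok = rag_token_prob(p_eta, p_theta_tokens)  # (0.69)*(0.65)*(0.52)
```

Because each token marginalizes over documents independently, the model can in effect draw on different documents at different positions in the output.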

Finally, we note that RAG can be used for sequence classification tasks by considering the target class as a target sequence of length one, in which case RAG-Sequence and RAG-Token are equivalent.


2.2 Retriever: DPR

The retrieval component p_η(z | x) is based on DPR. DPR follows a bi-encoder architecture:


p_\eta(z \mid x) \propto \exp\!\left(\mathbf{d}(z)^\top \mathbf{q}(x)\right), \qquad \mathbf{d}(z) = \text{BERT}_d(z), \quad \mathbf{q}(x) = \text{BERT}_q(x)

where d(z) is a dense representation of a document produced by a BERT_BASE document encoder, and q(x) a query representation produced by a query encoder, also based on BERT_BASE. Calculating top-k(p_η(· | x)), the list of the k documents z with highest prior probability p_η(z | x), is a Maximum Inner Product Search (MIPS) problem, which can be approximately solved in sub-linear time. We use a pre-trained bi-encoder from DPR to initialize our retriever and to build the document index. This retriever was trained to retrieve documents which contain answers to TriviaQA questions and Natural Questions. We refer to the document index as the non-parametric memory.
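A brute-force version of this retrieval step can be sketched as follows (random toy embeddings; function and variable names are assumptions). A real index would use an approximate MIPS library such as FAISS for sub-linear search rather than the exact scan shown here:

```python
import numpy as np

def top_k_docs(query_vec, doc_matrix, k):
    """Exact MIPS over a toy index: return the indices of the k documents
    with the highest inner product d(z)^T q(x), plus the retriever prior
    p_eta(z|x), softmax-normalized over those k documents only."""
    scores = doc_matrix @ query_vec          # inner products for every document
    idx = np.argsort(-scores)[:k]            # top-k by score, descending
    top = scores[idx]
    p = np.exp(top - top.max())              # numerically stable softmax
    return idx, p / p.sum()

rng = np.random.default_rng(0)
docs = rng.normal(size=(1000, 8))            # toy index: 1000 passage embeddings
q = rng.normal(size=8)                       # toy query embedding
idx, prior = top_k_docs(q, docs, k=5)
```

The returned prior is what the generation formulas above marginalize over.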


2.3 Generator: BART

The generator component p_θ(y_i | x, z, y_{1:i-1}) could be modelled using any encoder-decoder. We use BART-large, a pre-trained seq2seq transformer with 400M parameters. To combine the input x with the retrieved content z when generating from BART, we simply concatenate them. BART was pre-trained using a denoising objective and a variety of different noising functions. It has obtained state-of-the-art results on a diverse set of generation tasks and outperforms comparably-sized T5 models. We refer to the BART generator parameters θ as the parametric memory henceforth.
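The concatenation step is simple enough to sketch directly; the separator string below is an illustrative assumption, not the actual special tokens used by any released model:

```python
def build_generator_input(passage_title, passage_text, query):
    # BART's encoder sees the retrieved passage and the input as one
    # concatenated sequence; the " // " separator here is a placeholder
    # standing in for whatever delimiter the implementation chooses.
    return passage_title + " // " + passage_text + " // " + query

ctx = build_generator_input(
    "Hamlet",
    "Hamlet is a tragedy written by William Shakespeare.",
    "Who wrote Hamlet?",
)
```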


2.4 Training

We jointly train the retriever and generator components without any direct supervision on what document should be retrieved. Given a fine-tuning training corpus of input/output pairs (x_j, y_j), we minimize the negative marginal log-likelihood of each target, −Σ_j log p(y_j | x_j), using stochastic gradient descent with Adam. Updating the document encoder BERT_d during training is costly, as it requires the document index to be periodically re-built, as REALM does during pre-training. We do not find this step necessary for strong performance, and keep the document encoder (and index) fixed, only fine-tuning the query encoder BERT_q and the BART generator.
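The training objective can be sketched with toy arrays (the function name and inputs are assumptions; in the real model the prior and token probabilities come from BERT_q and BART, and only the query encoder and generator receive gradient updates):

```python
import numpy as np

def rag_loss(batch):
    """Negative marginal log-likelihood over a batch.
    Each example supplies (p_eta, p_theta_tokens): the retriever prior over
    its top-K documents, shape (K,), and the per-token generator
    probabilities under each document, shape (K, N)."""
    loss = 0.0
    for p_eta, p_theta_tokens in batch:
        # RAG-Sequence marginal: sum over documents of (prior * sequence prob).
        marginal = np.sum(p_eta * np.prod(p_theta_tokens, axis=1))
        loss -= np.log(marginal)
    return float(loss)

# One toy example: a single document (prior 1.0) and two target tokens.
example = (np.array([1.0]), np.array([[0.5, 0.5]]))
loss = rag_loss([example])   # -log(0.25) = log 4
```

Because the marginal is differentiable in both the prior and the generator probabilities, gradients flow to the query encoder through p_η and to the generator through p_θ, with no retrieval supervision needed.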


2.5 Decoding

At test time, RAG-Sequence and RAG-Token require different ways to approximate arg max_y p(y | x).


RAG-Token The RAG-Token model can be seen as a standard, autoregressive seq2seq generator with transition probability p'_θ(y_i | x, y_{1:i-1}) = Σ_{z_i ∈ top-k(p(· | x))} p_η(z_i | x) p_θ(y_i | x, z_i, y_{1:i-1}). To decode, we can plug p'_θ(y_i | x, y_{1:i-1}) into a standard beam decoder.


RAG-Sequence For RAG-Sequence, the likelihood p(y | x) does not break into a conventional per-token likelihood, hence we cannot solve it with a single beam search. Instead, we run beam search for each document z, scoring each hypothesis using p_θ(y_i | x, z, y_{1:i-1}). This yields a set of hypotheses Y, some of which may not have appeared in the beams of all documents. To estimate the probability of a hypothesis y, we run an additional forward pass for each document z for which y does not appear in the beam, multiply the generator probability by p_η(z | x), and then sum the probabilities across beams to obtain the marginal. We refer to this decoding procedure as "Thorough Decoding." For longer output sequences, |Y| can become large, requiring many forward passes. For more efficient decoding, we can make the further approximation that p_θ(y | x, z_i) ≈ 0 where y was not generated during beam search from (x, z_i). This avoids the need to run additional forward passes once the candidate set Y has been generated. We refer to this decoding procedure as "Fast Decoding."
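The Fast Decoding variant can be sketched as a rescoring step over the union of the per-document beams (function name and toy scores are assumptions; real hypotheses would be token sequences scored by the generator):

```python
def fast_decode(beams, p_eta):
    """beams: one dict per document z, mapping each hypothesis y generated
    during beam search from (x, z) to its generator probability p_theta(y|x,z).
    p_eta: the retriever priors p_eta(z|x), aligned with `beams`.
    Fast Decoding: approximate p_theta(y|x,z) ~ 0 whenever y is missing from
    z's beam, so no extra forward passes are needed."""
    candidates = set().union(*[set(b) for b in beams])  # the hypothesis set Y
    scores = {}
    for y in candidates:
        # Marginal: sum over documents of prior * (beam score, or 0 if absent).
        scores[y] = sum(pz * b.get(y, 0.0) for pz, b in zip(p_eta, beams))
    return max(scores, key=scores.get), scores

# Toy example: two documents, overlapping beams.
beams = [{"a": 0.6, "b": 0.4}, {"b": 0.9, "c": 0.1}]
best, scores = fast_decode(beams, p_eta=[0.5, 0.5])
```

Thorough Decoding differs only in that the missing entries would be filled in by additional generator forward passes instead of being treated as zero.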

Related Work

Single-Task Retrieval Prior work has shown that retrieval improves performance across a variety of NLP tasks when considered in isolation. Such tasks include open-domain question answering, fact checking, fact completion, long-form question answering, Wikipedia article generation, dialogue, translation, and language modeling. Our work unifies previous successes in incorporating retrieval into individual tasks, showing that a single retrieval-based architecture is capable of achieving strong performance across several tasks.


General-Purpose Architectures for NLP Prior work on general-purpose architectures for NLP tasks has shown great success without the use of retrieval. A single, pre-trained language model has been shown to achieve strong performance on various classification tasks in the GLUE benchmark after fine-tuning. GPT-2 later showed that a single, left-to-right, pre-trained language model could achieve strong performance across both discriminative and generative tasks. For further improvement, BART and T5 propose a single, pre-trained encoder-decoder model that leverages bi-directional attention to achieve stronger performance on discriminative and generative tasks. Our work aims to expand the space of possible tasks with a single, unified architecture, by learning a retrieval module to augment pre-trained, generative language models.


Learned Retrieval There is significant work on learning to retrieve documents in information retrieval, more recently with pre-trained, neural language models similar to ours. Some work optimizes the retrieval module to aid in a specific, downstream task such as question answering, using search, reinforcement learning, or a latent variable approach as in our work. These successes leverage different retrieval-based architectures and optimization techniques to achieve strong performance on a single task, while we show that a single retrieval-based architecture can be fine-tuned for strong performance on a variety of tasks.


Memory-based Architectures Our document index can be seen as a large external memory for neural networks to attend to, analogous to memory networks. Concurrent work learns to retrieve a trained embedding for each entity in the input, rather than to retrieve raw text as in our work. Other work improves the ability of dialog models to generate factual text by attending over fact embeddings. A key feature of our memory is that it is composed of raw text rather than distributed representations, which makes the memory both (i) human-readable, lending a form of interpretability to our model, and (ii) human-writable, enabling us to dynamically update the model's memory by editing the document index. This approach has also been used in knowledge-intensive dialog, where generators have been conditioned on retrieved text directly, albeit obtained via TF-IDF rather than end-to-end learnt retrieval.


Retrieve-and-Edit approaches Our method shares some similarities with retrieve-and-edit style approaches, where a similar training input-output pair is retrieved for a given input and then edited to provide a final output. These approaches have proved successful in a number of domains, including Machine Translation and Semantic Parsing. Our approach does have several differences, including less emphasis on lightly editing a retrieved item and more on aggregating content from several pieces of retrieved content, as well as learning latent retrieval and retrieving evidence documents rather than related training pairs. That said, RAG techniques may work well in these settings and could represent promising future work.


Discussion

In this work, we presented hybrid generation models with access to parametric and non-parametric memory. We showed that our RAG models obtain state-of-the-art results on open-domain QA. We found that people prefer RAG's generations over those of a purely parametric BART, finding RAG more factual and specific. We conducted a thorough investigation of the learned retrieval component, validating its effectiveness, and we illustrated how the retrieval index can be hot-swapped to update the model without requiring any retraining. In future work, it may be fruitful to investigate whether the two components can be jointly pre-trained from scratch, either with a denoising objective similar to BART's or some other objective. Our work opens up new research directions on how parametric and non-parametric memories interact and how to most effectively combine them, showing promise in being applied to a wide variety of NLP tasks.


Broader Impact

This work offers several positive societal benefits over previous work: the fact that it is more strongly grounded in real factual knowledge (in this case Wikipedia) makes it "hallucinate" less, with generations that are more factual, and offers more control and interpretability. RAG could be employed in a wide variety of scenarios with direct benefit to society, for example by endowing it with a medical index and asking it open-domain questions on that topic, or by helping people be more effective at their jobs.


With these advantages also come potential downsides: Wikipedia, or any potential external knowledge source, will probably never be entirely factual and completely devoid of bias. Since RAG can be employed as a language model, similar concerns as for GPT-2 are valid here, although arguably to a lesser extent, including that it might be used to generate abuse, faked or misleading content in the news or on social media; to impersonate others; or to automate the production of spam/phishing content. Advanced language models may also lead to the automation of various jobs in the coming decades. In order to mitigate these risks, AI systems could be employed to fight against misleading content and automated spam/phishing.
