AutoGen: Enabling Next-Generation LLM Applications via Multi-Agent Conversation
Abstract
We present AutoGen, an open-source framework that allows developers to build LLM applications by composing multiple agents that converse with each other to accomplish tasks. AutoGen agents are customizable, conversable, and can operate in various modes that employ combinations of LLMs, human inputs, and tools. The framework also enables developers to create flexible agent behaviors and conversation patterns for different applications using both natural language and code. AutoGen serves as a generic infrastructure and is widely used by AI practitioners and researchers to build diverse applications of various complexities and LLM capacities. We demonstrate the framework's effectiveness with several pilot applications, in domains ranging from mathematics and coding to question answering, supply-chain optimization, online decision-making, and entertainment.
Introduction
Large language models (LLMs) are becoming a crucial building block in developing powerful agents that utilize LLMs for reasoning, tool usage, and adapting to new observations (Yao et al., 2022; Xi et al., 2023; Wang et al., 2023). As the scope and complexity of tasks suitable for LLMs increase, a natural strategy for enhancing agent capabilities is to employ multiple cooperating agents. Prior work suggests that multiple agents can help encourage divergent thinking (Liang et al., 2023), improve factuality and reasoning (Du et al., 2023; Naik et al., 2023), and provide guardrails (Wu et al., 2023). Given this early promising evidence, an intriguing question is: how can we facilitate the development of LLM applications that span a broad spectrum of domains and complexities using a multi-agent approach? Our insight is to use multi-agent conversations. Thanks to recent advances in LLMs, at least three reasons support its general feasibility and utility. First, chat-optimized LLMs, such as GPT-4, demonstrate the ability to incorporate feedback, so LLM agents can cooperate through conversations with each other or with humans, for example in a dialogue where agents provide and seek reasoning, observations, critiques, and validation. Second, because a single LLM can exhibit a broad range of capabilities, conversations between differently configured agents can help combine these capabilities in a modular and complementary manner. Third, LLMs have demonstrated the ability to solve complex tasks when they are broken into simpler sub-tasks, and multi-agent conversations can intuitively facilitate this partitioning and integration.
We desire a multi-agent conversation framework with generic abstraction and effective implementation that has the flexibility to satisfy different application needs. Achieving this requires addressing two critical questions: (1) How to design individual agents that are capable, reusable, customizable, and effective in multi-agent collaboration? (2) How can we develop a straightforward, unified interface that accommodates a wide range of agent conversation patterns? In practice, applications of varying complexities may need distinct sets of agents with specific capabilities and may require different conversation patterns, such as single- or multi-turn dialogues, different human involvement modes, and static vs. dynamic conversations. Moreover, developers may prefer the flexibility to program agent interactions in natural language or code. We present AutoGen, a generalized multi-agent conversation framework (Figure 1), based on the following new concepts:
1 Customizable and conversable agents. AutoGen uses a generic agent design that can leverage LLMs, human inputs, tools, or a combination thereof. Developers can conveniently create agents with different roles or responsibilities by selecting and configuring a subset of built-in capabilities or defining new capabilities. To make these agents suitable for multi-agent conversation, every agent is made conversable – they can receive, react, and respond to messages. When configured properly, an agent can hold multiple turns of conversations with other agents autonomously or with humans in the loop. The conversable agent design leverages the strong capability of the most advanced LLMs in taking feedback and making progress via conversation, and also allows combining capabilities of LLMs in a modular fashion. (Section 2.1)
2 Conversation programming. A fundamental insight of AutoGen is to simplify and unify complex LLM applications as multi-agent conversations. Thus, AutoGen adopts a programming paradigm centered around these inter-agent conversations. We refer to this paradigm as conversation programming, which streamlines the development of intricate applications via two primary steps: (1) defining a set of conversable agents with specific capabilities and roles; (2) programming the interaction behavior between agents via conversation-centric computation and control. Both steps can be achieved via a fusion of natural and programming languages. AutoGen provides ready-to-use implementations and also allows easy extension and experimentation for both steps. (Section 2.2)
We offer a suite of multi-agent applications realized with AutoGen, showcasing the framework's ability to support applications of varied complexities. With these applications, we demonstrate AutoGen's potential to significantly enhance task completion performance and innovate LLM usage while minimizing development effort. Beyond the demonstrated applications, AutoGen has also seen widespread adoption in the wild, fostering a vibrant and active community.
Related Work. Several contemporaneous explorations of multi-agent approaches exist, including Generative Agents (Park et al., 2023), multi-agent debate (Liang et al., 2023; Du et al., 2023), CAMEL (Li et al., 2023b), BabyAGI (BabyAGI, 2023), MetaGPT (Hong et al., 2023), ChatDev (Qian et al., 2023), AgentVerse (Chen et al., 2023b), AutoAgents (Chen et al., 2023a). These systems are designed for specific types of scenarios or problem-solving paradigms, which limits their flexibility and generalizability as comprehensive frameworks. For instance, MetaGPT and ChatDev prioritize software engineering tasks and only support certain multi-agent structures, such as chains or Standardized Operating Procedures. AgentVerse primarily simulates the problem-solving processes of a human group following a sequence of pre-defined stages. CAMEL supports multi-agent systems with two or three agents following a fixed workflow pattern. One notable difference of AutoGen is that it supports diverse workflows because of its composable conversation patterns and does not explicitly restrict the number of agents. We include an expanded discussion of this related work and single-agent systems/frameworks in Appendix B.
The AutoGen Framework
To reduce the effort required for developers to create complex LLM applications across various domains, a core design principle of AutoGen is to streamline them using multi-agent conversations. This approach also aims to maximize the reusability of implemented agents. This section introduces the two key concepts of AutoGen: conversable agents and conversation programming.
2.1 Conversable Agents
In AutoGen, a conversable agent is an entity with a specific role that can send messages to and receive messages from other conversable agents, e.g., to start or continue a conversation. It maintains its internal context based on sent and received messages and can be configured to possess a set of capabilities, e.g., enabled by LLMs, tools, or human input. The agents can act according to the programmed behavior patterns described next.
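The conversable-agent abstraction above can be sketched in a few lines of plain Python. This is a toy illustration only (the class and its methods are hypothetical, not AutoGen's actual ConversableAgent): a named entity with a role that sends and receives messages and keeps its internal context as a history of those messages.

```python
# Toy sketch of a conversable agent (hypothetical, not AutoGen's real
# ConversableAgent): a named entity that sends/receives messages and keeps
# its internal context as a history of those messages.
class ToyConversableAgent:
    def __init__(self, name, role):
        self.name = name
        self.role = role
        self.context = []  # internal context built from sent/received messages

    def send(self, message, recipient):
        self.context.append(("sent", recipient.name, message))
        recipient.receive(message, sender=self)

    def receive(self, message, sender):
        self.context.append(("received", sender.name, message))

alice = ToyConversableAgent("alice", role="assistant")
bob = ToyConversableAgent("bob", role="user proxy")
alice.send("Let's start a conversation.", bob)
```

After the call, each agent's context records its side of the exchange, which is the state a reply function would later condition on.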
Agent capabilities powered by LLMs, humans, and tools. AutoGen offers the flexibility to equip its agents with various capabilities, which directly affect how they process and respond to messages. The built-in composable agent capabilities include: 1) LLMs. LLM-backed agents utilize advanced capabilities such as role-playing, implicit state inference, making progress based on conversation history, and coding. These capabilities can be combined and enhanced in different ways via novel prompting techniques. AutoGen also offers enhanced LLM inference features such as result caching, error handling, and message templating via an enhanced LLM inference layer. 2) Humans. Human involvement is desired or even essential in many LLM applications. AutoGen lets a human participate in agent conversations via human-backed agents, which can solicit human inputs at certain rounds of a conversation depending on the agent configuration. The default user proxy agent allows configurable human involvement levels and patterns, e.g., the frequency and conditions for requesting human input, including the option for humans to skip providing input. 3) Tools. Tool-backed agents can execute tools via code execution or function execution. For example, the default user proxy agent in AutoGen is able to execute code suggested by LLMs, or make LLM-suggested function calls.
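The idea that a reply may come from a human, a tool, or an LLM, alone or in combination, can be sketched with stand-in callables. All three back-end functions below are hypothetical stubs, not AutoGen APIs; returning None models a human skipping their turn, so the reply falls through to the next configured back-end.

```python
# Sketch of composable capability back-ends: a reply may come from a human,
# a tool, or an LLM, tried in a configured order. All three back-ends are
# hypothetical stubs; returning None models a human skipping their input.
def human_backend(msg):
    return None                      # human chose to skip providing input

def tool_backend(msg):
    return f"tool-result({msg})"     # stand-in for code/function execution

def llm_backend(msg):
    return f"llm-reply({msg})"       # stand-in for an LLM inference call

def reply(message, backends):
    # The first back-end that produces a non-None answer wins.
    for backend in backends:
        out = backend(message)
        if out is not None:
            return out
```

With backends ordered [human, tool, llm], the human skip falls through to the tool; with [human, llm], it falls through to the LLM, illustrating how one agent mixes back-end types.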
Agent customization. Based on application-specific needs, each agent can be configured to have a mix of basic back-end types to exhibit complex behavior in multi-agent conversations. AutoGen allows easy creation of agents with specialized capabilities and roles by reusing or extending the built-in agents. The yellow-shaded area of Figure 2 provides a sketch of the built-in agents in AutoGen. The ConversableAgent class is the most basic agent abstraction and, by default, can use LLMs, humans, and tools. The AssistantAgent and UserProxyAgent are two pre-configured ConversableAgent subclasses, each representing a common usage mode, i.e., acting as an AI assistant (backed by LLMs) and acting as a human proxy to solicit human input or execute code/function calls (backed by humans and/or tools). In the example on the right-hand side of Figure 1, an LLM-backed assistant agent and a tool- and human-backed user proxy agent are deployed together to tackle a task. Here, the assistant agent generates a solution with the help of LLMs and passes the solution to the user proxy agent. Then, the user proxy agent solicits human inputs or executes the assistant's code and passes the results as feedback back to the assistant. One can compose a complex agent using nested chat (introduced in the next subsection) among simpler agents and increase the complexity recursively.
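The assistant/user-proxy feedback loop described above can be illustrated with a stubbed "LLM" and a toy executor. Everything here is a hypothetical sketch; the class names echo AutoGen's AssistantAgent and UserProxyAgent, but the real classes call an actual LLM and a sandboxed code executor.

```python
# Toy version of the assistant / user-proxy pattern in Figure 1: the "LLM" is
# a canned stub and code execution uses eval(). The class names echo AutoGen's
# AssistantAgent/UserProxyAgent, but the bodies are hypothetical.
def stub_llm(task):
    # Stand-in for an LLM call that proposes Python code for the task.
    return "sum(range(1, 101))"

class ToyAssistant:
    def generate_solution(self, task):
        return stub_llm(task)        # assistant drafts a solution via the LLM

class ToyUserProxy:
    def execute(self, code):
        # Execute the assistant's suggested code and return the result as
        # feedback. Toy only: never eval() untrusted code in production.
        return eval(code)

assistant, user_proxy = ToyAssistant(), ToyUserProxy()
solution = assistant.generate_solution("sum the integers 1..100")
feedback = user_proxy.execute(solution)  # result fed back to the assistant
```

In the real framework this exchange iterates: the assistant inspects the execution feedback and revises its solution until the task is done.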
2.2 Conversation Programming
To develop applications where agents make meaningful progress on tasks, developers also need to be able to specify and properly control these multi-agent conversations. To this end, AutoGen utilizes conversation programming, a paradigm built on two concepts: the first is computation – the actions agents take to compute their responses in a multi-agent conversation; the second is control flow – the order and conditions under which individual computations in the conversation are executed or evaluated. As we will show in the applications section, the ability to program both helps implement many flexible multi-agent conversation patterns. In AutoGen, agent computations are conversation-centric: an agent takes actions based on the conversations it is involved in, and these actions in turn lead to message passing for subsequent conversations. Similarly, control flow is conversation-driven – the participating agents' decisions about which agents to send messages to, and the procedure of computation, are functions of the inter-agent conversation. This paradigm facilitates intuitive reasoning about complex workflows through agent actions and inter-agent message passing.
Figure 2: Illustration of how to use AutoGen to program a multi-agent conversation. The top sub-figure shows the built-in conversable agents provided by AutoGen. The middle sub-figure shows an example of using AutoGen to develop a two-agent system with a custom reply function. The bottom sub-figure shows the automated agent chat produced by the two-agent system during program execution.
Figure 2 provides a simple illustration. The middle sub-figure shows how each individual agent performs its role-specific, conversation-centric computations to generate responses (e.g., via LLM inference calls and code execution). The bottom sub-figure demonstrates a conversation-based control flow: upon receiving a message from the assistant, the user proxy agent generates a reply via code execution or by soliciting human inputs. The task progresses through the conversations displayed in the dialog box. AutoGen features the following design patterns to facilitate conversation programming.
Unified interfaces and auto-reply mechanisms for automated agent chat. Agents in AutoGen have unified conversation interfaces for performing the corresponding conversation-centric computation. These low-level interfaces include:
- send/receive for sending/receiving messages;
- generate_reply for taking actions and generating a response based on a received message; and
- register_reply for registering a custom reply function.
AutoGen also introduces and by default adopts an agent auto-reply mechanism to realize conversation-driven control: once an agent receives a message from another agent, it automatically invokes generate_reply and sends the reply back to the sender unless a termination condition is satisfied. AutoGen provides built-in reply functions based on LLM inference, code or function execution, or human input. One can also register custom reply functions (via the register_reply interface) to customize the behavior pattern of an agent, e.g., to chat with another agent before replying to the sender agent, thereby realizing the nested chat conversation pattern. Under this mechanism, once the reply functions are registered and the conversation is initialized, the conversation flow is induced automatically, and the agent conversation proceeds without any extra control plane, i.e., a special module that controls the conversation flow. For example, with the developer code in the blue-shaded area (marked "Developer Code") of Figure 2, one can readily trigger the conversation among the agents, and the conversation proceeds automatically, as shown in the dialog box in the grey-shaded area (marked "Program Execution") of Figure 2. The auto-reply mechanism provides a decentralized, modular, and unified way to define the workflow.
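The auto-reply mechanism can be captured in a stripped-down sketch: receive automatically invokes generate_reply and sends the reply back until a termination condition (here, a simple cap on sent messages) is met. The class below is a hypothetical miniature that mirrors the send/receive/generate_reply/register_reply interfaces in spirit only, not AutoGen's actual implementation.

```python
# Miniature auto-reply mechanism: receive() automatically calls
# generate_reply() and sends the reply back until a termination condition
# (a cap on messages sent) is met. Hypothetical sketch, not AutoGen code.
class AutoReplyAgent:
    def __init__(self, name, max_replies=3):
        self.name = name
        self.reply_fn = None            # custom reply function, if registered
        self.max_replies = max_replies  # programmed termination condition
        self.sent = 0

    def register_reply(self, fn):
        self.reply_fn = fn

    def generate_reply(self, message):
        return self.reply_fn(self, message) if self.reply_fn else None

    def send(self, message, recipient):
        self.sent += 1
        recipient.receive(message, sender=self)

    def receive(self, message, sender):
        if self.sent >= self.max_replies:   # terminate: stop auto-replying
            return
        reply = self.generate_reply(message)
        if reply is not None:
            self.send(reply, sender)        # auto-reply back to the sender

a = AutoReplyAgent("a")
b = AutoReplyAgent("b")
a.register_reply(lambda agent, msg: f"{agent.name} ack: {msg}")
b.register_reply(lambda agent, msg: f"{agent.name} ack: {msg}")
a.send("start", b)  # the chat then proceeds automatically and terminates
```

Note that no central controller drives the exchange: a single send triggers the whole back-and-forth, which is the decentralized workflow definition described above.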
Control by fusion of programming and natural language. AutoGen allows the usage of programming and natural language in various control flow management patterns:
- Natural-language control via LLMs: One can control the conversation flow by prompting LLM-backed agents with natural language. For instance, the default system message of the built-in AssistantAgent uses natural language to instruct agents to write code and debug when needed. It also guides the agent to confine LLM outputs, making it easier for other agents to consume. More examples of such controls can be found in Appendix D.
- Programming-language control: In AutoGen, Python code can be used to specify the termination condition, human input mode, and tool execution logic, e.g., the max number of auto replies. One can also register programmed auto-reply functions to control the conversation flow with Python code, as shown in the code block identified as "Conversation-Driven Control Flow" in Figure 2.
- Control transition between natural and programming language: AutoGen also supports flexible control transition between natural and programming language. One can achieve transition from code to natural-language control by invoking an LLM inference containing certain control logic in a customized reply function; or transition from natural language to code control via LLM-proposed function calls.
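Programming-language control of the kind described above can be sketched as a driver loop in which a Python predicate (the termination condition) and a turn cap (the maximum number of auto-replies) decide when the conversation ends. The run_chat helper and the two stub agents below are hypothetical, not AutoGen interfaces.

```python
# Sketch of programming-language control over a conversation: a Python
# predicate (termination condition) and a turn cap (max auto-replies) decide
# when the chat stops. run_chat and the stub agents are hypothetical.
def run_chat(agents, opening, max_turns=10,
             is_termination_msg=lambda m: "TERMINATE" in m):
    message, transcript = opening, [opening]
    for turn in range(max_turns):            # programmed cap on replies
        message = agents[turn % len(agents)](message)
        transcript.append(message)
        if is_termination_msg(message):      # programmed termination condition
            break
    return transcript

worker = lambda m: f"worker: revised ({m[:20]})"
state = {"rounds": 0}
def reviewer(m):
    state["rounds"] += 1
    return ("reviewer: approved. TERMINATE" if state["rounds"] >= 2
            else "reviewer: needs changes")

transcript = run_chat([worker, reviewer], "draft v1")
```

Swapping the predicate or the cap changes the control flow without touching the agents, which is the appeal of code-level control.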
Composable conversation patterns. The conversation programming paradigm enables the composition of multi-agent conversations with diverse patterns, both statically and dynamically. For enhanced usability, we provide interfaces for constructing several commonly used conversation patterns, including two-agent chat, sequential chats, nested chat and group chat. We provide the detailed interfaces for specifying these patterns in Appendix D. Beyond these built-in patterns, one can employ these higher-level interfaces – and the low-level interfaces such as register_reply if necessary – recursively to compose more complex and creative patterns, e.g., a nested chat with a group chat nested within, allowing one agent to create its inner monologue, realizing the Society of Mind idea from Minsky (1988). The composed conversation workflow can be static or dynamic. AutoGen provides a few general ways to achieve dynamic conversation flows: 1) custom reply functions and triggers. Nested chat and group chat are examples of conversation patterns using built-in custom reply functions. In nested chat, one agent can hold the current conversation while invoking conversations with other agents depending on the content of the current message and context. In group chat, one can define the speaker transition conditions based on the current conversation status. 2) LLM-driven function calls, in which a language model decides whether or not to call a particular function depending on the conversation status.
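A content-based speaker-transition rule of the kind used in group chat can be sketched as follows. The selection logic is a hypothetical toy (round-robin with a question-routing override), not AutoGen's GroupChat/GroupChatManager, and all agent reply functions are stubs.

```python
# Toy group chat: a "manager" selects the next speaker from the current
# conversation state -- round-robin, with questions routed to the "expert".
# Hypothetical sketch; not AutoGen's GroupChatManager.
def select_speaker(agent_names, history):
    if history[-1].endswith("?"):              # content-based transition rule
        return "expert"
    others = [a for a in agent_names if a != "expert"]
    return others[len(history) % len(others)]  # otherwise rotate speakers

def toy_group_chat(agents, opening, rounds=4):
    history = [opening]
    for _ in range(rounds):
        speaker = select_speaker(list(agents), history)
        history.append(agents[speaker](history[-1]))
    return history

agents = {
    "planner": lambda m: "planner: what is step one?",
    "expert": lambda m: "expert: step one is data collection",
    "critic": lambda m: "critic: agreed",
}
history = toy_group_chat(agents, "user: solve the task")
```

Because the transition rule inspects the conversation state, the resulting workflow is dynamic: here the planner's question dynamically hands the floor to the expert.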
Discussion
We introduced an open-source library, AutoGen, that incorporates the paradigms of conversable agents and conversation programming. AutoGen also provides various additional supports, including multimodality, asynchronous operations, and enhanced LLM inference. Furthermore, AutoGen seamlessly interoperates with numerous single-agent systems, LLM tools, and libraries, such as OpenAI Assistant and MemGPT (Packer et al., 2023). Although still in an early stage, AutoGen is already benefiting a wide range of vertical industries and empowering researchers to build multi-agent AI systems for various scientific studies. For example, AutoGen has been used to realize a multi-agent system for assessing task utility in LLM-powered applications (Arabzadeh et al., 2024) and to study the behaviors of embodied agents in organized teams (Guo et al., 2024). It has been used to produce synthetic datasets for language-model fine-tuning (Mitra et al., 2024) and in RL environments to train LLMs for agents (Zhou et al., 2024). AutoGen is also used in diverse science and engineering domains such as mechanics (Ni & Buehler, 2023), protein discovery (Ghafarollahi & Buehler, 2024a), and material design (Ghafarollahi & Buehler, 2024b).
AutoGen also paves the way for numerous future directions and research opportunities. For instance, it is worth investigating which strategies, such as agent topology and conversation patterns, lead to the most effective multi-agent conversations while optimizing the overall efficiency, among other factors. While increasing the number of agents and other degrees of freedom presents opportunities for tackling more complex problems, it may also introduce new safety challenges that require additional studies and careful consideration. We consider it important future work to explore those safety implications. We provide an expanded discussion in Appendix C, including guidelines for using AutoGen and future work. We welcome contributions from the broader community.