Simple and Controllable Music Generation

https://huggingface.co/papers/2306.05284

https://arxiv.org/abs/2306.05284

https://github.com/facebookresearch/audiocraft

Meta


We tackle the task of conditional music generation. We introduce MUSICGEN, a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens. Unlike prior work, MUSICGEN comprises a single-stage transformer LM together with efficient token interleaving patterns, which eliminates the need for cascading several models, e.g., hierarchically or via upsampling. Following this approach, we demonstrate how MUSICGEN can generate high-quality samples, both mono and stereo, while being conditioned on a textual description or melodic features, allowing better control over the generated output. We conduct an extensive empirical evaluation, considering both automatic metrics and human studies, showing that the proposed approach is superior to the evaluated baselines on a standard text-to-music benchmark. Through ablation studies, we shed light on the importance of each component comprising MUSICGEN.

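The token interleaving is the key idea that lets a single-stage LM replace a cascade of models: with K parallel codebook streams from the audio tokenizer, a "delay"-style pattern shifts stream k right by k steps so all codebooks can be predicted jointly, one frame at a time. A minimal sketch of such a pattern (variable names and the padding token are illustrative, not taken from the released audiocraft code):

```python
def delay_interleave(codes, pad=-1):
    """Shift codebook k right by k steps (a "delay"-style interleaving).

    codes: list of K token streams, each a list of T ints.
    Returns T + K - 1 frames; each frame holds one token per codebook,
    padded where a shifted stream has not started or has already ended.
    """
    K, T = len(codes), len(codes[0])
    frames = []
    for t in range(T + K - 1):
        frames.append([codes[k][t - k] if 0 <= t - k < T else pad
                       for k in range(K)])
    return frames

# Two codebooks, three time steps: the second stream lags by one step.
print(delay_interleave([[1, 2, 3], [4, 5, 6]]))
# [[1, -1], [2, 4], [3, 5], [-1, 6]]
```

Each frame is then a single prediction target for the transformer, so the model never needs a second hierarchical or upsampling stage.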


Fast Segment Anything

https://huggingface.co/papers/2306.12156

https://arxiv.org/abs/2306.12156

https://github.com/casia-iva-lab/fastsam


The recently proposed Segment Anything Model (SAM) has had a significant impact on many computer vision tasks. It is becoming a foundational step for many high-level tasks such as image segmentation, image captioning, and image editing. However, its huge computation cost prevents wider application in industry scenarios; the cost comes mainly from the Transformer architecture applied to high-resolution inputs. In this paper, we propose a faster alternative method for this fundamental task with comparable performance. By reformulating the task as segment generation and prompting, we find that a regular CNN detector with an instance segmentation branch can also accomplish this task well. Specifically, we convert the task to the well-studied instance segmentation task and directly train an existing instance segmentation method using only 1/50 of the SA-1B dataset published by the SAM authors. With our method, we achieve performance comparable to SAM at 50× higher run-time speed. We provide extensive experimental results to demonstrate its effectiveness.

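The "segments-generation and prompting" reformulation means the CNN produces every candidate mask in one pass, and a cheap second stage merely selects among them given a prompt. A rough sketch of point-prompt selection over precomputed masks (the smallest-containing-mask heuristic is an illustrative assumption, not the paper's exact rule):

```python
import numpy as np

def select_mask_by_point(masks, point):
    """Pick the mask matching a point prompt from precomputed candidates.

    masks: (N, H, W) boolean array of all instance masks from the detector.
    point: (x, y) pixel coordinate of the prompt.
    Returns the index of the smallest mask containing the point
    (the most specific segment), or None if no mask covers it.
    """
    x, y = point
    hits = [i for i in range(len(masks)) if masks[i, y, x]]
    if not hits:
        return None
    return min(hits, key=lambda i: int(masks[i].sum()))

# A full-image background mask and a small object mask; the prompt hits both.
masks = np.zeros((2, 8, 8), dtype=bool)
masks[0, :, :] = True          # whole image
masks[1, 2:5, 2:5] = True      # small 3x3 object
print(select_mask_by_point(masks, (3, 3)))  # 1
```

Because selection only scans already-computed masks, prompting costs almost nothing compared with rerunning a high-resolution Transformer per prompt.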


WizardCoder: Empowering Code Large Language Models with Evol-Instruct

https://huggingface.co/papers/2306.08568

https://arxiv.org/abs/2306.08568

https://github.com/nlpxucan/WizardLM

Microsoft


Code Large Language Models (Code LLMs), such as StarCoder, have demonstrated remarkable performance on various code-related tasks. However, unlike their counterparts in general language modeling, instruction fine-tuning remains relatively under-researched in this domain. In this paper, we present Code Evol-Instruct, a novel approach that adapts the Evol-Instruct method to the code domain, enhancing Code LLMs to produce a new family of models, WizardCoder. Through comprehensive experiments on five prominent code generation benchmarks, namely HumanEval, HumanEval+, MBPP, DS-1000, and MultiPL-E, our models showcase outstanding performance, consistently outperforming all other open-source Code LLMs by a significant margin. Remarkably, WizardCoder 15B even surpasses well-known closed-source LLMs, including Anthropic's Claude and Google's Bard, on the HumanEval and HumanEval+ benchmarks. Additionally, WizardCoder 34B not only achieves a HumanEval score comparable to GPT-3.5 (ChatGPT) but also surpasses it on HumanEval+. Furthermore, our preliminary exploration highlights the pivotal role of instruction complexity in achieving exceptional coding performance.

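Evol-Instruct hardens a seed instruction by asking an LLM to rewrite it under one of a small pool of evolution heuristics (add constraints, demand extra reasoning, supply misleading code, raise complexity requirements), then repeats on the result. A hedged sketch of how one evolution round's prompt could be assembled; the directive wording and template below are paraphrased illustrations, not the paper's exact prompts:

```python
import random

# Paraphrased evolution heuristics for code instructions (illustrative).
EVOL_DIRECTIVES = [
    "Add new constraints or requirements to the original problem.",
    "Replace a common requirement with a less common, more specific one.",
    "Require extra reasoning steps if the problem is solvable too directly.",
    "Provide a piece of erroneous code as a misleading reference.",
    "Propose stricter time or space complexity requirements.",
]

def build_evol_prompt(seed_instruction, rng=random):
    """Assemble one evolution round: ask an LLM to harden a coding task."""
    directive = rng.choice(EVOL_DIRECTIVES)
    return (
        "Please increase the difficulty of the given programming task using "
        f"the following method:\n{directive}\n\n"
        f"#Given Task#\n{seed_instruction}\n\n"
        "#Evolved Task#"
    )

prompt = build_evol_prompt("Write a function that reverses a string.")
# The prompt is sent to an instruction-following LLM; its completion becomes
# a new, harder training instruction for fine-tuning the Code LLM.
```

Iterating this loop over a seed corpus yields the progressively more complex instruction data that the abstract credits for the performance gains.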