NVIDIA has unveiled the Foundational Generative Audio Transformer Opus 1, or Fugatto, a new artificial intelligence model that can create or modify any mixture of music, voices, and sounds. The company calls the model a Swiss Army knife for sound.
In a blog post, the company describes the new model's capabilities. Fugatto can create a musical fragment from a text prompt, add instruments to an existing song or remove them from it, change the accent or emotion of a voice, or even "create sounds that have never been heard before."
According to NVIDIA, Fugatto is the first foundational generative AI model to demonstrate emergent properties, that is, capabilities that arise from the interaction of its various trained abilities, together with the ability to combine free-form instructions. For example, the model can generate a trumpet that barks or a saxophone that meows.
In addition, the company notes that the new model can perform tasks it was not trained on. For example, with fine-tuning and a small amount of singing data, Fugatto can generate high-quality vocals.