Artificial Intelligence

OpenAI Jukebox – The AI That Generates Songs

Technology plays a growing role in everyday life, and people are working to become more skilled at using it. AI is now making inroads into musical composition.

Jukebox was released by OpenAI, the artificial intelligence research company founded with backing from Elon Musk and other tech billionaires. It is an open-source neural network that can generate complete songs by itself.

Before Jukebox, OpenAI unveiled MuseNet, a deep neural network that can compose four-minute pieces using ten different instruments and combine styles ranging from country to Mozart to the Beatles.

More specifically, Jukebox can create songs in a wide range of musical genres, including hip-hop, jazz, and rock. It picks up melody, rhythm, long-range composition, and the sound of a wide variety of instruments, as well as how singers' voices should sound in relation to the music.

Jukebox’s autoencoder uses a technique called the Vector Quantized Variational AutoEncoder (VQ-VAE) to convert raw audio into discrete codes. Three VQ-VAE levels compress 44 kHz raw audio by factors of 8, 32, and 128, respectively.
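To make the compression factors concrete, the short sketch below works out how many discrete codes each level needs per second of audio. It assumes that "44 kHz" means the CD-standard 44,100 Hz; the level names and function names are illustrative, not taken from the Jukebox code.

```python
# Sample rate and compression factors as quoted in the article.
# Assumption: "44 kHz" is the CD-standard 44,100 Hz.
SAMPLE_RATE_HZ = 44_100
COMPRESSION_FACTORS = {"bottom": 8, "middle": 32, "top": 128}


def tokens_per_second(sample_rate_hz: int, factor: int) -> float:
    """Number of discrete codes needed to cover one second of audio."""
    return sample_rate_hz / factor


def tokens_for_clip(duration_s: float, sample_rate_hz: int, factor: int) -> int:
    """Number of whole codes needed for a clip of the given duration."""
    return int(duration_s * sample_rate_hz) // factor


if __name__ == "__main__":
    for level, factor in COMPRESSION_FACTORS.items():
        rate = tokens_per_second(SAMPLE_RATE_HZ, factor)
        print(f"{level:>6}: {factor:>3}x -> {rate:.1f} codes per second")
```

Under these assumptions, the most compressed level needs only a few hundred codes per second, which is what makes long-range modelling tractable.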

The bottom-level encoding (8 times) gives the highest-quality reconstruction in the form of “music codes”, while the top-level encoding (128 times) keeps only the essential musical information, such as pitch, loudness, and timbre.
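The "vector quantized" part of VQ-VAE can be illustrated with a minimal sketch: each continuous vector from an encoder is replaced by the index of its nearest entry in a codebook, and decoding looks the entry back up. The codebook and vectors here are made-up toy values, and the learned encoder and decoder networks are left out entirely.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(vectors: Sequence[Vector], codebook: Sequence[Vector]) -> List[int]:
    """Replace each vector with the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(v, codebook[k]))
        for v in vectors
    ]


def dequantize(codes: Sequence[int], codebook: Sequence[Vector]) -> List[List[float]]:
    """Look each code up in the codebook to recover an approximate vector."""
    return [list(codebook[k]) for k in codes]


if __name__ == "__main__":
    # Toy values chosen for illustration only.
    codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 0.5]]
    vectors = [[0.9, 1.2], [-0.8, 0.4], [0.1, -0.1]]
    codes = quantize(vectors, codebook)
    print(codes)
    print(dequantize(codes, codebook))
```

The information lost in the round trip is the gap between each vector and its nearest codebook entry, which is why heavier compression keeps only the coarser musical structure.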

A top-level transformer model is trained to predict these compressed audio tokens. This lets Jukebox capture the characteristics of a musical style, and conditioning the model on genre and artist steers it toward particular genres and performers.

To handle lyrics, OpenAI added an encoder that produces a representation of the lyrics, together with attention layers that use queries from the Jukebox music layers to attend to keys and values from the lyrics encoder. This gives the model more lyrical context and helps Jukebox line up the lyrics and the music in the proper sequence.
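The query, key, and value mechanism described above is a form of cross-attention. The sketch below shows plain scaled dot-product attention in which queries stand for music positions and keys and values stand for lyric positions. It is a simplified illustration: the learned projection matrices, multiple heads, and everything else specific to Jukebox are omitted, and all names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cross_attention(
    queries: Sequence[Vector],  # one per music position
    keys: Sequence[Vector],     # one per lyric position
    values: Sequence[Vector],   # one per lyric position
) -> List[List[float]]:
    """For each query, return a weighted average of the values."""
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys])
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, values))
             for d in range(len(values[0]))]
        )
    return outputs


if __name__ == "__main__":
    keys = [[10.0, 0.0], [0.0, 10.0]]
    values = [[1.0, 0.0], [0.0, 1.0]]
    # A query aligned with the first key attends almost only to the first value.
    print(cross_attention([[10.0, 0.0]], keys, values))
```

Each music position ends up with a blend of lyric information weighted by how well its query matches each lyric key, which is how the model can follow the words in order.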

Jukebox models require a significant amount of computation and training time:

  1. The VQ-VAE, which has around 2 million parameters, was trained on 256 Nvidia V100 GPUs for three days.
  2. The upsamplers, which have more than 1 billion parameters, were trained on 128 Nvidia V100 GPUs for two weeks.
  3. The top-level prior, which has 5 billion parameters, was trained on 512 Nvidia V100 GPUs for four weeks.
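The figures above can be turned into a rough total in GPU-days. This is back-of-the-envelope arithmetic only: it assumes a week is seven days and that every GPU was busy for the whole period, neither of which the article states.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class TrainingRun:
    name: str
    gpus: int
    days: int  # assumption: 1 week = 7 days, GPUs fully used throughout

    @property
    def gpu_days(self) -> int:
        return self.gpus * self.days


# GPU counts and durations as quoted in the list above.
RUNS = [
    TrainingRun("VQ-VAE", gpus=256, days=3),
    TrainingRun("upsamplers", gpus=128, days=14),
    TrainingRun("top-level prior", gpus=512, days=28),
]


def total_gpu_days(runs: Iterable[TrainingRun]) -> int:
    return sum(run.gpu_days for run in runs)


if __name__ == "__main__":
    for run in RUNS:
        print(f"{run.name:>16}: {run.gpu_days:>6} GPU-days")
    print(f"{'total':>16}: {total_gpu_days(RUNS):>6} GPU-days")
```

On these assumptions the top-level prior accounts for most of the total, at 14,336 of roughly 16,900 GPU-days.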

OpenAI Jukebox is the successor to OpenAI’s earlier “MuseNet” project, which explored synthesising music from a large amount of MIDI data.

Jukebox’s models also offer some control over the overall diversity and structure of the output while reducing errors over long, medium, and short time scales.

Jukebox is trained on a large music dataset that spans many genres. Because the model can, in many cases, produce songs that closely resemble the musicians it was trained on, Jukebox explores how well it can imitate the style and genre of existing music. It also attempts to mimic the vocal style of specific singers.

The OpenAI team chose to model music directly as raw audio, and they trained Jukebox in two steps:

  1. First, researchers used convolutional neural networks, a class of machine learning models that is particularly good at recognising patterns in images and language, to encode and compress the raw audio of 1.2 million songs along with their metadata. Each song’s metadata included the genre, artist, album, and any relevant playlist keywords.
  2. Next, they used a transformer to generate new compressed audio, which was then upsampled back into raw audio.
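The second step can be pictured as a sampling loop over discrete codes followed by upsampling. The toy sketch below is only a stand-in: a small hand-written transition table replaces the transformer, and repeating each code replaces the learned upsamplers. Every name and value in it is hypothetical.

```python
import random
from typing import Dict, List

# Hypothetical toy "prior": for each code, the codes allowed to follow it.
# A real prior is a large transformer; this table only shows the loop shape.
TRANSITIONS: Dict[int, List[int]] = {0: [0, 1], 1: [2], 2: [0, 2]}


def sample_codes(length: int, start: int, rng: random.Random) -> List[int]:
    """Autoregressive sampling: each new code depends on the previous one."""
    if length <= 0:
        return []
    codes = [start]
    while len(codes) < length:
        codes.append(rng.choice(TRANSITIONS[codes[-1]]))
    return codes


def naive_upsample(codes: List[int], factor: int) -> List[int]:
    """Stand-in for a learned upsampler: repeat each coarse code."""
    return [code for code in codes for _ in range(factor)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    top = sample_codes(8, start=0, rng=rng)
    # Going from 128x to 32x compression means 4 finer codes per coarse code.
    middle = naive_upsample(top, factor=4)
    print(top)
    print(len(middle))
```

In the real system the upsamplers add detail rather than copying codes, but the data flow is the same: coarse codes first, then progressively finer codes, then audio.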

Earlier approaches, such as rule-based systems, chaos and self-similarity, and constraint programming, mostly generated music symbolically in the form of a piano roll, which specifies the pitch, velocity, timing, and instrument of every note to be played. Generative models have also been applied to producing music in this form. Because the problem is posed in a low-dimensional space, the symbolic approach simplifies the modelling task.
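A piano roll is easy to picture as a small data structure. The sketch below stores each note with the four attributes named above and renders one instrument's notes as a grid of time steps by pitches. The class and function names are illustrative, and the use of MIDI-style pitch and velocity numbers is an assumption.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Note:
    pitch: int           # assumed MIDI note number, 0-127 (60 is middle C)
    velocity: int        # assumed MIDI velocity, 1-127 (how hard it is played)
    start_step: int      # time step at which the note begins
    duration_steps: int  # how many time steps the note lasts
    instrument: str


def to_piano_roll(
    notes: Sequence[Note],
    num_steps: int,
    instrument: str,
    num_pitches: int = 128,
) -> List[List[int]]:
    """Return a num_steps x num_pitches grid; each cell holds a velocity or 0."""
    roll = [[0] * num_pitches for _ in range(num_steps)]
    for note in notes:
        if note.instrument != instrument:
            continue
        end = min(note.start_step + note.duration_steps, num_steps)
        for step in range(note.start_step, end):
            roll[step][note.pitch] = note.velocity
    return roll


if __name__ == "__main__":
    notes = [
        Note(pitch=60, velocity=100, start_step=0, duration_steps=2, instrument="piano"),
        Note(pitch=64, velocity=80, start_step=1, duration_steps=2, instrument="piano"),
    ]
    roll = to_piano_roll(notes, num_steps=4, instrument="piano")
    # Show which pitches sound at each time step.
    for step, row in enumerate(roll):
        print(step, [p for p, vel in enumerate(row) if vel > 0])
```

A grid like this has only 128 values per time step, which is tiny compared with the tens of thousands of samples per second in raw audio.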

In parallel, researchers have pursued non-symbolic techniques that produce music directly as an audio file. This approach is considerably harder because the space of raw audio is extremely high-dimensional and carries a great deal of information to model.

Examples of recent data-driven approaches include DeepBach and Coconet, which use Gibbs sampling to generate notes in the style of Bach chorales, and MuseGAN, which uses generative adversarial networks. Other approaches use hierarchical recurrent networks.

OpenAI has been working on AI-generated music for several years and had already created “MuseNet” to produce full-length MIDI compositions. The more recent Jukebox was trained in a similar way to MuseNet but goes one step further by generating vocal parts and lyrics across a broad spectrum of genres.

OpenAI has also created a website where anyone can browse the Jukebox samples that have been produced. Whereas OpenAI’s language generator GPT-2 keeps track of about a thousand timesteps in a piece of writing, Jukebox has to keep track of millions of timesteps per song.
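The gap between the two figures is easy to check with a little arithmetic. The sketch below assumes a four-minute song sampled at the CD-standard 44,100 Hz and takes the "thousand timesteps" figure for GPT-2 at face value; the song length is an assumption for illustration.

```python
CD_SAMPLE_RATE_HZ = 44_100    # assumption: "44 kHz" means CD-standard audio
TEXT_CONTEXT_STEPS = 1_000    # the "thousand timesteps" quoted for GPT-2


def audio_timesteps(duration_s: float, sample_rate_hz: int = CD_SAMPLE_RATE_HZ) -> int:
    """Number of raw audio samples in a clip of the given length."""
    return int(duration_s * sample_rate_hz)


def ratio_to_text_context(duration_s: float, context_steps: int = TEXT_CONTEXT_STEPS) -> float:
    """How many times longer the audio sequence is than the text context."""
    return audio_timesteps(duration_s) / context_steps


if __name__ == "__main__":
    song_seconds = 4 * 60  # assumed song length
    print(f"{audio_timesteps(song_seconds):,} audio timesteps")
    print(f"{ratio_to_text_context(song_seconds):,.0f}x the text context")
```

Under these assumptions a single song is over ten million timesteps, about ten thousand times longer than the text sequences quoted for GPT-2.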



