![Meta logo](https://i0.wp.com/musically.com/wp-content/uploads/2021/10/Meta-logo-2.jpg?fit=1200%2C400&ssl=1)
Another day, another story about ethically trained generative AI (genAI) music models. Last year, Meta released two text-to-music genAI models, MusicGen and AudioCraft – both trained on licensed music, apparently from stock music libraries and similar sources.
Meta has now released five new AI models to the public – all dealing with familiar things like image-to-text and image generation. But there’s also a new audio generation model, Jasco, which Meta says “is comparable to the evaluated baselines considering generation quality.”
The new model allows multiple inputs: both text description and snippets of music can be entered, and this, Meta says, means “significantly better and more versatile controls over the generated music.”
So: is what it makes any good? Well, there are some audio clips on the webpage hosting the research paper, and they include an audio prompt of Ravel’s Boléro that the AI turns into a “driving 80s pop song” and also an accordion folk song, with fairly convincing results. There’s also an audio clip of someone beatboxing which the AI turns into a reggae song, a rock song, and so on.
It’s all as seemingly impressive as you might expect by now – but what does Meta want with a genAI music creation platform? It’s easy to imagine how a music-generating button could fit into the suite of Instagram editing tools to add personalised music to a video, and then of course, there’s the metaverse itself, which will need (possibly reactive) music to fit in with digital experiences.
Interestingly, one of the five new models is designed to detect AI-generated speech – possibly intended for spotting fake propaganda posts, though similar technology could perhaps be used to spot unauthorised use of AI vocals.
Meanwhile, another genAI music platform has launched: Jen, which is an “ethically-trained generative AI music platform” with 40+ fully-licensed catalogues in its initial training set. It’s co-founded by Shara Senderoff (once of Raised in Space, which she co-founded with Scooter Braun). Offering the now-familiar set of text-to-music inputs, Jen is also hoping that its “strict training doctrine” will set it apart, with a “commitment to transparency, compensation and copyright identification.”
So how does that part work? Brace yourself for a blast from the recent past: the blockchain. Every track in the training set – plus every track created on the platform – is automatically run through audio recognition and copyright identification. A cryptographic hash is then generated for each track and recorded on a blockchain. This latter part aims to set in stone the data around the moment of creation, and also ties each track to its creator. Tracks created on the platform can then be sold via a marketplace layer.
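As a rough illustration of that hashing step – Jen hasn’t published its implementation, so the `fingerprint_track` helper, the field names, and the creator ID below are all hypothetical – a content hash plus creation metadata might look something like this:

```python
import hashlib
import time

def fingerprint_track(audio_bytes: bytes, creator: str) -> dict:
    """Build a hypothetical provenance record for one track.

    The SHA-256 digest uniquely identifies the audio content; the creator
    and timestamp capture the moment of creation, as the blockchain record
    described above would aim to do.
    """
    digest = hashlib.sha256(audio_bytes).hexdigest()
    return {
        "content_hash": digest,
        "creator": creator,
        # On-chain, this would be fixed by the block's timestamp instead
        "created_at": int(time.time()),
    }

# Identical audio hashes identically; any change to the bytes changes the hash.
record_a = fingerprint_track(b"\x00\x01fake-audio-frames", "artist-123")
record_b = fingerprint_track(b"\x00\x01fake-audio-frames", "artist-123")
assert record_a["content_hash"] == record_b["content_hash"]
```

Writing only the fixed-length hash to the chain, rather than the audio itself, keeps the on-chain record small while still letting anyone verify that a given file matches the registered track.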