YouTube presses play on AI music 🎤

Plus breaking down Books3

The future of music?

We recently told you about a rumor that YouTube and Universal Music Group (UMG) were in talks to establish a partnership that could open the door for sanctioned AI-generated tracks. Essentially, creators might eventually be free to make music featuring an AI Billie Eilish or Elton John, and then put it on YouTube without the fear of attack-dog lawyers going after them.

Now, it seems, that day is inching closer.

YouTube has announced its Music AI Incubator, of which UMG is an official partner. But what is a Music AI Incubator exactly? Good question.

YouTube CEO Neal Mohan has written a vague, corporate-speak blog post outlining the incubator, which is surprisingly thin on details. Nonetheless, Mohan does explain that the incubator will work and consult with a hand-picked set of artists, producers, and songwriters, and that the partnership will inform the development of tools for creators, while also “protecting music artists and the integrity of their work.” (Told you - corporate-speak.)

Mohan doesn’t provide any specifics of what AI-powered music tools the company is working on or when they might appear, but he does do some casual name-dropping to show who is consulting with the incubator: producer Rodney Jerkins (aka Darkchild), rapper Yo Gotti, and composer Max Richter are all involved, as well as the estate of Ol’ Blue Eyes himself, Frank Sinatra. (AI Sinatra appearing on a trap record is going to be a thing, isn’t it?)

Why it matters:

As Mohan explains, YouTube creators and viewers are already deeply invested in AI, with people watching some 1.7 billion hours of AI-related content so far this year. And, as we recently highlighted on our Twitter/X feed, there is already a stack of incredible AI-generated songs out there. So whether the music industry is ready or not, YouTube is putting itself right at the center of the next innovation in song creation.

Seeking out the syntax

It’s probably fair to say that, at this point, pretty much everybody knows how large language models (LLMs) such as OpenAI’s ChatGPT and Anthropic’s Claude work. Some people even understand that LLMs are trained on enormous volumes of data, made up of lots of human-written text. But how many people know exactly where that bank of words originated?

Broadly speaking, LLM operators don’t share the finer details of where they gather all the text they feed into their LLMs - a silence that is perhaps more deliberate than accidental.

Indeed, with companies such as OpenAI, Google, and Meta requiring countless cohesive paragraphs and sentences to teach machines how to write, the best place to go for much of it is previously published works. But that can open up a Pandora’s Box of copyright lawsuits, and so (from a company perspective) the less said the better.

But for all those with a hunch that their copyrighted works might have been used to teach machines how to write, programmer Alex Reisner has seemingly found a smoking gun.

Reisner has written a piece for The Atlantic (paywall) digging into Books3 - a vast dataset he says was used by Meta and others to train their LLMs. (Reisner does not say if OpenAI has used Books3.)

As Reisner explains it, Books3 contains the whole text of almost 200,000 books - 170,000 of which he was able to confidently identify. That batch includes titles from publishers such as HarperCollins and Penguin Random House, and books from writers such as Stephen King and comedian Sarah Silverman.

Those who have been following the AI buzz might realize that Silverman’s name is important here because she and two other authors recently filed lawsuits against Meta and ChatGPT-maker OpenAI, claiming their copyrighted works were illicitly used to train LLMs. So, simply put, if what Reisner says is correct, then Silverman might be about to kick Mark Zuckerberg’s butt. (Which is more than Elon Musk will be doing anytime soon.)

Why it matters:

The lawsuit that Silverman and her cohorts have filed against Meta might be something the company could settle quietly. But with 170,000 identifiable books in the dataset, billionaire Zuck might be dipping his billionaire hands into his billionaire pockets a lot to make this all go away.