
Runway Ripped Off YouTube Creators

Abstract
The article reports that Gen-3, the AI video generation tool developed by Runway, was secretly trained on thousands of videos scraped from popular YouTube creators, brands, and even pirated content. It describes the internal spreadsheet that listed the training data and gives examples of the model generating videos in the style of specific YouTubers. It also touches on the broader issue of AI companies training their models on copyrighted material without permission.
Q&A
[01] Runway's Training Data
1. What kind of content was included in Runway's training data for the Gen-3 model?
- The training data included videos from the YouTube channels of various media and entertainment companies, such as The New Yorker, VICE News, Pixar, Disney, Netflix, Sony, and others.
- It also included videos from popular influencers and content creators, such as Casey Neistat, Sam Kolder, Benjamin Hardman, Marques Brownlee, and many others.
- In addition, the training data included links to pirated content, such as anime, Studio Ghibli films, and other copyrighted material.
2. How did Runway obtain the training data?
- According to a former Runway employee, there was a company-wide effort to compile the videos into spreadsheets to serve as training data.
- Runway then used the open-source tool youtube-dl to download the videos from YouTube, routing requests through proxies to avoid being blocked by Google.
3. What was the purpose of the different spreadsheets and categorization of the training data?
- The spreadsheets were used to organize the training data based on different criteria, such as subject matter, camera work, and diversity of people.
- For example, there were sheets for "Cinematic Masterpieces," "High Camera Movement," and "Single great videos (for finetuning)," among others.
- The goal was to obtain videos that had specific characteristics that Runway wanted to incorporate into the Gen-3 model.
[02] Impact on the AI Industry
1. How has the revelation of Runway's training data practices impacted the AI industry?
- The article mentions that the use of copyrighted material to train AI models has been a recurring issue in the industry, with companies like OpenAI and Google also facing criticism for similar practices.
- The article cites an open letter signed by over 200 musicians asking tech companies to stop infringing on the rights of artists when developing AI.
- The article also mentions that YouTube's CEO has stated that the use of the platform's content without permission is a clear violation of the terms of service.
2. How have other AI companies responded to the allegations of using copyrighted material for training?
- The article mentions that OpenAI's CTO, Mira Murati, told the Wall Street Journal that she didn't know whether the training data for the company's text-to-video generator, Sora, included videos from YouTube, Instagram, and Facebook.
- The article also notes that Google, which is an investor in Runway, pointed to a previous statement in which the company said that using YouTube videos to train AI models would violate YouTube's rules.