Summarize by Aili

Leaked Documents Show Nvidia Scraping ‘A Human Lifetime’ of Videos Per Day to Train AI

https://www.404media.co/nvidia-ai-scraping-foundational-model-cosmos-project/?utm_campaign=article_email&utm_content=article-13348&utm_medium=email&utm_source=sg

🌈 Abstract

The article discusses how Nvidia, a technology company, used copyrighted content from sources like YouTube and Netflix to train its AI models, despite potential legal and ethical concerns raised by some employees.

🙋 Q&A

[01] Nvidia's Use of Copyrighted Content

1. What did the internal Slack chats, emails, and documents obtained by 404 Media show about Nvidia's use of copyrighted content?

The internal documents show that Nvidia scraped videos from YouTube and several other sources to compile training data for its AI products.
When employees raised questions about the legal risks of using YouTube videos and datasets that academics had compiled for research purposes, managers told them that the highest levels of the company had cleared the use of that content.

2. How did Nvidia defend its practice of using copyrighted content to train its AI models?

Nvidia defended its practice as being "in full compliance with the letter and the spirit of copyright law."

3. What project was the copyrighted content used for internally at Nvidia?

A former Nvidia employee said that employees were asked to scrape videos from Netflix, YouTube, and other sources to train an AI model for Nvidia's Omniverse 3D world generator, self-driving car systems, and "digital human" products. The project is internally named Cosmos (distinct from the company's existing Cosmos deep learning product) and has not yet been released to the public.

Shared by Daniel Chen