A new bill wants to reveal what’s really inside AI training data

🌈 Abstract

The article discusses a new bill proposed by Rep. Adam Schiff (D-CA) called the Generative AI Copyright Disclosure bill, which would require tech companies to disclose any copyrighted materials used to train their AI models.

🙋 Q&A

[01] The Generative AI Copyright Disclosure bill

1. What are the key requirements of the Generative AI Copyright Disclosure bill?

  • The bill would require anyone making a training dataset for AI to submit a report on its contents to the Register of Copyrights, including a detailed summary of the copyrighted material in the dataset and the URL for the dataset if it is publicly available.
  • Companies must submit a report "not later than 30 days" before the AI model that used the training dataset is released to the public.
  • The bill would not apply retroactively to existing AI platforms unless their training datasets are changed after it becomes law.

2. What is the purpose of the bill?

  • The bill aims to address AI models being trained on copyrighted material without permission, a practice that artists, authors, and other creators have complained about since the rise of generative AI.

3. What is the stance of different industry groups on the bill?

  • The bill has garnered support from industry groups including the Writers Guild of America (WGA), the Recording Industry Association of America (RIAA), the Directors Guild of America (DGA), the Screen Actors Guild-American Federation of Television and Radio Artists (SAG-AFTRA), and the Authors Guild.
  • The Motion Picture Association (MPA), which normally backs moves to protect copyrighted work from piracy, is notably absent from the list of supporters.

4. What other efforts have been made to bring more transparency to training datasets?

  • The group Fairly Trained wants to label AI models whose developers can prove they asked for permission to use copyrighted data.

[02] Copyright and AI

1. What is the ongoing debate around copyright and AI?

  • Copyright and AI have always been tricky to navigate, especially because the question of how much AI models change or mimic protected content has not been settled.
  • Artists and authors have turned to lawsuits to assert their rights. Developers of AI models say their models are trained on publicly available data and that the sheer amount of information means they do not know specifically which data is copyrighted.
  • Companies have said their use of any copyrighted materials falls under fair use, and some have begun offering legal cover to customers who are sued for copyright infringement.