Amazon Investigates Perplexity AI Over Potential Data-Scraping Violations

🌈 Abstract

The article covers Amazon Web Services' (AWS) investigation into Perplexity AI. The AI startup is accused of scraping web archives from news outlets such as Forbes and Wired to train its models without consent or compensation. The article also covers tech firms' broader attitudes towards using news sites and other web content for their AI models, and the ongoing backlash from some media outlets.

🙋 Q&A

[01] AWS's Investigation into Perplexity AI

1. What is AWS investigating Perplexity AI for?

  • AWS is investigating Perplexity AI's data-scraping practices. The AI startup is accused of scraping web archives from news outlets such as Forbes and Wired to train its models without consent or compensation.

2. What did the AWS representative say about their terms of service and customer responsibilities?

  • The AWS representative confirmed that all AWS clients must follow the robots.txt file instructions, which are typically used by websites to ask bots and web crawlers not to scrape their data. The AWS terms of service prohibit abusive and illegal activities, and customers are responsible for complying with those terms.

3. What is Perplexity's response regarding the robots.txt issue?

  • Perplexity claims that its PerplexityBot, which runs on AWS, respects the robots.txt standard and that Perplexity-controlled services are not crawling in any way that violates AWS Terms of Service. However, other reports have accused Perplexity of ignoring the robots.txt standard.
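
For readers unfamiliar with the robots.txt standard mentioned above, here is a minimal sketch of how a well-behaved crawler could check a site's rules before fetching a page. It uses Python's standard-library `urllib.robotparser`. The robots.txt contents, the `example.com` URLs, the `OtherBot` name, and the `is_allowed` helper are hypothetical and chosen only for illustration; this is not a description of how PerplexityBot or any AWS service actually works.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only: it asks one named bot
# to stay away entirely and asks every other bot to skip /private/.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("PerplexityBot", "https://example.com/news/story"),
        ("OtherBot", "https://example.com/news/story"),
        ("OtherBot", "https://example.com/private/draft"),
    ]:
        print(agent, url, is_allowed(ROBOTS_TXT, agent, url))
```

Note that robots.txt is a request, not an access control: nothing in the file stops a crawler that skips this check, which is why the dispute centers on whether the bot honors it.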

[02] Backlash from News Outlets

1. What is the issue Forbes has with Perplexity's AI-generated news articles?

  • Forbes accused Perplexity of "cynical theft" and creating "knockoff stories" using "eerily similar wording" and "entirely lifted fragments" from its articles, without adequate citation or inclusion of Forbes' name.

2. How have other news outlets responded to tech firms' use of their content for AI models?

  • Some news outlets, like The New York Times, are suing OpenAI and Microsoft for copyright infringement, alleging that the companies trained their AI models on the outlets' articles without consent. Other outlets, like Semafor, TIME, and The Financial Times, have signed AI deals to proactively license their content.

3. What is the broader issue with tech firms' attitudes towards using news sites and other web content for their AI models?

  • There is an ongoing backlash from some media outlets against tech firms' practices of scraping and using their content for AI models without consent or compensation. This has sparked debates around the "social contract" and fair use of publicly available web content.
Shared by Daniel Chen
© 2024 NewMotor Inc.