Why Generative AI Tools Are Better at Images and Speech Than Writing
Abstract
The article examines how the quality of training data differs between visual art and writing, and how that difference affects the capabilities of generative AI models like DALL-E and ChatGPT. It explains why generative AI is better at producing images that resemble Rembrandt's paintings than at generating high-quality written content.
Q&A
[01] Differences in Training Data Quality
1. What are the key differences in the quality of training data between visual art and writing?
- The training data for visual art, such as Rembrandt's paintings, is on average of much higher quality than the training data for writing, which includes a large amount of low-quality content from the internet.
- There are literally hundreds of thousands of masterpieces of visual art that were used to train tools like DALL-E, Midjourney, and Stable Diffusion.
- In contrast, the training data for writing includes a lot of "corporate nonsense", "wishy-washy, blame-avoidant" content, and other low-quality writing found online.
2. Why is the training data for writing generally of lower quality?
- Most people stop learning visual art techniques around the age of 18, so relatively few of them publish visual art. Nearly everyone writes online, however, so the average blog post or online article is not a masterpiece.
- The explosion of online writing, especially since 2000, has produced a vast amount of low-quality text, and that text has been absorbed into the training data for language models like ChatGPT.
3. How does the quality of training data affect the capabilities of generative AI models?
- Generative AI models like DALL-E are able to produce high-quality, photorealistic images because they were trained on a large corpus of high-quality visual art.
- In contrast, language models like ChatGPT are limited in their ability to generate high-quality, original written content because they were trained on a large amount of low-quality writing found online.
[02] Implications for Using Generative AI
1. What is the author's advice for using generative AI tools?
- It doesn't make sense to compare the writing capabilities of ChatGPT to the visual art capabilities of DALL-E, because the two were trained on data of very different quality.
- Generative AI tools are more likely to produce impressive results for tasks like text-to-speech, speech-to-text, and generating realistic or anime-style imagery than for generating high-quality written content.
- The author suggests that using generative AI for writing tasks may not save as much time as using it for other tasks, due to the limitations of the training data.
2. Why is text written by a chatbot relatively easy to recognize?
- The writing produced by chatbots tends to be "too long, too lengthy, too generic, too boring, too repetitive", which reflects the patterns of a lot of low-quality writing found online.
- This contrasts with the high-quality, creative writing that would be needed to match the level of Rembrandt's paintings.
3. What is the key takeaway about the limitations of generative AI for writing?
- Writing is not inherently a harder problem to solve than representational art, but the training data contains far fewer examples of world-class writing than of world-class visual art.
- As a result, generative AI tools will have more success with tasks like image generation, text-to-speech, and speech-to-text than with producing high-quality original written content.