I stumbled upon LLM Kryptonite and no one wants to fix it

🌈 Abstract

The article examines the current state of large language models (LLMs) and the lack of proper support and maintenance processes at the companies developing them. The author describes how a single, simple prompt broke nearly every LLM they tested, pointing to a fundamental flaw in the models, yet they faced real difficulty getting LLM vendors to acknowledge and address the issue.

🙋 Q&A

[01] Feature

1. What is the key issue the author highlights with the current state of large language models (LLMs)?

  • The author highlights that LLMs, which are being widely adopted and integrated into various applications, are fundamentally unstable and untested, with the potential to crash at any moment without explanation.
  • The author notes that no self-respecting IT department would have anything to do with such an untested and unstable technology, and would only allow it in a "sandbox" environment, ready to shut it down at the first sign of trouble.
  • However, the author points out that the whole world has embraced this untested and unstable technology, integrating it into billions of devices, without any proper systems or processes in place to address the inevitable problems that arise.

2. What does the author mean by the "age of 'mid'" when it comes to AI-driven classifiers?

  • The author explains that AI-driven classifiers are currently in the "age of 'mid'": not great, but not horrid either. They perform "well enough", better than any conventional algorithm, though not as well as a human being; a minimal sketch of such a classifier follows.
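
As a rough illustration of what such an LLM-driven classifier looks like in practice, here is a minimal sketch assuming an OpenAI-compatible chat API; the model name, label set, and prompt wording are placeholders, not details from the article.

```python
# Minimal LLM-as-classifier sketch. Assumes the OpenAI Python client
# (openai>=1.0) and an OPENAI_API_KEY in the environment. Model, labels,
# and prompt wording are illustrative assumptions, not from the article.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

LABELS = ["positive", "negative", "neutral"]  # hypothetical label set

def classify(text: str) -> str:
    """Ask the model to pick exactly one label for the given text."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder choice; any chat model works
        messages=[
            {"role": "system",
             "content": f"Classify the user's text as one of: {', '.join(LABELS)}. "
                        "Reply with the label only."},
            {"role": "user", "content": text},
        ],
        temperature=0,  # keep the output as deterministic as possible
        max_tokens=5,   # a single label should never need more than this
    )
    return response.choices[0].message.content.strip().lower()

print(classify("The checkout flow kept timing out."))  # e.g. "negative"
```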

3. What happened when the author tried to test a prompt for an AI-based classifier?

  • The author tried a simple prompt written for an AI-based classifier against a range of LLM-powered chatbots, including Microsoft Copilot, Gemini, Claude, ChatGPT+, Llama 3, Meta AI, Mistral, and Mixtral.
  • To the author's surprise, the prompt caused every one of these chatbots except Anthropic's Claude 3 Sonnet to descend into "babble-like madness" that went on endlessly; a sketch of such a cross-model test appears below.
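
The article withholds the actual "kryptonite" prompt, so the stand-in below cannot reproduce the failure; it only sketches how one might replay a single prompt across several OpenAI-compatible endpoints (such as Groq's) while guarding against runaway output. The endpoint URLs, model names, and loop-detection heuristic are all assumptions.

```python
# Replay one prompt across several OpenAI-compatible endpoints, capping
# output so an endlessly babbling model cannot run forever. All endpoint
# and model values are illustrative; the author's prompt was withheld.
import os
from openai import OpenAI

ENDPOINTS = {
    # name -> (base_url, model, API-key env var); values are placeholders
    "groq-llama3":  ("https://api.groq.com/openai/v1", "llama3-70b-8192", "GROQ_API_KEY"),
    "openai-gpt4o": ("https://api.openai.com/v1",      "gpt-4o",          "OPENAI_API_KEY"),
}

PROMPT = "..."  # the author deliberately does not publish the prompt

for name, (base_url, model, key_var) in ENDPOINTS.items():
    client = OpenAI(base_url=base_url, api_key=os.environ[key_var])
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT}],
        max_tokens=512,  # cap output so runaway "babble" is cut short
        timeout=30,      # fail fast instead of waiting on a wedged model
    )
    text = response.choices[0].message.content or ""
    # Crude runaway check: are the last few output lines all identical?
    tail = text.splitlines()[-5:]
    looping = len(tail) > 1 and len(set(tail)) == 1
    print(f"{name}: {'suspect runaway output' if looping else 'responded normally'}")
```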

4. What did the author do after discovering this issue?

  • The author reported the issue through Microsoft's feedback system for Copilot and through the support page of Groq, one of the LLM-as-a-service providers.
  • The author received confirmation from Groq that they were able to replicate the issue across the LLMs they support, indicating that this was not just a problem with a single implementation, but a more fundamental flaw.

[02] Lack of Proper Support and Maintenance Processes

1. What concerns does the author raise about the lack of proper support and maintenance processes from LLM vendors?

  • The author notes that even though a vast and shadowy dark-web market trades in prompt attacks that exploit vulnerabilities in LLMs, which gives vendors every reason to take such reports seriously, they still struggled to get the LLM vendors to acknowledge and address the issue they had discovered.
  • The author tried to contact various LLM vendors directly, but received little to no response, suggesting that these companies lack proper channels for customers to report issues and have them addressed.
  • The author contrasts this with the software industry's standard practice of having a QA department replicate and document customer-reported bugs, then prioritize and resolve them, a process the author argues is necessary for any software product to mature.

2. What does the author suggest about the responsibility of LLM vendors?

  • The author argues that until LLM vendors close the loop between vendor and client, their powerful products cannot be considered safe. The author states that with great power comes great responsibility, and for AI companies to be seen as responsible, they need to listen closely to their customers, judge wisely, and act quickly to address issues.