Debate over “open source AI” term brings new push to formalize definition
🌈 Abstract
The article discusses the Open Source Initiative's (OSI) efforts to define "open source AI" and establish clear criteria for what constitutes truly open source AI systems. It highlights the ambiguity around the use of the "open source" label by companies like Meta, which release AI models with certain usage restrictions. The OSI's draft definition aims to provide a benchmark for evaluating AI systems based on four fundamental freedoms: permission to use for any purpose, ability to study how it works, freedom to modify, and permission to share with or without modifications.
🙋 Q&A
[01] The OSI's Efforts to Define "Open Source AI"
1. What are the key elements of the OSI's draft definition for "open source AI"?
- The draft definition emphasizes four fundamental freedoms: permission to use the AI system for any purpose, ability to study how it works, freedom to modify it for any purpose, and permission to share it with or without modifications.
- The definition extends beyond just the AI model or its weights to the entire system and its components: the full source code, the model weights and parameters, and detailed information about the training data.
- The draft does not mandate the release of raw training data, but instead requires "data information" - detailed metadata about the training data and methods.
2. What is the goal of the OSI in establishing a clear definition for "open source AI"?
- The OSI aims to provide a benchmark against which AI systems can be evaluated, helping developers, researchers, and users make more informed decisions about the AI tools they create, study, or use.
- The organization believes that truly open source AI will help expose potential software vulnerabilities in AI systems, because researchers will be able to inspect how the models work behind the scenes.
3. What is the timeline for the OSI's "open source AI" definition?
- The OSI expects to announce a stable version of the "open source AI" definition in October at the All Things Open 2024 event in Raleigh, North Carolina.
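As a rough illustration of how the draft's criteria from question 1 could serve as a benchmark, the sketch below encodes the four freedoms and the three required components as a simple checklist. The class name, field names, and the example release are hypothetical; the OSI draft does not define a machine-readable schema, and this is not an official evaluation tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AISystemRelease:
    """Hypothetical record of what an AI release permits and ships."""
    name: str
    # The four freedoms named in the draft definition
    use_for_any_purpose: bool
    can_study: bool
    can_modify: bool
    can_share: bool
    # Components the draft expects to be available
    has_data_information: bool  # metadata about training data, not raw data
    has_source_code: bool
    has_weights_and_parameters: bool


def unmet_criteria(release: AISystemRelease) -> List[str]:
    """Return the labels of every criterion the release fails to meet."""
    checks = {
        "use for any purpose": release.use_for_any_purpose,
        "study how it works": release.can_study,
        "modify for any purpose": release.can_modify,
        "share with or without modifications": release.can_share,
        "data information": release.has_data_information,
        "full source code": release.has_source_code,
        "model weights and parameters": release.has_weights_and_parameters,
    }
    return [label for label, ok in checks.items() if not ok]


# Made-up release whose license restricts who may use it.
example = AISystemRelease(
    name="example-model",
    use_for_any_purpose=False,
    can_study=True,
    can_modify=True,
    can_share=True,
    has_data_information=True,
    has_source_code=True,
    has_weights_and_parameters=True,
)

print(unmet_criteria(example))
```

An empty list would mean the release meets every criterion in this simplified checklist. A release with usage restrictions, like the made-up one here, fails the first freedom even though its weights and code are available.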
[02] Ambiguity in the Use of "Open Source" for AI
1. What are some examples of AI models or systems that are not truly open source, despite being labeled as such?
- Meta's Llama 3 model, while freely available, does not meet the traditional open source criteria the OSI defines for software, because its license restricts usage based on company size or the type of content produced with the model.
- The AI image generator Flux is another "open" model that is not truly open source.
2. How have researchers and advocates typically described AI models with restrictions or lack of accompanying training data?
- They have used alternative terms like "open-weights" or "source-available" for AI models whose code or weights come with restrictions, or that lack accompanying training data.
3. What are the concerns raised by free-software advocates about the ambiguous use of the "open source" label for AI systems?
- The concern is that some companies release trained AI language model weights and code with usage restrictions while still using the "open source" label. This has prompted intense debates among free-software advocates about what truly constitutes "open source" in the context of AI.