What the AI boom is getting wrong (and right), according to Hugging Face’s head of global policy
🌈 Abstract
The article discusses Hugging Face's repository as neutral ground in the competition among major AI companies, and the work of Irene Solaiman, Hugging Face's head of global policy, in advising regulators on AI development and deployment.
🙋 Q&A
[01] The Promise and Risk of Large Language Models
1. What are the key issues Irene Solaiman is advising regulators on regarding AI development?
- Bias assessment and mitigation
- Consent and data privacy issues
- Existential risks of AI
2. How does Solaiman view the challenges in addressing bias and consent in AI systems?
- Bias can be introduced at every stage of the AI development process, from the training data to the user interface
- Consent for using data is often ambiguous, with enforcement relying on a simple checkbox
- It is difficult to identify where biases are introduced in complex AI systems with multiple layers
3. What are Solaiman's views on the importance of data set research and representation in AI?
- Data set research needs to be prioritized and "glamorized" more
- Existing social norms and biases can be amplified through the data used to train AI systems
- Representation of diverse languages and scripts is crucial, but can be technically and financially challenging
[02] Language Models and Linguistic Diversity
1. What are the challenges in developing AI systems for non-Latin script languages?
- Languages written in non-Latin scripts often use more complex scripts that existing internet infrastructure does not support well
- Tokenizing and processing text in non-Latin scripts can be more expensive
2. How is Hugging Face addressing issues of linguistic diversity and language model development?
- Hugging Face is working with the Indian government on an open-source Hinglish language model
- Hugging Face has launched evaluation leaderboards for languages like Arabic and Korean to drive progress
3. What are the complexities around language ownership and rights when developing language models?
- There are questions around who owns a language and who should benefit from its use in AI
- Some communities, such as the Māori, have raised concerns about their language being used without consent
- The Indian government's Bhashini program is an example of efforts to fund and support Indic language data sets for AI
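One way to get a rough feel for the point above that tokenizing non-Latin scripts can cost more: many tokenizers operate on, or fall back to, UTF-8 bytes, and non-Latin scripts typically need more bytes per character than basic Latin text. The sketch below only counts bytes. It is not how any particular model tokenizes, and the sample greetings are assumptions chosen for illustration, not taken from the article. Real token counts depend on the tokenizer and its training data.

```python
# Illustrative only: UTF-8 byte counts as a rough proxy for processing cost.
# The sample greetings are assumptions for this sketch, not from the article.
SAMPLES = {
    "English (Latin)": "hello",
    "Arabic": "مرحبا",
    "Hindi (Devanagari)": "नमस्ते",
    "Korean (Hangul)": "안녕하세요",
}


def bytes_per_char(text: str) -> float:
    """Average number of UTF-8 bytes per Unicode code point."""
    return len(text.encode("utf-8")) / len(text)


for name, text in SAMPLES.items():
    n_bytes = len(text.encode("utf-8"))
    print(
        f"{name}: {len(text)} code points, {n_bytes} bytes, "
        f"{bytes_per_char(text):.1f} bytes per code point"
    )
```

A tokenizer trained mostly on English text tends to have few learned merges for these scripts, so more of that byte overhead can end up as extra tokens, which is one reason the cost gap can appear in practice.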