AI4D African Languages Lab
Empowering African Languages through Data Science and AI
Hosted at Data Science for Social Impact (DSFSI), African Institute for Data Science and Artificial Intelligence (AfriDSAI), University of Pretoria.
Vision & Impact
The AI4D African Languages Lab is dedicated to building human capacity and creating digital innovations that contribute to Africa's digital transformation. In our first year, we have connected technical AI research with local linguistic contexts through cross-disciplinary collaborations in law, the humanities, and education.
Core Research Directions
Our work addresses the unique complexities of low-resource African languages through four strategic pillars:
- Foundational Language Models: Developing context-aware word embeddings and mathematical models designed for the richness of African scripts and tonality.
- Responsible AI & Governance: Prioritizing ethical data curation, equitable licensing, and frameworks that preserve culture while reducing AI hallucinations.
- High-Impact Applications: Developing Retrieval-Augmented Generation (RAG) systems for critical sectors such as healthcare, agriculture, and sustainable development.
- Inclusive Technology: Advancing speech recognition for children and South African Sign Language (SASL) to bridge the digital divide for under-represented communities.
Major Milestones
- Establishment of AfriDSAI: Support from the AI4D program was instrumental in launching the new African Institute for Data Science and Artificial Intelligence (AfriDSAI) in early 2025.
- AfroCS-XS Project: Completion of a human-validated synthetic code-switched dataset for four African languages.
- 3,000 Hours of Speech: Ongoing development of a massive speech dataset for seven South African languages.