12 Aug 2025

Bridging the AI Divide: DSFSI Drives Multilingual Knowledge Access at UP's 'Abstracts into Indigenous Voices' Event

The University of Pretoria recently hosted a pivotal event, “Abstracts into Indigenous Voices,” on August 12, 2025. This landmark gathering marked the launch of a pilot project designed to revolutionize access to university knowledge through human and AI-driven abstract translation. The Data Science for Social Impact (DSFSI) Lab proudly sponsored and contributed significantly to the machine translation and project management aspects, aiming to bridge the widening digital and AI divide. The initiative underscores a collective commitment to promoting multilingualism, ensuring equitable access to academic research, and decolonising knowledge within higher education and beyond.

Listen to or watch a breakdown of the event generated by NotebookLM

Professor Vukosi Marivate: A Call to Generosity and Action

Professor Vukosi Marivate, Head of the Data Science for Social Impact (DSFSI) Lab, delivered a powerful closing reflection, distilling the core philosophy driving the project.

“We’re generous because we’re putting the foundations of future generations, imagining a future that we may not like to see, but it’s worth fighting for.”

Prof. Marivate emphasized that the “Abstracts into Indigenous Voices” project embodies DSFSI’s commitment to tackling real-world problems through data science, rather than merely focusing on technical challenges. He highlighted the lab’s significant investment, including utilizing publication subsidies, to bring this vision to fruition, despite systemic resistance and a prevailing “digital divide” where critical linguistic resources face “digital language death” due to lack of open access and maintenance. His passionate call to “collaborate ruthlessly” across institutions challenges traditional academic silos, advocating for open data sharing to prevent the loss of invaluable resources like offline term banks. Prof. Marivate envisions a future where universities actively invest in indigenous languages, building taller ladders to overcome entrenched obstacles and empower future generations.

Dr. Idris Abdulmumin: The Machine Learning Frontier

Dr. Idris Abdulmumin, a Postdoctoral Fellow at DSFSI, shared crucial insights into the machine learning aspects of the translation project. His presentation unveiled the current capabilities and significant challenges in developing AI models for underrepresented languages, particularly in the academic domain, where accuracy must encompass not only language but also the precise conveyance of domain-specific facts and concepts.

“The models’ performances for isiZulu in the domains of study are all below 0.1 BLEU points, compared to over 0.6 points for Afrikaans. This indicates that, at the word level, the model’s ability to produce accurate academic translations in isiZulu is extremely limited.”

Dr. Abdulmumin’s demo starkly illustrated the “digital scarcity” confronting African languages, particularly in technical domains such as science. While languages like Afrikaans achieve strong machine translation performance due to abundant digital resources, others, such as isiZulu and Sepedi, lag significantly in accuracy. He emphasized the urgent need for domain-specific datasets in areas like mathematics and sociology, moving beyond general news content, to train robust language models. The project’s goal of generating high-quality parallel data through human-AI collaboration represents a critical step toward addressing vocabulary gaps and enhancing automated evaluation mechanisms, ultimately advancing the development of more inclusive AI.
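For readers unfamiliar with the metric quoted above, BLEU scores a machine translation by how many of its word sequences (n-grams) also appear in a human reference translation, on a scale from 0 to 1. The following is a minimal sketch of the idea, assuming a single reference, whitespace tokenization, and no smoothing. The function names and the English example sentences are illustrative only; this is not the evaluation pipeline used by the project.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every n-gram (as a tuple of words) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(reference, hypothesis, max_n=4):
    """Sentence-level BLEU on a 0-1 scale (single reference, no smoothing)."""
    ref, hyp = reference.split(), hypothesis.split()
    if not hyp:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        # Clipped matches: an n-gram is credited at most as often as it
        # occurs in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        if matches == 0 or total == 0:
            # Without smoothing, one empty n-gram level zeroes the score.
            return 0.0
        log_precisions.append(math.log(matches / total))
    # Brevity penalty: penalize hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


# Hypothetical example sentences, chosen only to illustrate the arithmetic.
reference = "the cat sat on the mat"
print(bleu(reference, "the cat sat on the mat"))           # 1.0
print(round(bleu(reference, "the cat sat on a mat"), 3))   # 0.537
print(bleu(reference, "a dog ran in this park"))           # 0.0
```

A single changed word already lowers the score to roughly 0.54 in this toy case, which gives a sense of how little word-sequence overlap a score below 0.1 implies.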

Access the demo here: https://huggingface.co/spaces/dsfsi/UPTranslate

Watch Professor Tshilidzi Marwala’s Address to Mark the Occasion.

Dr. Helena Kruger-Roux & Prof. Elsabé Taljard: Bridging the Human-AI Gap

Dr. Helena Kruger-Roux and Professor Elsabé Taljard from the Faculty of Humanities provided a vital perspective on the human translation process and its intersection with AI. Their work highlighted the stark “digital divide” within South African languages themselves, contrasting Afrikaans with Sepedi.

Prof. Taljard eloquently stated the fundamental truth:

“Who controls the vocabulary controls knowledge.”

Their practical examples showed that for Afrikaans, machine translation (MT) provides a “near-perfect result,” requiring minimal post-editing (5-10% human effort) to achieve professional quality. This is due to Afrikaans’ “digital abundance” and developed terminology. For Sepedi, however, MT results are often “nowhere close to making sense,” requiring translators to spend “2 to 3 hours” building translations from scratch. This “double burden” means translators must not only translate but also create the missing terminology and make it digitally discoverable. They posed a critical question: how can AI help streamline terminology development to ensure accuracy and consistency, transforming the painstaking process of “building a house while manufacturing the bricks”?

Professor Chijioke Okorie: Policy as Enabler and Guardrail

Professor Chijioke Okorie, Principal Investigator of the Data Science Law Lab, illuminated the critical role of law and policy in fostering or hindering multilingualism and open knowledge.

“So, every time we talk about, you know, knowledge governance, every time we talk about openness in terms of sharing knowledge, we must also then ask ourselves: open for who?”

Prof. Okorie highlighted how existing policies, like the university’s intellectual property rights over theses, enable projects like abstract translation by simplifying permission processes. However, she warned against “protectionism” in data sharing, which could stifle collaborative efforts vital for advancing indigenous language technologies. The rapid evolution of AI presents unique policy challenges, often outpacing regulatory frameworks. Her vision for the future emphasizes the creation of laws and policies informed by African contexts, ensuring “equitable knowledge governance.” This means Africans must be more than just data annotators; they must participate as “co-creators” of AI tools, building the necessary infrastructure to derive meaningful benefit from open data and prevent new forms of digital exclusion.

A Unified Vision for Epistemic Justice and Digital Inclusion

The “Abstracts into Indigenous Voices” event transcended a mere project launch, evolving into a powerful dialogue on the future of knowledge accessibility in Africa. Speakers from across disciplines underscored the urgent need for systemic change. Among them were Dr. Naledi Mbude-Mehana (DBE), who spoke passionately of confronting “epistemic violence” in basic education, and Ms. Ntsiki Loteni (Transformation Office), who emphasized fostering affirming spaces for all languages.

Ms. Anna Masemola and Mr. Isak van der Walt from Library Services articulated a future where university knowledge is not only openly accessible but seamlessly discoverable in any chosen indigenous language, through intuitive interfaces and advanced AI tools, extending even to metaverse classrooms. Dr. Brenda Nomadlozi Bokaba, a veteran language activist, powerfully reinforced the cultural and intellectual richness of African languages, demonstrating how true multilingualism enhances academic rigor.

This collaborative spirit, championed by DSFSI, aims to move beyond merely translating content to actively building the digital infrastructure and terminology required for African languages to thrive in the age of AI. The project is a tangible step towards ensuring that the technologies of tomorrow do not replicate the inequalities of today, but rather serve as instruments for broad-based social impact and equitable knowledge empowerment for all. It represents a call to action for collective generosity, sustained investment, and ruthless collaboration to ensure that African languages and their speakers are at the forefront of the global digital future.

Acknowledgements

We are thankful for the support of:

  • AI4D African Languages Lab
  • African Institute for Data Science and Artificial Intelligence
  • Data Science for Social Impact
  • Department of African Languages
  • Department of Afrikaans
  • Department of Computer Science
  • Department of Library Services
  • Faculty of Engineering, Built Environment and Information Technology
  • Faculty of Humanities
  • The Javett Art Centre at the University of Pretoria (Javett-UP)
  • School of the Arts: Department of Music Choir
  • Transformation Office
  • UP Language Development Plan Unit
  • UP languages policy
  • Unit for Academic Literacy