Swivuriso: ZA-African Next Voices

A large-scale multilingual speech dataset for seven South African languages, supporting automatic speech recognition (ASR) research and inclusive technologies.

About the Project
Swivuriso is unique in that it combines scripted and unscripted speech, reflecting how people actually use language in daily life. All recordings are collected through ethical, community-centered processes, ensuring that participants are fairly engaged and that the data benefits the wider community. This approach strengthens both the quality of the dataset and its long-term impact.

The dataset covers the following seven languages, each with a target number of recording hours, with the goal of building a balanced resource that reflects South Africa’s linguistic diversity:
  • isiZulu – 500h
  • isiXhosa – 500h
  • Sesotho – 500h
  • Sepedi – 500h
  • Setswana – 500h
  • isiNdebele – 250h
  • Tshivenda – 250h


In total, the dataset will reach 3,000 hours of high-quality, multilingual audio. These recordings will form the foundation for robust ASR models, helping to break literacy barriers, make digital content locally relevant, and accelerate innovation in South African language technologies.
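
The 3,000-hour figure is the sum of the per-language targets listed above: five languages at 500 hours and two at 250 hours. A minimal Python sketch of that arithmetic, which also computes each language's share of the total. The name `TARGET_HOURS` is illustrative only and is not part of any published tooling for the dataset:

```python
# Target recording hours per language, as listed above.
# TARGET_HOURS is an illustrative name, not part of any released tooling.
TARGET_HOURS = {
    "isiZulu": 500,
    "isiXhosa": 500,
    "Sesotho": 500,
    "Sepedi": 500,
    "Setswana": 500,
    "isiNdebele": 250,
    "Tshivenda": 250,
}

total_hours = sum(TARGET_HOURS.values())

# Each language's share of the overall target, as a percentage.
shares = {lang: 100 * hours / total_hours for lang, hours in TARGET_HOURS.items()}

print(len(TARGET_HOURS), "languages,", total_hours, "hours")  # 7 languages, 3000 hours
```

Under these targets, each 500-hour language makes up about 16.7% of the total and each 250-hour language about 8.3%.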

Project Team

Vukosi Marivate (co-PI), Kayode Olaleye (co-PI), Sitwala Mundia, Unarine Netshifhefhe, Nia Zion van Wyk, Mahmooda Milanzie, Tsholofelo Mogale, Chijioke Okorie, Thapelo Sindane, Andinda Bakainaga, Graham Morrissey, Dale Dunbar, Franscois Smit, Tsosheletso Chidi, Rooweither Mabuya, Andiswa Bukula, Respect Mlambo, Tebogo Macucwa

Partners & Supporters

Way With Words, UP Law, Meta, Gates Foundation

Data Sources

Vukuzenzele Newspaper, Wikipedia, African Wordnet, GrainSA, Agricultural Research Council, SADiLaR, Masakhane

Audio Samples

[Audio: four short sample clips from the dataset]

Dataset Statistics

Project overview (current dataset statistics):
  • Total hours – 2,769
  • Total clips – 432,413
  • Speakers – 2,242
  • Languages – 7

Per-language statistics (hours, clips, speakers, average clip length):
  • isiZulu – 502.9h, 59,115 clips, 482 speakers, 30.6s avg
  • isiXhosa – 504.3h, 73,665 clips, 480 speakers, 24.6s avg
  • Sesotho – 503.6h, 78,113 clips, 480 speakers, 23.2s avg
  • Setswana – 502.2h, 99,527 clips, 487 speakers, 18.2s avg
  • Xitsonga – 500.1h, 79,107 clips, 198 speakers, 22.8s avg
  • Tshivenda – 250.9h, 42,402 clips, 104 speakers, 21.3s avg
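
The average clip length for each language appears to be total audio time divided by the number of clips (hours × 3600 / clips). A minimal Python sketch that checks this reading against the per-language figures above. The names `STATS`, `avg_clip_seconds`, `mismatched_languages` and the 0.06 s tolerance are hypothetical choices for illustration; the tolerance allows for the averages being rounded to 0.1 s and the hours to 0.1 h:

```python
# Per-language figures copied from the statistics above:
# (language, hours, clips, reported average clip length in seconds).
STATS = [
    ("isiZulu", 502.9, 59_115, 30.6),
    ("isiXhosa", 504.3, 73_665, 24.6),
    ("Sesotho", 503.6, 78_113, 23.2),
    ("Setswana", 502.2, 99_527, 18.2),
    ("Xitsonga", 500.1, 79_107, 22.8),
    ("Tshivenda", 250.9, 42_402, 21.3),
]

# Hypothetical tolerance: reported averages are rounded to 0.1 s (up to
# 0.05 s of error) and hours to 0.1 h, so allow a little extra slack.
TOLERANCE_S = 0.06


def avg_clip_seconds(hours: float, clips: int) -> float:
    """Mean clip length in seconds: total audio seconds / number of clips."""
    return hours * 3600 / clips


def mismatched_languages(stats):
    """Return languages whose reported average disagrees with hours * 3600 / clips."""
    return [
        name
        for name, hours, clips, reported in stats
        if abs(avg_clip_seconds(hours, clips) - reported) > TOLERANCE_S
    ]


for name, hours, clips, reported in STATS:
    computed = avg_clip_seconds(hours, clips)
    print(f"{name:<10} computed {computed:5.2f}s, reported {reported}s")

print("mismatches:", mismatched_languages(STATS))
```

With the values above, every language agrees to within rounding (for example, isiZulu gives 502.9 × 3600 / 59,115 ≈ 30.63 s), so the mismatch list is empty.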

[Charts: number of speakers per province (interactive map); audio hours by language; speakers vs. hours by language (bubble size = clips); speaker gender distribution across all languages; speaker count by age range; domain categories by language]

Citation: TBC

Acknowledgments: Lelapa AI, Agricultural Research Council, Karya, Lanfrica, SADiLaR

DSFSI