Swivuriso: ZA-African Next Voices

A large-scale multilingual speech dataset covering seven South African languages, built to support automatic speech recognition (ASR) research and inclusive language technologies.

Links: Hugging Face, GitHub, @DSFSI_Research

About the Project

Swivuriso is unique in that it combines both scripted and unscripted speech, reflecting how people actually use language in daily life. All recordings are collected through ethical, community-centered processes, so that participants are fairly engaged and the data benefits the wider community. This approach strengthens both the quality of the dataset and its long-term impact.

The dataset covers the following seven languages, with the goal of building a balanced resource that reflects South Africa’s linguistic diversity:

isiZulu – 500h
isiXhosa – 500h
Sesotho – 500h
Xitsonga – 500h
Setswana – 500h
isiNdebele – 250h
Tshivenda – 250h

In total, the dataset targets 3,000 hours of high-quality multilingual audio. These recordings will form the foundation for robust ASR models, helping to break literacy barriers, make digital content locally relevant, and accelerate innovation in South African language technologies.
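As a check on the arithmetic, the five 500-hour languages and the two 250-hour languages add up to 5 × 500 + 2 × 250 = 3,000 hours. The short Python sketch below makes that sum and each language's share of the total explicit. The names `TARGET_HOURS`, `total_target_hours` and `target_shares` are illustrative only and are not part of any project code.

```python
# Per-language recording targets in hours, as listed above.
TARGET_HOURS = {
    "isiZulu": 500,
    "isiXhosa": 500,
    "Sesotho": 500,
    "Xitsonga": 500,
    "Setswana": 500,
    "isiNdebele": 250,
    "Tshivenda": 250,
}


def total_target_hours(targets: dict[str, int]) -> int:
    """Sum the per-language hour targets."""
    return sum(targets.values())


def target_shares(targets: dict[str, int]) -> dict[str, float]:
    """Fraction of the overall target allotted to each language."""
    total = total_target_hours(targets)
    return {lang: hours / total for lang, hours in targets.items()}


if __name__ == "__main__":
    print(total_target_hours(TARGET_HOURS))  # 3000
    for lang, share in target_shares(TARGET_HOURS).items():
        print(f"{lang}: {share:.1%}")
```

Each 500-hour language accounts for about 16.7% of the target and each 250-hour language for about 8.3%.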

Dataset paper: arXiv:2512.02201

Project Team

Vukosi Marivate (co-PI), Kayode Olaleye (co-PI), Sitwala Mundia, Unarine Netshifhefhe, Nia Zion van Wyk, Mahmooda Milanzie, Tsholofelo Mogale, Chijioke Okorie, Thapelo Sindane, Andinda Bakainaga, Graham Morrissey, Dale Dunbar, Franscois Smit, Tsosheletso Chidi, Rooweither Mabuya, Andiswa Bukula, Respect Mlambo, Tebogo Macucwa, Zainab Abdulrasaq, Kesego Mokgosi, Francois Smit, Idris Abdulmumin, Seani Rananga

Data Sources

Vukuzenzele Newspaper (website and data repository), Wikipedia, African Wordnet, GrainSA, Agricultural Research Council, SADiLaR, Masakhane

Dataset Statistics

Current dataset statistics:

Total hours: 3,016
Total clips: 483,191
Speakers: 2,440

Languages (hours, clips, speakers, average clip length):

isiZulu – 502.9h, 59,115 clips, 482 speakers, 30.6s average
isiXhosa – 504.3h, 73,665 clips, 485 speakers, 24.6s average
Sesotho – 503.6h, 78,113 clips, 480 speakers, 23.2s average
Setswana – 502.2h, 99,527 clips, 487 speakers, 18.2s average
Xitsonga – 500.1h, 79,107 clips, 198 speakers, 22.8s average
Tshivenda – 250.9h, 42,402 clips, 104 speakers, 21.3s average
isiNdebele – 251.9h, 51,262 clips, 104 speakers, 17.7s average
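The average clip lengths appear to follow directly from the other two figures: average length in seconds = hours × 3600 / clips. For isiZulu, 502.9 × 3600 / 59,115 ≈ 30.6s. The Python sketch below recomputes each per-language average under that assumption and sums hours and clips across languages, which reproduces the 3,016-hour and 483,191-clip totals. The names `STATS`, `avg_clip_seconds` and `language_totals` are illustrative only.

```python
# Per-language figures copied from the statistics above:
# language -> (hours, clips, reported average clip length in seconds).
STATS = {
    "isiZulu": (502.9, 59_115, 30.6),
    "isiXhosa": (504.3, 73_665, 24.6),
    "Sesotho": (503.6, 78_113, 23.2),
    "Setswana": (502.2, 99_527, 18.2),
    "Xitsonga": (500.1, 79_107, 22.8),
    "Tshivenda": (250.9, 42_402, 21.3),
    "isiNdebele": (251.9, 51_262, 17.7),
}


def avg_clip_seconds(hours: float, clips: int) -> float:
    """Mean clip length in seconds implied by total hours and clip count."""
    return hours * 3600 / clips


def language_totals(stats: dict[str, tuple[float, int, float]]) -> tuple[float, int]:
    """Sum hours and clip counts across all languages."""
    total_hours = sum(hours for hours, _, _ in stats.values())
    total_clips = sum(clips for _, clips, _ in stats.values())
    return total_hours, total_clips


if __name__ == "__main__":
    for lang, (hours, clips, reported) in STATS.items():
        computed = avg_clip_seconds(hours, clips)
        print(f"{lang}: computed {computed:.1f}s, reported {reported}s")
    hours, clips = language_totals(STATS)
    print(f"Total: {hours:,.0f}h, {clips:,} clips")  # Total: 3,016h, 483,191 clips
```

Because the published hours are rounded to one decimal place, the recomputed averages match the reported ones only to within about 0.05s, which is why the comparison rounds before printing.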

Figures (overall totals and per-language breakdowns):

Audio hours by language – total recording time collected per language
Language metrics comparison – speakers vs. hours (bubble size = clips)
Speaker demographics – gender distribution across all languages
Age distribution – total speaker count by age range
Categories by language – domain distribution

Citation: arXiv:2512.02201 [cs.CL] - Swivuriso: The South African Next Voices Multilingual Speech Dataset