Swivuriso: ZA-African Next Voices

A large-scale multilingual speech dataset for 7 South African languages supporting ASR research and inclusive technologies.

About the Project
Swivuriso is a multilingual speech dataset targeting over 3,000 hours of audio across 7 South African languages. It combines scripted and unscripted speech to support Automatic Speech Recognition (ASR) research and inclusive speech technologies.

Project Team

Vukosi Marivate (co-PI), Kayode Olaleye (co-PI), Sitwala Mundia, Unarine Netshifhefhe, Nia Zion van Wyk, Mahmooda Milanzie, Tsholofelo Mogale, Chijioke Okorie, Thapelo Sindane, Andinda Bakainaga, Graham Morrissey, Dale Dunbar, Franscois Smit, Tsosheletso Chidi, Rooweither Mabuya, Andiswa Bukula, Respect Mlambo, Tebogo Macucwa

Partners & Supporters

Way With Words, UP Law, Meta, Gates Foundation

Dataset Statistics

[Statistics & Graphs will go here]