Class Information
This module develops Natural Language Processing (NLP) techniques for Data Science challenges. It covers topics including misinformation detection, content moderation, and NLP for under-resourced languages. The module takes a challenge-driven approach to teaching state-of-the-art natural language processing.
Course Structure
The course is organized into three blocks:
- Introduction to NLP - Traditional NLP foundations including tokenization, bag-of-words, TF-IDF, n-grams, text classification, sentiment analysis, and topic modeling (LDA, NMF)
- Modern NLP Approaches - Word embeddings (Word2Vec, GloVe), neural language models, RNNs, LSTMs, transformer architecture, and the Hugging Face ecosystem
- Data Science + NLP - Transfer learning (ELMo, BERT), data augmentation strategies, and NLP for low-resource African languages
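As a small taste of the first block, the sketch below walks through tokenization, bag-of-words counts, and TF-IDF weighting using only the Python standard library. It is an illustrative example, not course material: the function names (`tokenize`, `bag_of_words`, `tf_idf`), the toy corpus, and the particular weighting (term frequency normalised by document length, times `log(N / df)`) are all assumptions chosen for brevity, and libraries used in practice often apply different smoothing.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(docs):
    """Return one token-count Counter per document."""
    return [Counter(tokenize(doc)) for doc in docs]


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumed weighting (one of several common variants):
      tf  = count of term in document / number of tokens in document
      idf = log(number of documents / number of documents containing term)
    """
    bows = bag_of_words(docs)
    n_docs = len(bows)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for bow in bows:
        doc_freq.update(bow.keys())

    weights = []
    for bow in bows:
        total = sum(bow.values())
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in bow.items()
        })
    return weights


if __name__ == "__main__":
    # Hypothetical toy corpus, for illustration only.
    corpus = [
        "The cat sat on the mat",
        "The dog chased the cat",
        "Dogs and cats are pets",
    ]
    for doc, doc_weights in zip(corpus, tf_idf(corpus)):
        top_terms = sorted(doc_weights.items(), key=lambda kv: -kv[1])[:3]
        print(doc, "->", top_terms)
```

With this weighting, a term that occurs in every document (such as "the" in a corpus where all documents contain it) receives a weight of zero, which is the intuition behind using TF-IDF to down-weight uninformative words.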
Outcomes
After successful completion of this module, students will be able to:
- Apply traditional and modern NLP techniques to real-world text data
- Critically evaluate the capabilities, limitations, and biases of NLP models
- Work with state-of-the-art transformer models and pre-trained language models
- Address challenges in low-resource language NLP
Instructor
This module has been taught by Prof. Vukosi Marivate since 2019. Vukosi has a background in Machine Learning and Artificial Intelligence and is interested in the role of Data Science in Society.