About

Class Information

This module develops Natural Language Processing (NLP) techniques for Data Science challenges. Topics include misinformation detection, content moderation, and NLP for under-resourced languages. The module takes a challenge-driven approach to teaching state-of-the-art NLP.

Course Structure

The course is organized into three blocks:

  1. Introduction to NLP - Traditional NLP foundations including tokenization, bag-of-words, TF-IDF, n-grams, text classification, sentiment analysis, and topic modeling (LDA, NMF)
  2. Modern NLP Approaches - Word embeddings (Word2Vec, GloVe), neural language models, RNNs, LSTMs, transformer architecture, and the Hugging Face ecosystem
  3. Data Science + NLP - Transfer learning (BERT, ELMo), data augmentation strategies, and NLP for low-resource African languages

Outcomes

After successful completion of this module, students will be able to:

  • Apply traditional and modern NLP techniques to real-world text data
  • Critically evaluate the capabilities, limitations, and biases of NLP models
  • Work with state-of-the-art transformer models and pre-trained language models
  • Address challenges in low-resource language NLP

Instructor

This module has been taught by Prof. Vukosi Marivate since 2019. Vukosi has a background in Machine Learning and Artificial Intelligence and is interested in the role of Data Science in Society.