The nature of text data
Preprocessing: tokenization and lemmatization
Bag-of-words: topic models and naive classification
N-gram language models (Markov models)
Hidden Markov models and part-of-speech tagging
Distributed representations and vector semantics
Recurrent neural language models: LSTMs and language generation
Transformers and masked language modeling
Encoder models and semantic search
Encoder-decoder models: text summarization and translation