# Teaching

## Machine learning for NLP (2022)

Any questions? Contact me at aurelie DOT herbelot AT unitn DOT it.

## Description

The course introduces core Machine Learning algorithms for computational linguistics (CL). Its goals are to (1) provide students with an overview of core Machine Learning techniques widely used in CL; (2) explain in which contexts and for which applications each technique is suitable; (3) present the experimental pipeline needed to apply a technique to a particular problem, including data collection and the choice of evaluation method; and (4) give students practice in running Machine Learning software and interpreting its output. The syllabus covers Machine Learning methods from both a theoretical and a practical point of view, and aims to equip students to read the relevant scientific literature with a critical mind.

At the end of the course, students will: (1) demonstrate knowledge of the principles of core Machine Learning techniques; (2) be able to read and understand CL literature using the introduced techniques, and critically assess their use in research and applications; (3) have the fundamental computational skills needed to run existing Machine Learning software and interpret its output.

*Pre-requisites:* There are no prerequisites for this course. Students with no computational background will acquire good intuitions for a range of Machine Learning techniques, as well as basic practical skills to interact with existing software. Students with a good mathematical and computational background (including programming and familiarity with the Unix command line) will be invited to gain a deeper understanding of each introduced algorithm, and to try out their own modifications of the course software.

The course covers ten topics, including a general introduction in the first week. Each topic is taught over three sessions: 1) a lecture explaining the theory behind a technique/algorithm; 2) a discussion group focusing on a scientific paper in which the technique is put to work, demonstrating the experimental pipeline around the algorithm; 3) a hands-on session in which students run some software to familiarise themselves with the practical implementation of the method.

## Course schedule

### Week 1: Introduction

**Lecture 1:** General introduction. Slides

**Lecture 2:** Basic principles of statistical NLP: Language modelling, Naive Bayes, evaluation with Precision/Recall. Slides

**Practical:** Run a simple authorship attribution algorithm using Naive Bayes. (The code is on GitHub.)
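
To give a flavour of the practical, here is a minimal sketch of Naive Bayes authorship attribution (not the course code: the training sentences are made-up toy examples):

```python
import math
from collections import Counter

# Toy training data: a few sentences per "author" (made-up examples,
# not the corpus used in the course practical).
train = {
    "austen":  ["it is a truth universally acknowledged",
                "she was a woman of mean understanding"],
    "carroll": ["the rabbit hole went straight on like a tunnel",
                "curiouser and curiouser cried alice"],
}

# Word frequencies per author.
counts = {a: Counter(w for s in sents for w in s.split())
          for a, sents in train.items()}
vocab = {w for c in counts.values() for w in c}

def log_prob(sentence, author):
    """Log P(author) + log P(sentence | author), with add-one smoothing."""
    c = counts[author]
    total = sum(c.values())
    lp = math.log(1 / len(train))  # uniform prior over authors
    for w in sentence.split():
        lp += math.log((c[w] + 1) / (total + len(vocab)))
    return lp

def classify(sentence):
    return max(train, key=lambda a: log_prob(sentence, a))

print(classify("alice went down the rabbit hole"))  # → carroll
```

Working in log space avoids numerical underflow when multiplying many small probabilities, and add-one (Laplace) smoothing keeps unseen words from zeroing out an author's score.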

### Week 2: Data preparation techniques

**Lecture:** How to choose your data. Annotating. Focus on inter-annotator agreement metrics. Slides
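
As a quick illustration of an inter-annotator agreement metric, the following is a sketch of Cohen's kappa on hypothetical annotations (the label sequences are invented for the example):

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa for two annotators' label sequences."""
    assert len(a) == len(b)
    n = len(a)
    # Observed agreement: fraction of items labelled identically.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement, from each annotator's label distribution.
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[l] * cb[l] for l in set(a) | set(b)) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical annotations (1 = positive, 0 = negative).
ann1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 1]
ann2 = [1, 0, 0, 0, 1, 0, 1, 1, 1, 1]
print(round(cohens_kappa(ann1, ann2), 3))  # → 0.583
```

Kappa corrects raw agreement for what the annotators would agree on by chance: here observed agreement is 0.8, chance agreement 0.52, giving kappa ≈ 0.583.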

**Practical:** Hands-on intro to Wikipedia pre-processing. (The code is on GitHub.)

### Week 3: Supervised learning

**Lecture:** Introduction to regression (linear, gradient descent, PLSR) and to the k-nearest neighbours algorithm. Slides
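
A minimal sketch of linear regression fitted by batch gradient descent (toy data generated from y = 2x + 1; not the course software):

```python
# Fit y = w*x + b by batch gradient descent on mean squared error.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]   # generated from y = 2x + 1

w, b, lr = 0.0, 0.0, 0.05
n = len(xs)
for _ in range(2000):
    # Gradients of (1/n) * sum (w*x + b - y)^2 w.r.t. w and b.
    err = [w * x + b - y for x, y in zip(xs, ys)]
    dw = 2 / n * sum(e * x for e, x in zip(err, xs))
    db = 2 / n * sum(err)
    w -= lr * dw
    b -= lr * db

print(round(w, 2), round(b, 2))  # → 2.0 1.0
```

Each iteration moves the parameters a small step against the gradient of the loss; with this well-conditioned toy data the fit recovers the generating slope and intercept.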

**Practical:** Intro to mapping between semantic spaces for translation. (The code is on GitHub.) For those wanting a simple tutorial on linear regression in Python, check: http://www.dataschool.io/linear-regression-in-python/.

### Week 4: Unsupervised learning

**Lecture:** Dimensionality reduction and clustering (SVD, LSH, random indexing, K-means). Slides
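
As a taste of the clustering part, here is a plain K-means sketch on made-up 2D points, with fixed starting centroids so the example stays deterministic (in practice initialisation is usually random):

```python
def kmeans(points, centroids, iters=10):
    """Plain K-means: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[d.index(min(d))].append(p)
        centroids = [tuple(sum(v) / len(cl) for v in zip(*cl))
                     for cl in clusters if cl]
    assign = [min(range(len(centroids)),
                  key=lambda i: sum((a - b) ** 2
                                    for a, b in zip(p, centroids[i])))
              for p in points]
    return centroids, assign

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
# Fixed starting centroids keep the toy example deterministic.
cents, labels = kmeans(pts, [(0.0, 0.0), (10.0, 10.0)])
print(labels)  # → [0, 0, 0, 1, 1, 1]
```

The two alternating steps (assignment and centroid update) each decrease the within-cluster squared distance, which is why the algorithm converges.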

**Practical:** Implementing the fruit fly for similarity search with random indexing. (The code is on GitHub.)

### Week 5: Support Vector Machines

**Lecture:** SVM principles. Introduction to kernels. Slides
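
To preview the idea of margin maximisation, here is a sketch of a linear SVM trained by sub-gradient descent on the hinge loss (Pegasos-style updates). The data are toy 2D points, and the bias is folded in as a constant third feature, a common shortcut that means the bias is regularised too:

```python
# Linear SVM via sub-gradient descent on the regularised hinge loss.
data = [((2, 2, 1), 1), ((3, 3, 1), 1), ((2, 3, 1), 1),
        ((-2, -2, 1), -1), ((-3, -3, 1), -1), ((-2, -3, 1), -1)]

lam = 0.1                  # regularisation strength
w = [0.0, 0.0, 0.0]
t = 0
for epoch in range(100):
    for x, y in data:
        t += 1
        eta = 1.0 / (lam * t)  # decaying learning rate
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        # Shrink w (regulariser) and, if the margin is violated,
        # push w towards the misclassified/marginal example.
        w = [(1 - eta * lam) * wi for wi in w]
        if margin < 1:
            w = [wi + eta * y * xi for wi, xi in zip(w, x)]

preds = [1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1
         for x, _ in data]
print(preds)  # → [1, 1, 1, -1, -1, -1]
```

Only points inside the margin (margin < 1) pull on the weights; the rest contribute nothing, which is the "support vector" intuition in miniature.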

**Practical:** Classify documents into topics using SVMs. (The code is on GitHub.)

### Week 6: Introduction to Neural Networks

**Lecture:** Basics of NNs. Slides
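
As a warm-up for the tutorial below, this is a from-scratch sketch of a tiny network (one hidden layer, sigmoid activations) trained by backpropagation on XOR, the classic problem a single layer cannot solve. All sizes and the learning rate are illustrative choices:

```python
import math, random

random.seed(0)
sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))

# XOR data.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
T = [0, 1, 1, 0]

# One hidden layer of 2 units; each row is [w_x0, w_x1, bias].
W1 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
W2 = [random.uniform(-1, 1) for _ in range(3)]

def forward(x):
    h = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in W1]
    o = sigmoid(W2[0] * h[0] + W2[1] * h[1] + W2[2])
    return h, o

def loss():
    return sum((forward(x)[1] - t) ** 2 for x, t in zip(X, T)) / len(X)

lr = 0.5
before = loss()
for _ in range(5000):                      # full-batch gradient descent
    gW1 = [[0.0] * 3 for _ in range(2)]
    gW2 = [0.0] * 3
    for x, t in zip(X, T):
        h, o = forward(x)
        d_o = 2 * (o - t) * o * (1 - o)    # backprop: output delta
        for j in range(2):
            gW2[j] += d_o * h[j]
            d_h = d_o * W2[j] * h[j] * (1 - h[j])  # hidden delta
            gW1[j][0] += d_h * x[0]
            gW1[j][1] += d_h * x[1]
            gW1[j][2] += d_h
        gW2[2] += d_o
    for j in range(2):
        W2[j] -= lr * gW2[j]
        for k in range(3):
            W1[j][k] -= lr * gW1[j][k]
    W2[2] -= lr * gW2[2]

print(before > loss())  # the loss should have decreased after training
```

The deltas are just the chain rule applied layer by layer: the output error is scaled by each unit's sigmoid derivative and passed back through the weights.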

**Practical:** Follow tutorial on implementing an NN from scratch: http://www.wildml.com/2015/09/implementing-a-neural-network-from-scratch/.

### Week 7: RNNs and LSTMs

**Lecture:** Sequence learning with RNNs and LSTMs. Slides
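
The core of an RNN is a single recurrence reused at every time step. The sketch below shows just that forward recurrence, h_t = tanh(W_xh·x_t + W_hh·h_{t-1} + b), on an invented one-hot sequence; the dimensions and random weights are illustrative and no training is performed:

```python
import math, random

random.seed(0)

# A vanilla RNN cell: h_t = tanh(W_xh . x_t + W_hh . h_{t-1} + b).
IN, HID = 4, 3
W_xh = [[random.uniform(-0.5, 0.5) for _ in range(IN)] for _ in range(HID)]
W_hh = [[random.uniform(-0.5, 0.5) for _ in range(HID)] for _ in range(HID)]
b = [0.0] * HID

def step(x, h):
    return [math.tanh(sum(W_xh[i][j] * x[j] for j in range(IN)) +
                      sum(W_hh[i][j] * h[j] for j in range(HID)) + b[i])
            for i in range(HID)]

# One-hot encoded sequence over a 4-symbol alphabet (e.g. characters).
seq = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0]]
h = [0.0] * HID
states = []
for x in seq:
    h = step(x, h)       # the same weights are reused at every time step
    states.append(h)

print(len(states), len(states[0]))  # → 4 3
```

Because h_{t-1} feeds into h_t, the hidden state carries information about earlier symbols forward, which is what lets RNNs model sequences (and, with the LSTM's gating, longer dependencies).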

**Practical:** Generate ASCII cats with an RNN. (The code is on GitHub.)


### Week 8: Adopt a network week!

**The Network Zoo:** A wild race through a few architectures. Slides

Pick a network from the Network Zoo and check whether it has language applications. If so, adopt it!

### Week 9: Reinforcement learning

**Lecture:** Principles of RL. Slides

**Practical:** Solving the OpenAI gym ‘frozen lake’ puzzle: https://github.com/simoninithomas/Deep_reinforcement_learning_Course/blob/master/Q%20learning/Q%20Learning%20with%20FrozenLake.ipynb. And ordering a coffee at Rovereto train station. (The coffee code is on GitHub.)
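
In the spirit of the frozen-lake exercise, here is a tabular Q-learning sketch on a much-simplified stand-in environment (a deterministic 5-state corridor invented for illustration, not the notebook's code):

```python
import random

random.seed(0)

# Tabular Q-learning on a toy corridor: start in state 0,
# reward 1 for reaching state 4. Actions: 0 = left, 1 = right.
N_STATES, GOAL = 5, 4
LEFT, RIGHT = 0, 1
alpha, gamma, eps = 0.5, 0.9, 0.2   # learning rate, discount, exploration

Q = [[0.0, 0.0] for _ in range(N_STATES)]

def move(s, a):
    s2 = max(0, s - 1) if a == LEFT else min(GOAL, s + 1)
    return s2, (1.0 if s2 == GOAL else 0.0)

for episode in range(500):
    s = 0
    while s != GOAL:
        # Epsilon-greedy action choice.
        a = random.randrange(2) if random.random() < eps else Q[s].index(max(Q[s]))
        s2, r = move(s, a)
        # Q-learning update: bootstrap from the best next-state value.
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

policy = [Q[s].index(max(Q[s])) for s in range(GOAL)]
print(policy)  # → [1, 1, 1, 1] (always go right)
```

The update rule nudges Q(s, a) towards the reward plus the discounted value of the best next action; over many episodes the goal reward propagates backwards until the greedy policy heads straight for it.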

### Week 10: The ethics of machine learning

**Lecture:** Ethical issues with ML. Bias in distributional vectors. Slides

**Practical:** Finding indirect gender biases in FastText vectors. (This exercise is open-ended, but there is some data and code on GitHub.)
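
One standard way to probe such biases is to project words onto a gender direction. The sketch below shows the mechanics with tiny hand-made vectors (NOT real FastText embeddings; the numbers are invented purely to illustrate the computation):

```python
import math

# Toy, hand-made "embeddings" (not real FastText vectors), chosen so the
# first dimension loosely encodes a he/she contrast.
vecs = {
    "he":       [1.0, 0.2, 0.1],
    "she":      [-1.0, 0.2, 0.1],
    "nurse":    [-0.6, 0.5, 0.3],
    "engineer": [0.7, 0.4, 0.2],
}

def sub(u, v):  return [a - b for a, b in zip(u, v)]
def dot(u, v):  return sum(a * b for a, b in zip(u, v))
def norm(u):    return math.sqrt(dot(u, u))

def cos(u, v):
    return dot(u, v) / (norm(u) * norm(v))

gender = sub(vecs["he"], vecs["she"])   # the "he - she" direction

# Positive score: the word leans towards "he"; negative: towards "she".
for w in ("nurse", "engineer"):
    print(w, round(cos(vecs[w], gender), 2))  # → nurse -0.72, engineer 0.84
```

With real embeddings the same projection, applied to occupation words, is one way such associations have been quantified; the open-ended part of the exercise is deciding what to measure and how to interpret it.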