CSE 256: Statistical Natural Language Processing
Term: Spring Qtr 2019

Announcements

- April 1: Lecture 1 slides posted on TritonEd

Course Description.
Natural language processing (NLP) is a field of AI that aims to equip computers with the ability to intelligently process natural (human) language. This course will explore statistical techniques for the automatic analysis of natural language data. Specific topics covered include: probabilistic language models, which define probability distributions over text sequences; text classification; sequence models; parsing sentences into syntactic representations; machine translation; and machine reading.
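As a small illustration of the first topic (not course material), a bigram language model estimates the probability of each word given the previous word. A minimal sketch with maximum-likelihood estimates on a toy corpus, assuming `<s>` and `</s>` as sentence-boundary markers:

```python
from collections import Counter

# Toy corpus; <s> and </s> mark sentence boundaries (illustrative only).
corpus = [["<s>", "the", "dog", "barks", "</s>"],
          ["<s>", "the", "cat", "meows", "</s>"]]

bigram_counts = Counter()
context_counts = Counter()
for sent in corpus:
    for w1, w2 in zip(sent, sent[1:]):
        bigram_counts[(w1, w2)] += 1
        context_counts[w1] += 1

def bigram_prob(w2, w1):
    """MLE estimate of P(w2 | w1) = count(w1, w2) / count(w1)."""
    if context_counts[w1] == 0:
        return 0.0
    return bigram_counts[(w1, w2)] / context_counts[w1]

# "the" is followed once by "dog" and once by "cat" in the toy corpus,
# so each continuation gets probability 1/2.
print(bigram_prob("dog", "the"))  # 0.5
```

Unsmoothed MLE assigns zero probability to unseen bigrams; the smoothing techniques that address this are among the topics covered in the language modeling unit.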

The course assumes knowledge of basic probability.
Probability Review

Programming assignments will require knowledge of Python.
Python Numpy Tutorial

Grading.
The course is lab-based. You will complete five hands-on programming assignments (individually) and a final project (which can be done in ~~pairs~~ groups of up to three people).

Final Project.
The project can be done in teams of up to ~~two~~ three people. You will need to tell us your team composition by April 26 (the link is in the project description and on Piazza).

Late Submission Policy.
Please note that assignments must be submitted by the due date. Late submissions will not be accepted.

Academic Integrity.
If plagiarism is detected in the programming assignment code or report, University authorities will be notified so that appropriate disciplinary action can be taken.

Books. Texts we will use:

- Jurafsky and Martin, Speech and Language Processing, 2nd edition (amazon) (online)
- Manning and Schütze, Foundations of Statistical Natural Language Processing (amazon) (online)

Syllabus (tentative)

| Date | Topic/Readings | Assignment (Out) |
|---|---|---|
| Apr 1 | Introduction | |
| | J&M Chapter 1 Introduction | |
| | Hirschberg & Manning, Science 2015 Advances in NLP | |
| **Language Modelling** | | |
| Apr 3 | Michael Collins. Notes on Language Modelling | PA1: Language Modeling (Due April 15) |
| Apr 5 & 8 | Eisenstein Chapter 6 Language Models | |
| Apr 10 | Michael Collins. Notes on Log-linear models | |
| Apr 12 | Michael Collins. Notes on Feedforward Neural Networks | |
| | Eisenstein Chapter 6.3 Recurrent Neural Network Language Models | |
| **Text Classification** | | |
| Apr 15 & 17 | Eisenstein Chapter 2 Linear Text Classification | PA2: Text Classification (Due April 29) |
| | Michael Collins. Notes on Naive Bayes, MLE, and EM | |
| **Semantics** | | |
| Apr 19 & 22 | Eisenstein Chapter 14 Distributional and distributed semantics | |
| | Chris McCormick, 2016 Word2Vec Tutorial - The Skip-Gram Model | |
| | Mikolov et al., NIPS 2013 Distributed Representations of Words and Phrases ... | |
| | Mikolov et al., 2013 Efficient Estimation of Word Representations in Vector Space | |
| Apr 24 | Eisenstein Chapter 14.4 Brown clusters | |
| **Tagging Problems & Hidden Markov Models** | | |
| Apr 24 | Michael Collins. Notes on Tagging with Hidden Markov Models | |
| | Eisenstein Chapter 8 Applications of sequence labeling | |
| Apr 29 | | |