CSE 190: Statistical Natural Language Processing

Term: Winter Qtr 2018
Credits: 4
Lecture: Tuesday and Thursday 12:30pm-1:50pm, CENTR 105
Instructor: Ndapa Nakashole, EBU3B 4108
Office Hours: Friday 1pm - 3pm

Teaching Assistants:
      Sparsh Gupta [spg005-at-eng.ucsd.edu]
      Sindhura Raghavan [sindhura-at-eng.ucsd.edu]
      Xudong Sun [xus022-at-eng.ucsd.edu]
      Office hours: TBD



Course Description

Natural language processing (NLP) is a field of AI that aims to equip computers with the ability to process natural (human) language intelligently. This course explores statistical techniques for the automatic analysis of natural language data. Specific topics include: probabilistic language models, which define probability distributions over text sequences; text classification; sequence models; parsing sentences into syntactic representations; machine translation; and machine reading.

The course assumes knowledge of basic probability (see the Probability Review).

Course projects will require programming in Python (see the Python Numpy Tutorial).
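To give a flavor of the kind of Python programming the projects involve, here is a minimal sketch of a bigram language model with add-one (Laplace) smoothing, the theme of the first assignment. The toy corpus, tokenization, and smoothing choice are illustrative assumptions, not the assignment specification.

```python
from collections import Counter

def train_bigram_lm(sentences):
    """Count unigrams and bigrams over sentences padded with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def bigram_prob(w_prev, w, unigrams, bigrams):
    """P(w | w_prev) with add-one smoothing over the observed vocabulary."""
    vocab_size = len(unigrams)
    return (bigrams[(w_prev, w)] + 1) / (unigrams[w_prev] + vocab_size)

corpus = ["the cat sat", "the dog sat", "the cat ran"]
unigrams, bigrams = train_bigram_lm(corpus)
print(bigram_prob("the", "cat", unigrams, bigrams))  # → 0.3
```

Note that add-one smoothing guarantees every word in the vocabulary gets nonzero probability after any history, so the conditional distribution still sums to one.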


The course is lab-based. You will complete five hands-on programming assignments individually, not in teams. Class participation contributes 10% of the final grade; the remaining 90% comes from the assignments, each weighted equally. Submission instructions are provided in each assignment description.

Late Submission Policy
Each student is granted 5 late days to use over the duration of the quarter. There are no restrictions on how the late days can be used (e.g., all 5 could be spent on one project). Using late days does not affect your grade; however, projects submitted late after all late days have been exhausted will receive no credit. Make sure to plan ahead.

Recommended texts are:

[J&M] Jurafsky & Martin, Speech and Language Processing (3rd edition; free chapters online)
[M&S] Manning & Schütze, Foundations of Statistical Natural Language Processing (free online)

Syllabus (Tentative, subject to change!)
Date        Topic / Readings                                              Assignment (Out)

Introduction
  Jan 9     J&M Chapter 1: Introduction
            Hirschberg & Manning, Science 2015: Advances in NLP

Language Modelling
  Jan 11    Michael Collins, Notes on Language Modelling                  P1: Language Modeling (Due Jan 26)
            J&M Chapter 4: N-grams
  Jan 16    Michael Collins, Notes on Log-linear Models
            Goldberg, JAIR 2016, A Primer on Neural Network Models for NLP (Sections 1-4 & 10-13)

Text Classification & Sentiment Analysis
  Jan 18    J&M Chapter 6: Naive Bayes and Sentiment Classification
            J&M Chapter 7: Logistic Regression
            Michael Collins, Notes on Naive Bayes, MLE, and EM

Distributional Semantics
  Jan 23 & 25  Goldberg, JAIR 2016, A Primer on Neural Network Models for NLP (Sections 3 & 5)  P2: Text Classification (Due Feb 9)

Sequence Models (Hidden Markov Models)
  TBD       J&M Chapter 9: Hidden Markov Models                           P3: Sequence Tagging (Due Feb 20)
            Michael Collins, Notes on Tagging with Hidden Markov Models
            J&M Chapter 10: Part-of-Speech Tagging

Parsing and Context-Free Grammars
  Feb 1                                                                   P4: Syntax Parsing (Due Mar 6)
  Feb 6
  Feb 8

Machine Translation
  Feb 13                                                                  P5: Machine Translation (Due Mar 19)
  Feb 15
  Feb 20
  Feb 22
  Feb 27

Advanced Topic
  Mar 1
  Mar 6

Advanced Topic
  Mar 13
  Mar 15