CSE 190: Statistical Natural Language Processing

Term: Winter Qtr 2018
Credits: 4
Lecture: Tuesday and Thursday 12:30pm-1:50pm, CENTER 105
Instructor: Ndapa Nakashole, CSE 4108
Office Hours: Tuesday 3pm - 4pm & Friday 1pm - 2pm

Teaching Assistants & office hours:
      Sparsh Gupta [spg005-at-eng.ucsd.edu]        Monday 9-10am, CSE B250A
      Sindhura Raghavan [sindhura-at-eng.ucsd.edu]        Wednesday 3:30-4:30pm, CSE 4258
      Xudong Sun [xus022-at-eng.ucsd.edu]        Tuesday 10-11am, CSE B250A

UCSD CSE


Announcements

Course Description

Natural language processing (NLP) is a field of AI that aims to equip computers with the ability to intelligently process natural (human) language. This course will explore statistical techniques for the automatic analysis of natural language data. Topics covered include: probabilistic language models, which define probability distributions over text sequences; text classification; sequence models; parsing sentences into syntactic representations; machine translation; and machine reading.

The course assumes knowledge of basic probability (see: Probability Review).

Course projects will require programming in Python (see: Python Numpy Tutorial).

Grading

The course is lab-based. You will complete five hands-on programming assignments individually, not in teams. All assignments contribute equally. Class participation contributes 10% to the final grade; the rest of the grade is based on the assignments. Assignment submission instructions are provided in each assignment description.
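As a rough illustration of the weighting described above, the sketch below computes a final grade in Python. It assumes that every score is expressed as a fraction between 0 and 1; the names `final_grade`, `PARTICIPATION_WEIGHT`, and `ASSIGNMENT_WEIGHT` are hypothetical and not part of any course tooling.

```python
# Illustrative sketch only: the weights follow the grading text
# (10% participation, five equally weighted assignments).
# Assumption: all scores are fractions in [0, 1].
PARTICIPATION_WEIGHT = 0.10
NUM_ASSIGNMENTS = 5
# The remaining 90% is split equally, so each assignment is worth 18%.
ASSIGNMENT_WEIGHT = (1.0 - PARTICIPATION_WEIGHT) / NUM_ASSIGNMENTS


def final_grade(assignment_scores, participation_score):
    """Combine five assignment scores and a participation score."""
    if len(assignment_scores) != NUM_ASSIGNMENTS:
        raise ValueError("expected exactly five assignment scores")
    return (PARTICIPATION_WEIGHT * participation_score
            + ASSIGNMENT_WEIGHT * sum(assignment_scores))
```

Under these assumptions, each assignment accounts for 18% of the final grade.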

Late Submission Policy
Each student will be granted 5 late days to use over the duration of the quarter. There are no restrictions on how the late days can be used; note, however, that we will not be able to accept late submissions for the last assignment. Using late days will not affect your grade. However, assignments submitted late after all late days have been used will receive no credit. Make sure to plan ahead.
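The late-day rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the course's actual bookkeeping: lateness is counted in whole days, a late submission that needs more late days than remain receives no credit and consumes none, and a late submission of the last assignment is treated as receiving no credit. The function name `apply_late_policy` is hypothetical.

```python
# Illustrative sketch of the late-day policy; all names are hypothetical.
TOTAL_LATE_DAYS = 5
LAST_ASSIGNMENT = 5  # late submissions are not accepted for this one


def apply_late_policy(days_late):
    """days_late maps assignment number -> whole days late (0 = on time).

    Returns (credit, remaining): credit maps assignment number -> bool,
    and remaining is the number of unused late days.
    """
    remaining = TOTAL_LATE_DAYS
    credit = {}
    for number in sorted(days_late):
        late = days_late[number]
        if late == 0:
            credit[number] = True          # on time, nothing consumed
        elif number == LAST_ASSIGNMENT or late > remaining:
            credit[number] = False         # not accepted, or late days exhausted
        else:
            remaining -= late              # covered by late days, no penalty
            credit[number] = True
    return credit, remaining
```

For example, a student who is 2 days late on P1 and 3 days late on P4 uses all five late days and keeps full credit on both, but any late submission of P5 would not be accepted.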

Books
Recommended texts are:

[J&M] 3rd edition: chapters are free online.
[M&S]: free online.


Syllabus (tentative)
Date        Topic/Readings        Assignment (Out)
Jan 9 Introduction
J&M Chapter 1 Introduction
Hirschberg & Manning, Science 2015 Advances in NLP
Language Modelling
Jan 11 Michael Collins. Notes on Language Modelling        P1: Language Modeling (Due Jan 26)
J&M Chapter 4 N-grams
Jan 16 Michael Collins. Notes on Log-linear models
Goldberg, JAIR 2016 A Primer on Neural Network Models for NLP. (Sections 1-4 & 10-13)
Text Classification
Jan 18 J&M Chapter 6 Naive Bayes and Sentiment Classification
J&M Chapter 7 Logistic Regression
Michael Collins. Notes on Naive Bayes, MLE, and EM
Distributional Semantics
Jan 23 & 25 Goldberg, JAIR 2016 A Primer on Neural Network Models for NLP. Sections 1-5        P2: Text Classification (Due Feb 9)
Chris McCormick, 2016 Word2Vec Tutorial - The Skip-Gram Model
Mikolov et al., NIPS 2013 Distributed Representations of Words and Phrases and their Compositionality
Mikolov et al., 2013 Efficient Estimation of Word Representations in Vector Space
Tagging Problems & Hidden Markov Models
Jan 30 J&M Chapter 9 Hidden Markov Models
Michael Collins. Notes on Tagging with Hidden Markov Models
J&M Chapter 10 Part-of-Speech Tagging
Parsing and Context Free Grammars
Feb 1 & 6 Michael Collins. Notes on Probabilistic Context-Free Grammars
(Optional) J&M Chapter 12 Syntactic Parsing
(Optional) J&M Chapter 13 Statistical Parsing
Feb 8 Michael Collins. Notes on Lexicalized Probabilistic Context-Free Grammars        P3: Sequence Tagging (Due Feb 23)
Machine Translation
Feb 13 ---
Feb 15 Michael Collins. Notes on Statistical Machine Translation
Feb 20 & 22 Michael Collins. Notes on Phrase-Based Translation Models        P4: Syntax Parsing (Due Mar 6)
Feb 27 Graham Neubig. Tutorial on Neural Machine Translation
Machine Reading
Mar 1 Carlson et al., AAAI 2010 Toward an Architecture for Never-Ending Language Learning        P5: Machine Translation (Due Mar 16, no late days)
Mitchell et al., AAAI 2015 Never Ending Learning
Sukhbaatar et al., NIPS 2015 End-To-End Memory Networks
Mar 6 J&M Chapter 21 Information Extraction
Mar 8 Sentence Representation (Kiros et al., NIPS 2015 Skip-Thought Vectors)
Coreference Resolution
Dialogue Systems and Chatbots
Mar 13 J&M Chapter 29 Dialogue Systems and Chatbots
Mar 15 ---