CSE 190: Statistical Natural Language Processing

Term: Winter Qtr 2018
Credits: 4
Lecture: Tuesday and Thursday 12:30pm-1:50pm, CENTER 105
Instructor: Ndapa Nakashole, CSE 4108
Office Hours: Tuesday 3pm - 4pm & Friday 1pm - 2pm

Teaching Assistants & office hours:
      Sparsh Gupta [spg005-at-eng.ucsd.edu]        Monday 9-10am, CSE B250A
      Sindhura Raghavan [sindhura-at-eng.ucsd.edu]        Wednesday 3:30-4:30pm, CSE 4258
      Xudong Sun [xus022-at-eng.ucsd.edu]        Tuesday 10-11am, CSE B250A

Announcements

Jan 23: Assignment 2 posted.
Jan 19: TA office hours and rooms posted.
Jan 18: Course content migrated to Piazza. TritonEd page no longer updated.
Jan 11: Probability Review link posted.
Jan 10: Assignment 1 posted on TritonEd, under Content.
Jan 10: Link to Python Numpy Tutorial posted.
Jan 9: Lecture 1 slides uploaded to TritonEd, under Content.

Course Description

Natural language processing (NLP) is a field of AI that aims to equip computers with the ability to intelligently process natural (human) language. This course will explore statistical techniques for the automatic analysis of natural language data. Specific topics covered include: probabilistic language models, which define probability distributions over text sequences; text classification; sequence models; parsing sentences into syntactic representations; machine translation; and machine reading.

The course assumes knowledge of basic probability (see the Probability Review).

Course projects will require programming in Python (see the Python Numpy Tutorial).

Grading

The course is lab-based. You will complete five hands-on programming assignments individually, not in teams. Class participation contributes 10% of the final grade; the remaining 90% is based on the assignments, all of which contribute equally (18% each). Assignment submission instructions are provided in each assignment description.

Late Submission Policy
Each student is granted 5 late days to use over the duration of the quarter. There are no restrictions on how the late days can be used, except that we cannot accept late submissions for the last assignment. Using late days will not affect your grade. However, work submitted late after all late days have been used will receive no credit. Make sure to plan ahead.

Books
Recommended texts are:

[J&M]: Jurafsky and Martin, Speech and Language Processing, 2nd edition (amazon)
[M&S]: Manning and Schütze, Foundations of Statistical Natural Language Processing (amazon) (online)

[J&M] 3rd edition free chapters online
[M&S] is free online.

Syllabus (tentative)

Date	Topic/Readings	Assignment (Out)
Jan 9	Introduction
	J&M Chapter 1 Introduction
	Hirschberg & Manning, Science 2015 Advances in NLP
Language Modelling
Jan 11	Michael Collins. Notes on Language Modelling	P1: Language Modeling (Due Jan 26)
Jan 11	J&M Chapter 4 N-grams
Jan 16	Michael Collins. Notes on Log-linear models
Jan 16	Goldberg, JAIR 2016 A Primer on Neural Network Models for NLP. (Sections 1-4 & 10-13)
Text Classification
Jan 18	J&M Chapter 6 Naive Bayes and Sentiment Classification
	J&M Chapter 7 Logistic Regression
	Michael Collins. Notes on Naive Bayes, MLE, and EM
Distributional Semantics
Jan 23 & 25	Goldberg, JAIR 2016 A Primer on Neural Network Models for NLP. Sections 1-5	P2: Text Classification (Due Feb 9)
	Chris McCormick, 2016 Word2Vec Tutorial - The Skip-Gram Model
	Mikolov et al., NIPS 2013 Distributed Representations of Words and Phrases and their Compositionality
	Mikolov et al., 2013 Efficient Estimation of Word Representations in Vector Space
Tagging Problems & Hidden Markov Models
Jan 30	J&M Chapter 9 Hidden Markov Models
	Michael Collins. Notes on Tagging with Hidden Markov Models
	J&M Chapter 10 Part-of-Speech Tagging
Parsing and Context Free Grammars
Feb 1 & 6	Michael Collins. Notes on Probabilistic Context-Free Grammars
	(Optional) J&M Chapter 12 Syntactic Parsing
	(Optional) J&M Chapter 13 Statistical Parsing
Feb 8	Michael Collins. Notes on Lexicalized Probabilistic Context-Free Grammars	P3: Sequence Tagging (Due Feb 23rd)
Machine Translation
Feb 13	---
Feb 15	Michael Collins. Notes on Statistical Machine Translation
Feb 20 & 22	Michael Collins. Notes on Phrase-Based Translation Models	P4: Syntax Parsing (Due Mar 6th)
Feb 27	Graham Neubig. Tutorial on Neural Machine Translation
Machine Reading
Mar 1	Carlson et al., AAAI 2010 Toward an Architecture for Never-Ending Language Learning	P5: Machine Translation (Due Mar 16th - no late days)
	Mitchell et al., AAAI 2015 Never Ending Learning
	Sukhbaatar et al., NIPS 2015 End-To-End Memory Networks
Mar 6	J&M Chapter 21 Information Extraction
Mar 8	Sentence Representation (Kiros et al., NIPS 2015 Skip-Thought Vectors)
Mar 8	Coreference Resolution
Dialogue Systems and Chatbots
Mar 13	J&M Chapter 29 Dialogue Systems and Chatbots
Mar 15	---