The following is a list of papers that I think are worth reading for our

discussion of machine translation. I've tried to give a short blurb about

each of the papers to put them in context. I've included a number of papers

that I marked "OPTIONAL" that I think are interesting, but are either

supplementary or the material is more or less covered in the other papers.

If anyone would like more information on a particular topic or would

like to discuss any of these papers, feel free to e-mail me.

A Statistical MT Tutorial Workbook. Kevin Knight. 1999.

Very good introduction to word-based statistical machine translation.

Written in an informal, understandable, tutorial oriented style.

Automating Knowledge Acquisition for Machine Translation.

Kevin Knight. 1997.

(OPTIONAL) Another tutorial oriented paper that steps through

how one can learn from bilingual data. Also introduces a number of

important concepts for MT.

Foundations of Statistical NLP, chapter 13. Manning and Schutze. 1999.

(OPTIONAL) Must be accessed from UCSD. Overview of statistical MT.

Spends a lot of time on sentence and word alignment of bilingual data.

Foundations of Statistical NLP, chapter 6. Manning and Schutze. 1999.

(OPTIONAL) Must be accessed from UCSD. Discusses n-gram language

modeling. Language modeling is crucial for SMT and many other natural

language applications. I won't spend much time discussing language

modeling, but for those that are interested this is a good introduction.
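
For those who haven't seen language modeling before, here is a minimal sketch of the idea: count bigrams in a corpus and score sentences by their smoothed bigram probability. The tiny corpus and the "&lt;s&gt;"/"&lt;/s&gt;" boundary markers are my own toy illustration, not from any of the papers above; real systems use much larger corpora and better smoothing than add-one.

```python
from collections import Counter
import math

# Hypothetical toy corpus; "<s>" and "</s>" mark sentence boundaries.
corpus = [["<s>", "the", "cat", "sat", "</s>"],
          ["<s>", "the", "dog", "sat", "</s>"]]

bigrams = Counter()
unigrams = Counter()
for sent in corpus:
    for w1, w2 in zip(sent, sent[1:]):
        bigrams[(w1, w2)] += 1
        unigrams[w1] += 1

vocab = {w for sent in corpus for w in sent}

def log_prob(sentence):
    """Add-one-smoothed bigram log-probability of a sentence."""
    lp = 0.0
    for w1, w2 in zip(sentence, sentence[1:]):
        lp += math.log((bigrams[(w1, w2)] + 1) /
                       (unigrams[w1] + len(vocab)))
    return lp

# A fluent (seen) word order should outscore a scrambled one.
print(log_prob(["<s>", "the", "cat", "sat", "</s>"]) >
      log_prob(["<s>", "sat", "cat", "the", "</s>"]))  # True
```

This ordering preference is exactly what the language model contributes to an SMT system: among candidate translations, it favors the ones that look like fluent target-language text.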

Word models:

The Mathematics of Statistical Machine Translation:

Parameter Estimation. P. F. Brown, S. A. Della Pietra,

V. J. Della Pietra and R.L. Mercer. 1993.

(OPTIONAL) All you ever wanted to know about word level

models. Describes IBM models 1-5 and parameter estimation

for these models. It's about 50 pages and contains a lot of

material for the interested reader.
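
To give a feel for what parameter estimation in these models involves, here is a toy sketch of EM training for IBM Model 1 (the simplest of the five): each foreign word distributes fractional counts over the English words in its sentence, and the translation table t(f|e) is re-estimated from those counts. The two sentence pairs are a made-up illustration; real training runs over millions of pairs.

```python
from collections import defaultdict

# Hypothetical two-pair corpus (English, foreign), just to show the EM loop.
corpus = [("the house".split(), "la maison".split()),
          ("the".split(),       "la".split())]

# Initialize t(f|e) uniformly over the foreign vocabulary.
f_vocab = {f for _, fs in corpus for f in fs}
t = defaultdict(lambda: 1.0 / len(f_vocab))

for _ in range(20):  # EM iterations
    count = defaultdict(float)   # expected counts c(f, e)
    total = defaultdict(float)   # expected counts c(e)
    for es, fs in corpus:
        for f in fs:
            # E-step: split f's count across the English words in proportion
            # to the current translation probabilities.
            z = sum(t[(f, e)] for e in es)
            for e in es:
                frac = t[(f, e)] / z
                count[(f, e)] += frac
                total[e] += frac
    # M-step: re-estimate t(f|e) from the expected counts.
    for (f, e), c in count.items():
        t[(f, e)] = c / total[e]

# Because "la" also occurs alone with "the", EM pushes
# t(maison|house) and t(la|the) toward 1.
print(round(t[("maison", "house")], 2), round(t[("la", "the")], 2))
```

Models 2-5 in the paper add alignment positions, fertility, and distortion on top of this same expected-count machinery.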

Word model decoding:

Decoding Algorithm in Statistical Machine Translation.

Ye-Yi Wang and Alex Waibel. 1997.

Early paper discussing decoding of IBM model 2. The paper

provides a fairly good introduction to word-level decoding

including multi-stack search (i.e. multiple beams) and rest

cost estimation (heuristic functions).
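
The multi-stack idea can be sketched as follows, under heavy simplifying assumptions of my own: a made-up word-for-word translation table, monotone (left-to-right) translation with no reordering, and no language model or rest cost. Stack i holds partial translations covering the first i source words, pruned to a beam.

```python
import math

# Hypothetical toy translation table: source word -> {target word: prob}.
T = {
    "la":     {"the": 0.8, "it": 0.2},
    "maison": {"house": 0.7, "home": 0.3},
    "bleue":  {"blue": 0.9, "sad": 0.1},
}

def stack_decode(src, beam_size=2):
    """Monotone multi-stack search: stacks[i] holds hypotheses covering
    the first i source words, pruned to the beam_size best."""
    stacks = [[] for _ in range(len(src) + 1)]
    stacks[0] = [(0.0, [])]  # (log-probability, partial translation)
    for i, word in enumerate(src):
        expansions = []
        for score, hyp in stacks[i]:
            for tgt, p in T[word].items():
                expansions.append((score + math.log(p), hyp + [tgt]))
        # Prune: keep only the beam_size highest-scoring hypotheses.
        stacks[i + 1] = sorted(expansions, reverse=True)[:beam_size]
    best_score, best_hyp = max(stacks[-1])
    return best_hyp

print(stack_decode(["la", "maison", "bleue"]))  # -> ['the', 'house', 'blue']
```

The rest cost estimation discussed in the paper addresses a weakness visible even here: pruning compares hypotheses by their score so far, so without an estimate of the cost of the words still to translate, the beam can discard hypotheses that would have won out.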

An Efficient A* Search Algorithm for Statistical Machine Translation.

Franz Josef Och, Nicola Ueffing, Hermann Ney. 2001.

(OPTIONAL) One of many papers on decoding with word-based SMT. They

discuss the basic idea of viewing decoding as state space search and

provide one method for doing this. They describe decoding for Model 3

and suggest a few different heuristics that are admissible, leading to few search errors.

Phrase based statistical MT:

Statistical Phrase-Based Translation.

Philipp Koehn, Franz Josef Och and Daniel Marcu. 2003.

Good, short overview of phrase based systems. If you want more

details, see the paper below.

The Alignment Template Approach to Statistical Machine Translation.

Franz Josef Och and Hermann Ney. 2004.

(OPTIONAL) This is a journal paper describing one phrase based statistical system,

including decoding. This is more or less the system used at ISI and

is probably the best current system (though syntax based systems may beat

these in the next few years). Requires Acrobat 5 and must be accessed from UCSD.

Phrase-based decoding:

See the previous paper.

Syntax based translation:

What's in a Translation Rule? Galley, Hopkins, Knight and Marcu. 2004.

This is the current system being investigated at ISI and the hope is that

these syntax based systems will perform better than phrase based systems.

The paper is a bit tough to read since it's a conference paper.

A Syntax-Based Statistical Translation Model. Yamada and Knight. 2001.

(OPTIONAL) Predecessor model to Galley et al., but similar.

Syntax based decoding:

Foundations of Statistical NLP, chapter 12. Manning and Schutze. 1999.

Must be on campus. This is a chapter on parsing (not actually decoding).

However, since the above rules are very similar to PCFGs, decoding

is very similar to parsing... just with more complications.

A Decoder for Syntax-Based Statistical MT. Kenji Yamada and Kevin Knight. 2001.

(OPTIONAL) Decoder for the above Yamada and Knight model.

Discriminative Training:

Discriminative Training and Maximum Entropy Models for Statistical Machine Translation.

Och and Ney. 2002.

Learns the best weights for combining the different models (translation

model, language model, etc.) using maximum entropy parameter estimation.

This line of research is still very important and may be interesting to

many of you since it's very machine-learning oriented.

Discriminative Reranking for Machine Translation.

Shen, Sarkar and Och. 2004.

(OPTIONAL) Given a ranked output of possible translations from the

translation system, this paper uses the perceptron algorithm to learn

a reranking of the sentences to improve the top translation.

MT Evaluation:

BLEU: A Method for Automatic Evaluation of Machine Translation.

Papineni, Roukos, Ward and Zhu. 2001.

Foundational method for automatically evaluating MT output, and still widely used.
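
The core of the metric is simple enough to sketch: geometric mean of clipped n-gram precisions, times a brevity penalty. This is my own simplified single-reference version (the paper allows multiple references and aggregates counts over a whole test corpus rather than per sentence).

```python
import math
from collections import Counter

def bleu(candidate, reference, max_n=4):
    """BLEU against a single reference: modified (clipped) n-gram
    precisions for n = 1..max_n, combined by geometric mean, times
    a brevity penalty for overly short candidates."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(tuple(candidate[i:i + n])
                       for i in range(len(candidate) - n + 1))
        ref = Counter(tuple(reference[i:i + n])
                      for i in range(len(reference) - n + 1))
        # Clip each candidate n-gram count by its count in the reference.
        matched = sum(min(c, ref[g]) for g, c in cand.items())
        total = max(sum(cand.values()), 1)
        if matched == 0:
            return 0.0
        log_precisions.append(math.log(matched / total))
    # Brevity penalty: exp(1 - r/c) when the candidate is shorter.
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * math.exp(sum(log_precisions) / max_n)

ref = "the cat sat on the mat".split()
print(bleu(ref, ref))                          # perfect match scores 1.0
print(0 < bleu("the cat sat on mat".split(), ref) < 1)  # True
```

The clipping is what prevents degenerate candidates like "the the the the" from getting credit for repeating a common reference word, and the brevity penalty is what keeps very short, high-precision outputs from winning.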