DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
UNIVERSITY OF CALIFORNIA, SAN DIEGO


CSE 254: Conditional random fields (CRFs) and related topics

Software for log-linear models and CRFs


LOG-LINEAR MODELS

The Tsujii lab at the University of Tokyo has well-documented software named Amis and SS Maxent for training log-linear classifiers.  However, I recommend writing your own code that computes the objective function and its gradient, and passing these to an off-the-shelf general nonlinear optimization package, for example an implementation of L-BFGS.
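As a rough illustration of what "your own code" might look like, the sketch below computes the negative conditional log-likelihood of a log-linear model together with its gradient, which is the pair of quantities a gradient-based optimizer such as L-BFGS typically asks for.  The function names, the calling convention, and the toy feature map are assumptions made for this example; they do not come from any of the packages mentioned here, and no regularization term is included.

```python
import math


def neg_log_likelihood(w, data, num_labels, features):
    """Return (value, gradient) of the negative conditional log-likelihood.

    w          -- list of weights, one per feature function
    data       -- list of (x, y) training pairs with y in range(num_labels)
    num_labels -- number of candidate labels
    features   -- hypothetical helper: features(x, y) -> list of len(w) floats
    """
    value = 0.0
    grad = [0.0] * len(w)
    for x, y in data:
        # Feature vector and linear score for every candidate label.
        F = [features(x, c) for c in range(num_labels)]
        scores = [sum(wj * fj for wj, fj in zip(w, f)) for f in F]
        # log Z(x), with the maximum subtracted for numerical stability.
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        value -= scores[y] - log_z
        # Gradient = expected feature values under the model
        #            minus observed feature values.
        for c in range(num_labels):
            p = math.exp(scores[c] - log_z)
            for j, fj in enumerate(F[c]):
                grad[j] += p * fj
        for j, fj in enumerate(F[y]):
            grad[j] -= fj
    return value, grad


def toy_features(x, y):
    # Hypothetical toy feature map: one copy of the input vector per label.
    f = [0.0] * (2 * len(x))
    f[y * len(x):(y + 1) * len(x)] = x
    return f


# Toy usage: three 2-dimensional examples, two labels, four weights.
data = [([1.0, 0.5], 0), ([0.2, -1.0], 1), ([0.3, 0.8], 0)]
value, grad = neg_log_likelihood([0.0] * 4, data, 2, toy_features)
```

An optimizer would call neg_log_likelihood repeatedly with different weight vectors.  Comparing the returned gradient against finite differences of the value is a cheap way to catch bugs before handing the function to the optimizer.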


NONLINEAR OPTIMIZATION

A common opinion among machine learning researchers is that the limited-memory BFGS (L-BFGS) implementation by Jorge Nocedal is the best-performing one.  Liam Stewart has published a Matlab wrapper for this code.  You may also use the nonlinear optimization routines in the Matlab Optimization Toolbox.  An interesting method that, as far as I know, has not been applied in machine learning but might be useful is ve08.  It is designed for objective functions that are sums of components, where each component depends on only a small subset of the variables.
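To make the structure that ve08 targets concrete, the following sketch evaluates a function of the form f(x) = f_1(x[S_1]) + ... + f_m(x[S_m]), where each index set S_i is small, and assembles the full gradient from the per-component gradients.  It illustrates only the shape of such an objective; it does not reproduce ve08's algorithm or interface, and all names in it are hypothetical.

```python
def partially_separable(x, components):
    """Evaluate f(x) = sum_i f_i(x[S_i]) and its gradient.

    components -- list of (indices, fn) pairs; fn takes the sub-vector
                  x[indices] and returns (value, gradient w.r.t. sub-vector)
    """
    value = 0.0
    grad = [0.0] * len(x)
    for indices, fn in components:
        sub = [x[i] for i in indices]
        v, g = fn(sub)
        value += v
        # Scatter the small component gradient into the full gradient.
        for i, gi in zip(indices, g):
            grad[i] += gi
    return value, grad


def squared_difference(sub):
    # Toy component on two variables: (a - b)^2.
    a, b = sub
    d = a - b
    return d * d, [2.0 * d, -2.0 * d]


# Chain structure: component i touches only variables i and i + 1.
n = 6
components = [((i, i + 1), squared_difference) for i in range(n - 1)]
x = [0.0, 1.0, 3.0, 6.0, 10.0, 15.0]
value, grad = partially_separable(x, components)
```

A training objective that sums one term per example, where each example has only a few nonzero features, has this same shape, which is presumably why such a method could be relevant.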

Ian Fasel provided the following information:  "I spent a few moments figuring out how to build lbfgs on Mac OS X (since the developer tools don't come with a built-in Fortran compiler).  You can buy the Absoft compilers, but it seems that gfortran works fine with Matlab 7.  The following instructions seem to work, though I've only tried testing them with tst_lbfgs.m so far.  I'm using a G5 PowerMac running Tiger; things could be different on an Intel-based Mac.

1) Grab MATLAB LBFGS wrapper v1.1 http://www.cs.toronto.edu/~liam/software.shtml
2) Get routines.f from the site linked there.
3) Follow the instructions on http://hpc.sourceforge.net/ to install gfortran on your computer.
4) cd to the directory lbfgs-1.1/lbfgs, then copy a mexopts.sh file to here, e.g., cp /Applications/MATLAB73/bin/mexopts.sh .
5) Modify the local mexopts.sh file.  Find the section labeled "mac)" and change as follows:

           #FC='f77'
           #FFLAGS='-f -N15 -N11 -s -Q51 -W'
           #ABSOFTLIBDIR=`which $FC | sed -n -e '1s|bin/'$FC'|lib|p'`
           #FLIBS="-L$ABSOFTLIBDIR -lfio -lf77math"
           #FOPTIMFLAGS='-O -cpu:g4'

           FC='/usr/local/bin/gfortran'
           FFLAGS=
           FLIBS="-L/usr/local/lib -lgfortran"
           FOPTIMFLAGS='-O5 -funroll-loops -ftree-vectorize'

In other words, comment out the Absoft stuff, and replace with gfortran stuff.  You may want to experiment with optimization flags."


CONDITIONAL RANDOM FIELDS

Hanna Wallach has compiled a list of CRF-related software packages.  Andrew McCallum's Mallet, written in Java, is recommended.  CRF++ by Taku Kudo, written in C++, is also good.  Try also crfsmd, an extended version of CRF++ with stochastic gradient optimization.

Other packages that are likely to be good as well:  CRFs in Java by Sunita Sarawagi, and large-scale parallel flexible CRFs by Xuan-Hieu Phan and Le-Minh Nguyen.

OTHER SOFTWARE

For a non-CRF approach to structured prediction, see Searn by Hal Daumé.  For finding the optimal labels for nodes in special cases of very large random fields, see the Middlebury MRF Energy Minimization Page.