User Documentation for Lilliput

Lilliput is a server containing 4 Tesla C1060 GPUs and a conventional quad-core CPU. It is funded by NSF contract OCE-0835839.

Technical Summary

Component   Item                Description
Hardware    Processor           Intel Xeon E5504 @ 2.0 GHz
            #CPUs               4
            Cache               4 MB
            Memory              16 GB
            Accelerator Units   4 Tesla C1060 GPUs
                                4x4 GB Device memory
Software    Operating System    Ubuntu 10.04
            Compilers           GNU: Fortran77 C C++
            Shell               bash
Policy                          Reserve for using more than 1 GPU

Account Registration

If you would like to have an account on Lilliput, please send email to Professor Scott B. Baden and provide a description of your goals along with an estimate of how much wall-clock time you will need to complete your project. Such accounts will normally need to be renewed each year.

If you have a class account, your account will be turned off at the end of the quarter. If you would like to continue to have access to Lilliput, contact the professor you took the course from.

While Lilliput is backed up, you should maintain your own backup copies of critical files.

Additional Request/Problem report

If you have a request for us, e.g., installing specific software, or if you notice a problem on Lilliput, please send email to CSEHelp and remember to CC Professor Scott Baden.

Lilliput runs Linux and may be programmed using a combination of Pthreads, OpenMP, MPI, and CUDA. Resources for using these models are listed below.

  • Programming with Pthreads
  • Programming with OpenMP
  • Programming with CUDA
  • Programming with CUDA+Pthreads/OpenMP
  • Programming with MPI
  • Programming with MPI+Pthreads/OpenMP
  • Programming with MPI+CUDA

    1. Programming with Pthreads

    Lilliput contains a multicore processor (4 cores) and runs a UNIX-like operating system. One way of realizing parallelism is via threads, i.e. POSIX threads (also known as Pthreads). The GNU C and C++ compilers are available; make sure that the path to the GNU compilers is included in your PATH environment variable:

    which gcc
    gcc --version
    export PATH=/usr/bin:$PATH

    To compile a multi-threaded program, link with -lpthread.
    C compiler: gcc program -o executable_file -lpthread
    C++ compiler: g++ program -o executable_file -lpthread
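
To make the compile line above concrete, here is a minimal sketch of a multi-threaded dot product. It is not the DotProduct-Pthreads sample itself; the names dot_task, dot_worker, dot_pthreads, and MAX_THREADS are hypothetical and chosen only for illustration. Each thread sums its own contiguous slice of the vectors, and the calling thread adds up the partial sums after joining.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_THREADS 4   /* assumption: at most one thread per core on a 4-core CPU */

/* Work description for one thread (hypothetical helper type). */
typedef struct {
    const double *x;
    const double *y;
    int lo;          /* first index owned by this thread */
    int hi;          /* one past the last index owned by this thread */
    double partial;  /* partial sum written by the thread */
} dot_task;

/* Thread entry point: sum x[i]*y[i] over the thread's own index range. */
static void *dot_worker(void *arg)
{
    dot_task *t = (dot_task *)arg;
    double sum = 0.0;
    for (int i = t->lo; i < t->hi; i++)
        sum += t->x[i] * t->y[i];
    t->partial = sum;
    return NULL;
}

/* Dot product of x and y (length n) using nthreads POSIX threads. */
double dot_pthreads(const double *x, const double *y, int n, int nthreads)
{
    assert(n >= 0 && nthreads > 0 && nthreads <= MAX_THREADS);

    pthread_t tid[MAX_THREADS];
    dot_task task[MAX_THREADS];
    int spawned[MAX_THREADS];

    for (int t = 0; t < nthreads; t++) {
        task[t].x = x;
        task[t].y = y;
        /* Contiguous block partition of [0, n) among the threads. */
        task[t].lo = (int)((long long)n * t / nthreads);
        task[t].hi = (int)((long long)n * (t + 1) / nthreads);
        task[t].partial = 0.0;
        spawned[t] = (pthread_create(&tid[t], NULL, dot_worker, &task[t]) == 0);
        if (!spawned[t])
            dot_worker(&task[t]);   /* fall back to computing this slice inline */
    }

    double total = 0.0;
    for (int t = 0; t < nthreads; t++) {
        if (spawned[t])
            pthread_join(tid[t], NULL);
        total += task[t].partial;   /* safe to read: thread joined or ran inline */
    }
    return total;
}
```

Because every thread writes only to its own dot_task, no mutex is needed; the join is the only synchronization point. Call dot_pthreads from your own main() and build with the gcc line above.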

  • DotProduct-Pthreads is a simple multi-threaded program which calculates the dot product of 2 vectors. A sample makefile is also included.

  • Other useful links:
  • Introduction to Programming Threads
  • POSIX Threads Programming

    2. Programming with OpenMP

    OpenMP supports shared-memory programming on many architectures using a higher-level model than threads. It simplifies parallel programming by providing a set of pragmas, runtime routines, and environment variables that are supported by the compiler.

    To compile an OpenMP program with the Gnu compilers, use the -fopenmp flag.
    C compiler: gcc program -fopenmp -o executable_file
    C++ compiler: g++ program -fopenmp -o executable_file

    To set the number of threads:
    export OMP_NUM_THREADS=number_of_threads
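
As an illustration of the pragma-based model, the sketch below parallelizes a dot product loop with a single directive. It is not the DotProduct-OMP sample; the function name dot_omp is hypothetical. The reduction(+:sum) clause gives each thread a private copy of sum and combines the copies when the loop ends.

```c
#include <assert.h>

/* Dot product of x and y (length n) using an OpenMP reduction.
 * If the file is compiled without -fopenmp, the pragma is ignored
 * and the loop simply runs serially with the same result. */
double dot_omp(const double *x, const double *y, int n)
{
    assert(n >= 0);
    double sum = 0.0;

    /* Loop iterations are divided among the threads; each thread
     * accumulates into a private sum that is added up at the end. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += x[i] * y[i];

    return sum;
}
```

The number of threads used by the loop is taken from OMP_NUM_THREADS as set above. Note that floating-point results can differ slightly between thread counts, because the order of the additions changes.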

  • DotProduct-OMP is an OpenMP program that adds directives to the sequential dot product implementation DotProduct-Serial.

  • Other useful links and books:
  • Using OpenMP - Portable Shared Memory Parallel Programming
  • OpenMP forum

    3. Programming with CUDA

    GPU programs are written in an extension of C called CUDA C, and compiled using the CUDA C compiler, nvcc. Before running make for the first time, set up the compiler and library paths by adding the "export" commands to your .bashrc file as follows:

    vim ~/.bashrc
    export PATH=/opt/nvidia/latest/cuda/bin/:$PATH
    export LD_LIBRARY_PATH=/opt/nvidia/latest/cuda/lib64:$LD_LIBRARY_PATH
    source ~/.bashrc

    Next, invoke the CUDA C compiler:
    nvcc program -o executable_file
    and then run the executable file as you would an ordinary C or C++ program.

    For more complex programs, you may need to specify the GPU architecture, additional libraries, etc. To get you started, we have set up a simple CUDA application, which has an include file for the Makefile to set up the appropriate compilation flags: DotProduct-CUDA.

    Setting up a private copy of the SDK

    If you need to install a private copy of the SDK, go to your home directory

    cd ~
    Next run
    and then do
    cd ~/NVIDIA_GPU_Computing_SDK/C
    Due to some compatibility issues, you may need to change the default GNU C/C++ compiler to version 4.3. To do this, edit common/ and append the compiler specification to this line
    NVCCFLAGS      :=
    as follows:
    NVCCFLAGS      := --compiler-bindir=/usr/bin/gcc-4.3
    Finally, type "make".

    Note: If you are taking a class there should be no need to set up a private copy of the SDK, as the instructor will set up one in a central location. The SDK consumes nearly 400MB of disk storage, which may limit the amount of disk space available to you.

    Installing CUDA on your own hardware

    Because CUDA is free, you may install it on your own system and develop your GPU code there. Doing so will help lower the workload on Lilliput. You may even compile in emulation mode which will enable you to run without a GPU. Applications run more slowly under emulation mode than on real GPU hardware, and there are some differences, but emulation mode can be handy.

    Here are Fred Lionetti’s notes on how to install CUDA under MacOS or Ubuntu. The definitive source of information and software about CUDA is the CUDA Zone, which is hosted by NVIDIA and includes user forums. A list of other sources of information is located here.

    Kernel tuning

    To gain a deeper understanding of the behavior and performance of your kernel, you may use cudaprof to profile your kernel(s). Cudaprof is a GUI-based tool, so when you use ssh to connect to Lilliput, be sure to enable X11 forwarding with the -X option:
    ssh -X

    Then add the path of cudaprof into your .bashrc file:
    vim ~/.bashrc
    export PATH=/opt/nvidia/latest/cuda/cudaprof/bin/:$PATH
    source ~/.bashrc

    Cudaprof is now available to run from the command line.

    4. Programming with CUDA+Pthreads/OpenMP

    It is possible to run with multiple GPUs (as many as 4). To this end, you will run multiple threads, each invoking a CUDA kernel.

    If you want to perform multiple GPU runs, you must contact Professor Baden before doing so. In the case of a class account, contact your instructor.

    GPUs excel at exploiting data parallelism. In some cases you may want to perform some of the work on the CPU rather than the GPU, for example, if certain portions of your code contain many branches or otherwise exhibit irregular behavior. In such cases you may want to write a hybrid program that takes advantage of both the GPU and the CPU.

    5. Programming with MPI

    Although 2 MPI implementations are available on Lilliput, OpenMPI and MPICH, we recommend that you use OpenMPI unless you have special needs that require MPICH. OpenMPI is somewhat simpler to use than MPICH. Whichever implementation you use, the MPI compiler and the MPI runtime must both come from that same implementation.

    Add the commands below to your .bashrc file to set the environment variables:

    export PATH=/opt/mpi/openmpi/bin:$PATH
    export LD_LIBRARY_PATH=/opt/mpi/openmpi/lib:$LD_LIBRARY_PATH

    Then type:
    source ~/.bashrc

    To use OpenMPI, compile and then run your program as follows:

    mpicc your_program -o executable_file
    mpirun -np number_of_processes executable_file

    You may be asked to provide your password each time you run your application. To avoid this, take the following additional steps:
    1. Generate a DSA key pair:
    ssh-keygen -t dsa

    Press Enter three times to accept the default file location and an empty passphrase.

    2. Copy the public key file generated by ssh-keygen (id_dsa.pub by default) to $HOME/.ssh/authorized_keys:
    cd $HOME/.ssh
    cp id_dsa.pub authorized_keys

    Run your program and provide your password if asked. After that, you should not be asked for the password again.


    To use MPICH instead, first set the path:
    vim ~/.bashrc
    export PATH=/opt/mpi/mpich2/bin:$PATH
    source ~/.bashrc

    Create the mpd.conf file:
    touch .mpd.conf
    chmod 600 .mpd.conf
    Next open .mpd.conf and enter
    secretword=[a word]
    where [a word] is your chosen secret word.

    Compile your program:

    mpicc your_program -o executable_file

    Start mpd.

    Run your program:
    mpirun -np number_of_processes executable_file

    Exit mpd:


    DotProduct-MPI computes a dot product using the Message Passing Interface.
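
An MPI dot product usually has each process compute a partial sum over its own slice of the vectors and then combine the partial sums, for example with MPI_Reduce and MPI_SUM. The sketch below shows only the slicing step in plain C, so it compiles without MPI. It is not the DotProduct-MPI sample; block_range and local_dot are hypothetical names, and the rank and nprocs arguments are assumed to come from MPI_Comm_rank and MPI_Comm_size in a real program.

```c
#include <assert.h>

/* Compute the half-open index range [*lo, *hi) owned by `rank` when
 * n elements are divided among nprocs processes. The first n % nprocs
 * ranks each receive one extra element, so sizes differ by at most 1. */
void block_range(int n, int nprocs, int rank, int *lo, int *hi)
{
    assert(n >= 0 && nprocs > 0 && rank >= 0 && rank < nprocs);
    int base  = n / nprocs;
    int extra = n % nprocs;
    *lo = rank * base + (rank < extra ? rank : extra);
    *hi = *lo + base + (rank < extra ? 1 : 0);
}

/* Partial dot product over [lo, hi); in an MPI program each rank would
 * call this on its own range and pass the result to a sum reduction. */
double local_dot(const double *x, const double *y, int lo, int hi)
{
    double sum = 0.0;
    for (int i = lo; i < hi; i++)
        sum += x[i] * y[i];
    return sum;
}
```

With 10 elements and 4 processes, for instance, the ranks own 3, 3, 2, and 2 elements respectively. This sketch assumes every rank can see the whole vectors; a program that distributes the data would index its local arrays from 0 instead.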


    6. Programming with MPI+OpenMP/Pthreads

    Because Lilliput is a single machine with 4 cores, it is not well suited to running hybrid MPI+OpenMP/Pthreads programs at scale. However, it is a useful environment for developing this kind of application and running simple tests. The final program can then be evaluated on other, distributed systems.

    7. Programming with MPI+CUDA

    Lilliput has 4 Tesla GPUs, one for each of its 4 CPU cores. If you would like to use multiple GPUs, you can use either MPI or Pthreads. However, MPI+CUDA is preferable if the application will later need to run on larger systems. To compile an MPI+CUDA program, first compile the CUDA code to object files with nvcc, and then link these files using mpicc/mpicxx.

    Created by Nguyen Thanh Nhat Tan
    Last update on Saturday, September 27, 2010