Computational Proteomics
with Nuno Bandeira, Stefano Bonissone, Ari Frank, Nitin Gupta, Kyowon Jeong, Sangtae Kim, Julio Ng

In a few seconds, a mass spectrometer can break a peptide into fragments and measure their masses (the spectrum of the peptide). The peptide sequencing problem is to derive the sequence of a peptide given its spectrum. For an ideal fragmentation process (each peptide is cleaved between every two consecutive amino acids) and an ideal mass spectrometer, the peptide sequencing problem is simple, because the mass differences between consecutive prefix fragments are exactly the masses of individual amino acids. In practice, fragmentation is far from ideal, which makes both de novo peptide sequencing and peptide identification via database search difficult.
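To make the ideal case concrete, here is a minimal Python sketch. It assumes integer (rounded monoisotopic) residue masses and an idealized spectrum containing only the masses of N-terminal (prefix) fragments, with no ion-type offsets, suffix fragments, missing peaks or noise; the function names are hypothetical. Under these assumptions each gap between consecutive peaks is exactly one residue mass, so the sequence can be read off directly, except that I/L (113 Da) and K/Q (128 Da) cannot be told apart at integer resolution.

```python
# Simplified integer residue masses in daltons (rounded monoisotopic values).
# I/L and K/Q share an integer mass and cannot be distinguished here.
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "K": 128, "Q": 128,
    "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}

# Reverse lookup: mass -> residues with that mass, e.g. 113 -> ["I", "L"].
MASS_TO_RESIDUES = {}
for residue, mass in sorted(RESIDUE_MASS.items()):
    MASS_TO_RESIDUES.setdefault(mass, []).append(residue)


def ideal_prefix_spectrum(peptide):
    """Masses of every N-terminal fragment (from the empty prefix, mass 0,
    up to the whole peptide), as an ideal cleavage between every pair of
    consecutive residues would produce."""
    masses = [0]
    for residue in peptide:
        masses.append(masses[-1] + RESIDUE_MASS[residue])
    return masses


def sequence_from_prefix_spectrum(spectrum):
    """Read residues off the gaps between consecutive prefix masses.

    Ambiguous gaps are reported as "I/L" or "K/Q"; a gap that matches no
    single residue raises ValueError."""
    peaks = sorted(set(spectrum))
    residues = []
    for lighter, heavier in zip(peaks, peaks[1:]):
        gap = heavier - lighter
        if gap not in MASS_TO_RESIDUES:
            raise ValueError(f"gap of {gap} Da matches no single residue")
        residues.append("/".join(MASS_TO_RESIDUES[gap]))
    return residues


spectrum = ideal_prefix_spectrum("PEPTIDE")
print(spectrum)                                 # [0, 97, 226, 323, 424, 537, 652, 781]
print(sequence_from_prefix_spectrum(spectrum))  # ['P', 'E', 'P', 'T', 'I/L', 'D', 'E']
```

A single missing peak merges two gaps into one mass that may match no residue or the wrong one (for example, G + G = 114 Da, the mass of N), which is one reason real spectra are much harder to interpret.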

Mass spectrometry is very successful at identifying proteins that are already present in genome databases. Database search algorithms rely on the ability to "look the answer up in the back of the book" when studying organisms with sequenced genomes. An experimental spectrum can be compared with the theoretical spectrum of each peptide in a database, and the best-fitting database peptide usually provides the sequence of the experimental peptide. However, when a peptide may carry multiple mutations and modifications, the reliability of database search methods is open to question.
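The database search described above can be sketched as follows. This is a hypothetical, simplified scorer, not the method of any particular search engine: theoretical spectra contain the integer masses of all prefix and suffix fragments, candidates are first filtered by precursor (whole-peptide) mass, and each remaining candidate is scored by the number of peaks it shares with the experimental spectrum (a shared peak count).

```python
# Simplified integer residue masses in daltons (rounded monoisotopic values).
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "K": 128, "Q": 128,
    "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def peptide_mass(peptide):
    """Sum of residue masses (no water or proton added in this toy model)."""
    return sum(RESIDUE_MASS[residue] for residue in peptide)


def theoretical_spectrum(peptide):
    """All prefix and suffix fragment masses, including 0 and the full mass."""
    prefixes = [0]
    for residue in peptide:
        prefixes.append(prefixes[-1] + RESIDUE_MASS[residue])
    total = prefixes[-1]
    suffixes = {total - m for m in prefixes}
    return sorted(set(prefixes) | suffixes)


def shared_peak_count(experimental, theoretical):
    """Number of experimental peaks explained by the theoretical spectrum."""
    return len(set(experimental) & set(theoretical))


def best_database_match(experimental, precursor_mass, database):
    """Return (peptide, score) for the best-scoring database peptide whose
    mass equals the precursor mass, or None if no peptide qualifies."""
    candidates = [p for p in database if peptide_mass(p) == precursor_mass]
    if not candidates:
        return None
    best = max(candidates,
               key=lambda p: shared_peak_count(experimental, theoretical_spectrum(p)))
    return best, shared_peak_count(experimental, theoretical_spectrum(best))


# Spectrum of PEPTIDE with the 323 peak lost and a noise peak at 500 added.
experimental = [0, 97, 129, 226, 244, 357, 424, 458, 500, 537, 555, 652, 684, 781]
print(best_database_match(experimental, 781, ["PETPIDE", "PEPTIDE", "GGGGGG"]))
# ('PEPTIDE', 13)
```

Despite the missing and spurious peaks, `PEPTIDE` scores 13 and beats `PETPIDE` (12), while `GGGGGG` is excluded by its precursor mass. In this toy model a peptide and its reverse give identical spectra; in real spectra, prefix (b) and suffix (y) ions carry different mass offsets, which breaks that symmetry.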

Because proteins are parts of complex cellular signalling systems, they are subject to an almost uncountable number of biological (post-translational) modifications. Almost all proteins are post-translationally modified, and as many as 200 types of amino acid residue modifications are known. Finding them poses a challenging computational problem for the post-genomic era: given a large collection of spectra representing the human proteome, determine which types of modifications are present in each human protein under different conditions.
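One simple way to look for an unexpected modification can be sketched under strong assumptions: the peptide is already known, it carries exactly one modification, masses are integers, and only prefix fragment masses are observed. The mass shift is the difference between the precursor mass and the unmodified peptide mass; the hypothetical `locate_modification` tries that shift at every residue and keeps the position that explains the most observed peaks. This is an illustration only, not the method of any particular tool.

```python
# Simplified integer residue masses in daltons (rounded monoisotopic values).
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "K": 128, "Q": 128,
    "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def modified_prefix_spectrum(peptide, position, delta):
    """Prefix fragment masses when the residue at `position` (0-based)
    carries an extra mass `delta`."""
    masses = [0]
    for i, residue in enumerate(peptide):
        shift = delta if i == position else 0
        masses.append(masses[-1] + RESIDUE_MASS[residue] + shift)
    return masses


def locate_modification(peptide, observed_prefixes, precursor_mass):
    """Infer the mass shift from the precursor mass, then return
    (delta, position) for the residue where placing the shift explains the
    most observed peaks. Ties go to the earliest position."""
    delta = precursor_mass - sum(RESIDUE_MASS[r] for r in peptide)
    observed = set(observed_prefixes)

    def explained(position):
        # Count observed peaks matched when the shift sits at `position`.
        return len(observed & set(modified_prefix_spectrum(peptide, position, delta)))

    best_position = max(range(len(peptide)), key=explained)
    return delta, best_position


# PEPTIDE with +80 Da (rounded phosphate mass) on T, and the 732 peak missing.
observed = [0, 97, 226, 323, 504, 617, 861]
print(locate_modification("PEPTIDE", observed, 861))  # (80, 3)
```

Here the sketch recovers a +80 Da shift on residue index 3 (the threonine); the size of the shift hints at the modification type. Allowing several modifications per peptide makes the number of placements to try grow quickly, which is part of what makes the problem hard.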