Spectral Networks

Advances in tandem mass spectrometry (MS/MS) steadily increase the rate of generation of MS/MS spectra. As a result, the existing approaches that compare spectra against databases are already facing a bottleneck, particularly when interpreting spectra of modified peptides. Here we explore a concept that allows one to perform an MS/MS database search without ever comparing a spectrum against a database. We propose to take advantage of spectral pairs, which are pairs of spectra obtained from overlapping (often nontryptic) peptides or from unmodified and modified versions of the same peptide. Having a spectrum of a modified peptide paired with a spectrum of an unmodified peptide allows one to separate the prefix and suffix ladders, to greatly reduce the number of noise peaks, and to generate a small number of peptide reconstructions that are likely to contain the correct one. The MS/MS database search is thus reduced to extremely fast pattern-matching (rather than time-consuming matching of spectra against databases). In addition to speed, our approach provides a unique paradigm for identifying posttranslational modifications by means of spectral networks analysis.
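The spectral-pair idea can be illustrated with a toy example. The sketch below is not the authors' algorithm. It assumes spectra are plain lists of peak masses, ignores intensities and ion-type offsets, and uses hypothetical helper names (`shared_peaks`, `pair_score`). It counts how many peaks of an unmodified spectrum reappear in a second spectrum either unchanged or shifted by the parent mass difference, which is the signature one would expect from an unmodified/modified pair.

```python
def shared_peaks(spec_a, spec_b, shift=0.0, tol=0.5):
    """Count peaks of spec_a that land on a peak of spec_b after adding `shift`."""
    return sum(
        1 for m in spec_a
        if any(abs(m + shift - x) <= tol for x in spec_b)
    )


def pair_score(spec_a, spec_b, parent_diff, tol=0.5):
    """Return (unshifted matches, matches shifted by the parent mass difference)."""
    return (shared_peaks(spec_a, spec_b, 0.0, tol),
            shared_peaks(spec_a, spec_b, parent_diff, tol))


# Toy peptide with residue masses 57, 71, 87, 97, 99 (prefix and suffix sums only).
unmodified = [57, 99, 128, 196, 215, 283, 312, 354]
# Same peptide with an assumed +16 modification on the third residue.
modified = [57, 99, 128, 196, 231, 299, 328, 370]

print(pair_score(unmodified, modified, parent_diff=16.0))
```

In this toy case every peak of the unmodified spectrum is explained either at offset 0 or at offset +16, whereas two unrelated spectra would typically share few peaks at either offset.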



Shotgun Protein Sequencing

Despite significant advances in the identification of known proteins, the analysis of unknown proteins by tandem mass spectrometry (MS/MS) remains a challenging open problem. Although Klaus Biemann recognized the potential of MS/MS for sequencing unknown proteins in the 1980s, low-throughput Edman degradation followed by cloning is still the main method for sequencing them. The automated interpretation of MS/MS spectra has been limited by a focus on individual spectra and has not capitalized on the information contained in spectra of overlapping peptides. Indeed, the powerful Shotgun DNA Sequencing strategies have not been extended to automated protein sequencing. We demonstrate, for the first time, the feasibility of automated Shotgun Protein Sequencing of protein mixtures by using MS/MS spectra of overlapping and possibly modified peptides generated by multiple proteases of different specificities. We validate this approach by generating highly accurate de novo reconstructions of multiple regions of various proteins in western diamondback rattlesnake venom. We further argue that Shotgun Protein Sequencing has the potential to overcome the limitations of current protein sequencing approaches and thus catalyze otherwise impractical applications of proteomics methodologies in studies of unknown proteins.
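The analogy with shotgun DNA sequencing can be sketched in a few lines. The example below is a deliberate simplification: it assembles already-known peptide strings by greedy suffix/prefix overlap, whereas the approach described above works with MS/MS spectra of overlapping peptides. The function names and the toy peptides are assumptions made for illustration only.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` equal to a prefix of `b` (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(peptides, min_len=3):
    """Repeatedly merge the pair of contigs with the longest overlap."""
    contigs = list(dict.fromkeys(peptides))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]


# Hypothetical overlapping peptides, as might arise from different proteases.
print(greedy_assemble(["MKTAYIAK", "YIAKQRQIS", "QRQISFVK"]))
```

Peptides produced by proteases with different specificities overlap in the way these toy strings do, which is what makes assembly into longer contigs possible.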



Non-ribosomal Peptide Sequencing

While non-ribosomal peptides (NRPs) are of tremendous pharmacological importance, there is currently no technology capable of high-throughput sequencing of NRPs. Difficulties in sequencing NRPs slow progress in elucidating the non-ribosomal genetic code and hamper screening programs aimed at discovering natural compounds of medical importance. We propose to employ multistage mass spectrometry (MSn) for data acquisition, followed by alignment-based heuristic algorithms for data analysis. Since mass-spectrometry-based analysis of NRPs is fast and inexpensive, this approach opens the possibility of high-throughput sequencing of the many unknown NRPs accumulated in large screening programs.


MS-Clustering

Tandem mass spectrometry (MS/MS) experiments often generate redundant datasets containing multiple spectra of the same peptides. Clustering of MS/MS spectra takes advantage of this redundancy by identifying multiple spectra of the same peptide and replacing them with a single representative spectrum. Analyzing only the representative spectra significantly speeds up MS/MS database searches. We present an efficient clustering approach for analyzing large MS/MS datasets (over ten million spectra) that can reduce the number of spectra submitted to further analysis by an order of magnitude. The MS/MS database search of clustered spectra produces fewer spurious database hits and increases the number of peptide identifications compared to regular non-clustered searches. Our open-source software MS-Clustering is designed to rapidly cluster large MS/MS datasets.
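To make the clustering idea concrete, here is a minimal sketch and not the MS-Clustering implementation. It assumes each spectrum is a list of (m/z, intensity) pairs, bins peaks at 1 m/z, compares spectra by cosine similarity, and greedily assigns each spectrum to the first cluster whose summed representative is similar enough. The 0.8 threshold, the bin width, and all function names are illustrative assumptions, and a practical tool would also compare precursor masses.

```python
import math
from collections import defaultdict


def binned(spectrum, bin_width=1.0):
    """Turn a list of (m/z, intensity) peaks into a sparse {bin: intensity} vector."""
    vec = defaultdict(float)
    for mz, intensity in spectrum:
        vec[int(mz / bin_width)] += intensity
    return dict(vec)


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def greedy_cluster(spectra, threshold=0.8):
    """Assign each spectrum to the first sufficiently similar cluster."""
    clusters = []  # each cluster: {"rep": summed vector, "members": [indices]}
    for idx, spectrum in enumerate(spectra):
        vec = binned(spectrum)
        for cluster in clusters:
            if cosine(vec, cluster["rep"]) >= threshold:
                cluster["members"].append(idx)
                for k, x in vec.items():  # fold the spectrum into the representative
                    cluster["rep"][k] = cluster["rep"].get(k, 0.0) + x
                break
        else:
            clusters.append({"rep": dict(vec), "members": [idx]})
    return clusters


spectra = [
    [(100.1, 10), (200.2, 20), (300.3, 5)],
    [(100.2, 9), (200.1, 22), (300.4, 4)],   # near-duplicate of the first
    [(150.0, 10), (250.0, 20), (350.0, 5)],  # different peptide
]
print([c["members"] for c in greedy_cluster(spectra)])
```

Only one representative per cluster would then be submitted to the database search, which is where the speed-up comes from.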



MS-Dictionary

Database search tools identify peptides by matching tandem mass spectra against a protein database. We study an alternative approach in which all plausible de novo interpretations of a spectrum (its spectral dictionary) are generated and then quickly matched against the database. We present a new algorithm, MS-Dictionary, for efficiently generating spectral dictionaries and demonstrate that it can identify spectra that are missed in the database search. We argue that MS-Dictionary enables proteogenomic searches against six-frame translations of genomic sequences, which may be prohibitively time-consuming for existing database search approaches. We show that such searches allow one to correct sequencing errors and find programmed frameshifts.
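A toy version of the spectral dictionary idea is sketched below. It is not the MS-Dictionary algorithm. It assumes integer residue masses, a five-letter alphabet, a spectrum given as a list of prefix masses, and a score equal to the number of spectrum peaks explained by a candidate's prefix masses. All names (`MASS`, `spectral_dictionary`, `min_score`) are hypothetical. The dictionary is then matched against a protein sequence by plain substring search.

```python
# Toy alphabet with integer residue masses (G, A, S, P, V).
MASS = {"G": 57, "A": 71, "S": 87, "P": 97, "V": 99}


def spectral_dictionary(peaks, parent_mass, min_score):
    """Enumerate peptides of total mass `parent_mass` whose prefix masses
    match at least `min_score` of the given peaks."""
    peak_set = set(peaks)
    results = []

    def extend(prefix, mass, score):
        if mass == parent_mass:
            if score >= min_score:
                results.append(prefix)
            return
        for residue, residue_mass in MASS.items():
            new_mass = mass + residue_mass
            if new_mass > parent_mass:
                continue  # this branch can no longer reach the parent mass
            hit = 1 if new_mass in peak_set else 0
            extend(prefix + residue, new_mass, score + hit)

    extend("", 0, 0)
    return results


peaks = [57, 128, 215, 312]   # prefix masses of the toy peptide GASPV
dictionary = spectral_dictionary(peaks, parent_mass=411, min_score=3)
database = "MKLGASPVTTR"      # hypothetical protein sequence
print(sorted(dictionary))
print([pep for pep in dictionary if pep in database])
```

Lowering `min_score` enlarges the dictionary and tolerates missing peaks, at the cost of more candidates to match. The matching step itself stays a fast string search rather than a spectrum-by-spectrum comparison.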
