UCSD Bioinformatics Research Projects

General Information:

This page has moved. The new page is at http://seed.ucsd.edu/research/.

This program helps interested students find bioinformatics research projects to work on across campus.

There are plenty of opportunities for bioinformatics research projects. These projects are typically for credit, although in exceptional circumstances they can pay as well.

Participation in research projects can both significantly improve your chances of admission to top graduate programs and make you a much more attractive employment candidate. Even better, it gives you something to talk about during an interview.

Feel free to contact us even if you are not yet sure whether you want to work on a research project or which field you wish to do research in.

Please remember that every undergraduate and master's student is welcome to participate in research, regardless of background or year in the program. Undergraduates are STRONGLY encouraged to participate in research as early as possible in their careers. Ideally, you should start a research project during your sophomore year, but it is never too late or too early to start.

General Procedure:

If you are reasonably sure which project you would like to work on, use the contact information listed under the project to contact the person responsible directly and set up a meeting. If you are not sure, but you are even slightly interested in research, feel free to email us or drop in for help choosing an appropriate project.

Most students take a project for course credit, although in some cases funding may be available.

You can contact the project liaison, Eleazar Eskin (eeskin@cs.ucsd.edu), if you have any questions.

Participating Faculty:


Legend:
Monetary Compensation: May provide monetary compensation for highly qualified candidates
U: Undergraduate Students
G: Graduate Students

Winter 2006 Computer-aided drug discovery
Contact: Andrew McCammon

Computational methods contribute in many ways to the discovery of new medicines. The McCammon group has pioneered the use of structural data on potential drug receptors to create computational models for the physical interactions of drug candidates and the receptors. This work contributed to the success of Agouron Pharmaceuticals in La Jolla, which has evolved into the Pfizer La Jolla Laboratories. Undergraduates in our group have gone on to rewarding careers as professors and as leaders in the pharmaceutical industry.

Required experience: To get the most out of this research, students should have a solid command of introductory college physics, mathematics through calculus of several variables, and introductory college chemistry and organic chemistry. Knowledge of physical chemistry is very helpful. See our website for more information, success stories of undergraduates from recent years, and an application form: http://mccammon.ucsd.edu/

Winter 2006 Structural biology of the cell
Contact: Andrew McCammon

Recent work in microscopy and other areas has provided a wealth of information about the structure and dynamics of cells. The McCammon group has pioneered the development of new computational methods to study cellular constituents and their physical interactions, starting with the new structural data. This work is beginning to show how cellular activity emerges from the underlying molecular behavior. Such studies will open the way to more sophisticated approaches for the discovery of new pharmaceuticals. Undergraduates in our group have gone on to rewarding careers as professors and as leaders in the pharmaceutical industry.

Required experience: To get the most out of this research, students should have a solid command of introductory college physics, mathematics through calculus of several variables, and introductory college chemistry and organic chemistry. Knowledge of physical chemistry is very helpful. See our website for more information, success stories of undergraduates from recent years, and an application form: http://mccammon.ucsd.edu/

Winter 2006 Analysis of the Transcriptome of Hirudo medicinalis
Contact: Prof. E. Macagno

The nervous system of the leech Hirudo medicinalis contains a small number of neurons (~190 pairs) in each segment that are responsible for all the sensory, analytic and motor functions in the segment. In order to begin to relate genetic and physiological programs at the level of individual neurons, we have begun a project to map the leech transcriptome and obtain gene expression profiles for each neuron. We have developed EST libraries and have a large amount of transcript sequence data that needs to be analyzed and organized in order to design microarrays to test with mRNA from identified individual neurons. Possible projects would include bioinformatics and wet biology components.

Requirements: some experience with computational sequencing tools and an interest in biological mechanisms.

Winter 2006 Graphical Representation of Multiple Sequence Alignments
Contact: Degui Zhi or Ben Raphael

Multiple sequence alignment is a classic problem in bioinformatics. We recently developed a new method for multiple alignment that represents the alignment as a directed graph. Our method is applicable to both protein and DNA sequences, and is best suited to aligning sequences with repeated and/or shuffled elements. However, for our method to be useful to biologists, we need a striking visual representation of our alignment that conveys the important biological details in a form that is easy to interpret. In this project, you will develop a tool that automatically generates this new alignment representation.

Requirements: Experience with (or a desire to learn) simple graphics programming in the computer language of your choice.

Winter 2006 Benchmarking/comparison of multiple alignment algorithms
Contact: Degui Zhi or Ben Raphael

We recently developed a new method for multiple sequence alignment with several advantages over existing methods. We want to demonstrate the merits of our approach by designing carefully chosen, biologically realistic tests. In this project, you will develop such tests and show the strengths and deficiencies of several popular multiple alignment algorithms. The knowledge gained from these tests may suggest new ways to improve the algorithms, which you can then implement.

Requirements: For benchmarking, only basic knowledge of the Unix/Linux command line is needed. Experience in a scripting language (Unix shell, Perl, Python, etc.) is a plus. Experience in C programming is needed for implementing improvements to the algorithms.

Winter 2006 Computer-Based 3D Models of Steroid Receptors and Steroid Dehydrogenases
Contact: Michael Baker
Web page: http://medicine.ucsd.edu/faculty/mbaker/

This is a golden age for structure-function analysis of proteins due to the explosive growth of databases containing protein sequences (GenBank, etc.) and 3-dimensional structures (Brookhaven PDB) and the availability of powerful desktop computers and user-friendly software for analysis of protein sequences and structures.

My research focuses on the regulation of steroid hormone action. I have several interesting projects on the structure of steroid receptors and steroid dehydrogenases. This research involves determining the 3D structure of enzymes and steroid receptors, using known structures as templates and software at the Supercomputer Center. The 3D model can then be used to identify the key amino acids that are involved in steroid hormone binding. This analysis can also elucidate the evolution of different specificities for steroids in receptors. For example, how did the specificity for progesterone and testosterone evolve from an ancestral progesterone/testosterone receptor?

Analysis of the steroid binding site can be used with the Catalyst software to identify compounds that can regulate the growth of estrogen-dependent breast cancer cells and androgen-dependent prostate cancer cells. It can also identify xenobiotics (DDT, bisphenol, etc.) that could bind to the steroid binding site.

Winter 2006 Correlation Analysis of Changes in Structure/Function in Steroid Receptors during their Evolution
Contact: Michael Baker
Web page: http://medicine.ucsd.edu/faculty/mbaker/

Fish and mammalian steroid receptors have differences in steroid specificity. For example, the physiological progestin in fish does not bind to the human progesterone receptor or to other mammalian progesterone receptors. How did these changes evolve in progesterone receptors? When did these changes occur? Was it in a frog or bird or in an ancestral mammal?

The availability of sequences of steroid receptors in fish, frogs, birds and mammals provides an opportunity to construct 3D models of their steroid receptors, which can be analyzed to determine how sequences and function have changed over 500 million years. Recently, software has been developed that will analyze 3D structures, as well as sequence alignments, to locate amino acids that have co-evolved.

Winter 2006 Exploiting Protein Fold Space
Contact: Prof. Philip E. Bourne

It is a remarkable fact that all of life is built from a very limited set of 3-dimensional parts, estimated at 1000-5000 unique protein folds. We are in the process of characterizing and learning from this amazing redundancy. For example, the adoption by an organism of a new fold, either through random mutation or lateral gene transfer, is clearly a major event, as folds follow a power-law distribution: a fold may have few to many new functions. We have used this fold-impact hypothesis to build accurate phylogenetic trees. With this evidence, and the knowledge that fold similarities can be measured in 3-D when weak sequence evidence cannot, we are embarking on a variety of projects to better understand evolution using protein fold space. The Protein Data Bank (PDB), which my laboratory maintains, is the single worldwide repository for the structures of biological macromolecules and provides the data and tools for these studies. Further details of the laboratory can be found at http://www.sdsc.edu/pb.

Prerequisites: A good working knowledge of general biology and Java programming is required. Knowledge of structural biology and databases is desirable.

Winter 2006 Developing a graphic visualization tool for intra-cellular signaling events.
Contact: Alexander Hoffmann

During the immune response, cells respond rapidly to external signals. These signals engage cell surface receptors, which trigger signaling events inside the cell that culminate in the activation of transcription factors that regulate gene expression. One transcription factor that is essential to the immune response is NF-κB, which is regulated by IκB proteins. We have developed a computational model that describes the dynamic interplay of three IκB proteins, IκBα, -β, and -ε, that results in NF-κB ... (Science 298, pp.1741). Now we hope to gain further insight by developing a graphic animation of this process. Presenting these animations on a website will facilitate communication with other researchers.

Requires: Experience with graphics programming and animation and an interest in cellular signaling; preferably some knowledge of molecular and cell biology.

Winter 2006 Protein Bioinformatics Projects
Contact: Milton Saier

Protein Bioinformatics Projects Available:
1. Conduct bioinformatic analyses to characterize novel families of transport proteins.
2. Identify distant relationships between protein families to determine superfamily associations.
3. Map protein evolutionary pathways by identifying internal repeats and establishing homology.

These projects will require familiarity with a variety of computer programs, several of which were developed in our lab. No prior experience is required, but a student must commit to at least 15 hrs of research effort per week. Literature research is also required so that the student gains a complete knowledge of the published literature concerning the project at hand. If a student has software development experience, we also have a need to develop new programs for protein family and genome analyses. Finally, our lab is concerned with genome analyses of transporters and has developed software for this purpose. If this is a student's primary interest, such a project could be arranged.

Requirements:

Winter 2006 Alu Database Web Interface
Contact: Alkes Price
Project no longer available.

The Alu repeat family is the most prolific repeat family in the human genome, and is believed to have played an important role in primate evolution. Our research on identifying Alu repeat subfamilies has identified many more subfamilies than were previously known, and has constructed an evolutionary tree of these subfamilies. We would like to produce a web tool based on our results. There would be two main parts. The first part would display the set of repeat subfamilies and their evolutionary tree in a user-friendly way. The second part would take as input the nucleotide sequence of a specific Alu repeat element, and output the consensus sequence of the subfamily to which it belongs.
Required: Experience with web programming

Winter 2006 Non Coding RNA Web Interface
Contact: Shaojie Zhang

The discovery of novel non-coding RNAs has been among the most exciting recent developments in biology. Yet many more remain undiscovered. We have developed new software, FastR, to solve the following problem: given an RNA sequence with a known secondary structure, efficiently compute all structural homologs (computed as a function of sequence and structural similarity) in a genomic database.

We would like to produce a web server for our program. Users will upload a Stockholm file for the query RNA and a FASTA file for the genomic sequences through the website. FastR will run on the server machine, and the results will be displayed on the web or sent to the users.

We would also ask users to open an account first.

In the future, it would be better to keep the genomic database locally and ask users to select the relevant genomic sequences.

Required: Experience with web programming, Perl programming, C++.

Winter 2006 Tumor Genome Visualization Tool
Contact: Ben Raphael

Tumor cells frequently have abnormal genomes. Gain and loss of whole chromosomes, duplications, deletions, translocations, inversions, and other chromosomal aberrations are common. Recent advances in sequencing technology allow us to obtain high-resolution, genome-wide information about these changes. However, the interpretation of this data requires the development of visualization tools. This project will develop such tools and use these tools to test various hypotheses about tumor genome architecture.

Required: Experience with graphics programming, preferably in a language that interfaces easily with the web (e.g. Java).

Winter 2006 Discriminative Graphical Models for Protein Sequence Analysis (joint project with Sanjoy Dasgupta)
Contact: Eleazar Eskin

Two recent advances in machine learning include kernel methods and graphical models. Kernel methods allow very powerful discriminative learning techniques, such as SVMs, to be applied to many types of classification problems. Graphical models allow modeling of complex structures by providing an efficient way to represent the structure of the modeling problem. Many of the most important sequence analysis problems, such as predicting structural features of proteins, require discriminative techniques yet have inherent structure. This project will develop algorithms for discriminative graphical models, that is, models that take into account the structure of the problem using a graphical model but are trained using kernel methods. The first application will focus on learning how to segment sequences, which will be applied to predicting structural domains of protein sequences.

Required Experience: Some experience with Machine Learning methods.

Winter 2006 Embedding Sequences into Euclidean Spaces
Contact: Eleazar Eskin

Many of the most powerful machine learning methods are designed to apply to points in a Euclidean space. These techniques cannot be directly applied to sequences, which by their nature are discrete objects. In this project, we design Euclidean embeddings for sequences which allow for the application of these learning methods. These embeddings are applied to protein sequence analysis. See the following paper for more information:
http://www.cse.ucsd.edu/~eeskin/papers/drafts/hk-dist04.pdf
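
As a rough illustration of what a Euclidean embedding of a sequence can look like, the sketch below maps each sequence to its vector of k-mer counts, so that ordinary Euclidean distance applies. This is one simple, widely used embedding chosen for illustration only; it is not necessarily the construction used in the paper above. The function names, the choice k=2, and the DNA alphabet are assumptions (for proteins, pass the 20-letter amino acid alphabet instead).

```python
from itertools import product
from math import sqrt

def kmer_embedding(seq, k=2, alphabet="ACGT"):
    """Map a sequence to a vector of k-mer counts (one coordinate per k-mer)."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    vec = [0] * len(kmers)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in index:  # skip k-mers with symbols outside the alphabet
            vec[index[kmer]] += 1
    return vec

def euclidean_distance(u, v):
    """Ordinary Euclidean distance between two equal-length vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Two toy sequences embedded in the same 16-dimensional space (4^2 k-mers).
x = kmer_embedding("ACGTACGT")
y = kmer_embedding("ACGTTTTT")
print(euclidean_distance(x, y))
```

Once sequences are vectors, any method that expects points in Euclidean space can be applied to them directly.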

Required Experience: C, Perl programming experience.

Winter 2006 Discovering the Genetic Basis of Human Disease
Contact: Eleazar Eskin

Humans differ in about 0.1% of their genomes. Within this small amount of variation is encoded our genetic predisposition to diseases such as hypertension. By examining populations of diseased and healthy individuals and their variation in genes known to be factors in the diseases, we can identify specifically which variants correspond to the disease. This project develops techniques to identify the variation which corresponds to disease and develops statistics to back up the predictions. This project also develops techniques to predict the effect of the variation on the gene, such as changing the structure of the protein product or affecting the regulatory structure. By reconstructing the phylogeny of the genes, we can predict the origin and history of the risk factor for the disease.

Winter 2006 Statistical and Algorithmic Aspects of Motif Discovery
Contact: Eleazar Eskin

One of the fundamental problems of computational biology is motif finding, or the discovery of overrepresented sequences. This problem has very important applications to the discovery of transcription factor binding sites. Most formulations of the motif discovery problem are NP-hard. Currently, many new types of data are being generated, such as gene expression data and protein localization data, which can help discover motifs. This project explores some new algorithmic directions in discovering motifs which incorporate many different data sources. This project will also explore statistics to verify motifs using this additional data. See the papers on my website under Motif Finding and Regulation:
http://www.cs.ucsd.edu/~eeskin/
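
To make "overrepresented sequences" concrete, here is a minimal sketch that ranks exact k-mers by how much more frequent they are in a foreground set (for example, promoters of co-regulated genes) than in a background set. It is a deliberately naive baseline, not the method developed in this project: real motif finders allow mismatches and use proper statistical models. The function names, the pseudocount, and the toy sequences are all assumptions.

```python
from collections import Counter

def kmer_counts(seqs, k):
    """Count every k-mer occurrence across a list of sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts

def enrichment(foreground, background, k, pseudocount=1.0):
    """Rank foreground k-mers by foreground/background frequency ratio."""
    fg = kmer_counts(foreground, k)
    bg = kmer_counts(background, k)
    fg_total = sum(fg.values())
    bg_total = sum(bg.values())
    n_kmers = 4 ** k  # number of possible k-mers, assuming a DNA alphabet
    scores = {}
    for kmer in fg:
        fg_freq = (fg[kmer] + pseudocount) / (fg_total + pseudocount * n_kmers)
        bg_freq = (bg[kmer] + pseudocount) / (bg_total + pseudocount * n_kmers)
        scores[kmer] = fg_freq / bg_freq
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy data: every foreground sequence contains TGACA.
foreground = ["TTGACAAC", "GGTGACAT", "ATGACATT"]
background = ["ACGTACGT", "GGCCTTAA", "CATGCATG"]
ranked = enrichment(foreground, background, 4)
print(ranked[:2])
```

In this toy example the two 4-mers covering the shared TGACA site rank highest, since each occurs in all three foreground sequences and in none of the background sequences.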

Required Experience: Some background in statistics.

Winter 2006 Promoter Discovery in Drosophila
Contact: Eleazar Eskin

A core promoter is a collection of binding sites which occur in a specific order and are involved in transcription. In Drosophila, one of the most well-studied organisms, only a fraction of the core promoters are known. This project will develop a system and statistical framework for the discovery of core promoters, building on several previous works on identifying the structure of the promoter.

Winter 2006 Promoter Modeling in Bacteria and Yeast
Contact: Eleazar Eskin

A promoter is a collection of binding sites, that is, locations where transcription factors bind to DNA to regulate transcription. The promoter can be thought of as a logical circuit that defines how a gene functions. The logic of the circuit is encoded in the layout of the elements of the promoter. This project will develop a system and statistical framework for the discovery of the parts and modeling of promoters, building on several previous works on identifying the structure of the promoter. One of the key techniques used will be comparative genomics, where many different related bacteria and yeast genomes will be used to discover functional parts of the promoters.

Winter 2006 Regulatory Aspects of Human Disease
Contact: Eleazar Eskin

Complex diseases have many genetic factors which influence the likelihood of contracting the disease. Many of these genetic factors are single nucleotide polymorphisms (SNPs) that occur in the regulatory region or promoter of genes that are known to be implicated in the disease. This project attempts to model the human promoter and understand how a SNP affects the functioning of the promoter. This project builds on several recent works on modeling of promoters.

Winter 2006 De novo classification of repeats in worm genomes
Contact: Haixu Tang
Project no longer available.

The C. elegans genome is known to have several families of DNA transposons, such as Tc1, Tc2, Tc3, Tc6 and Tc7. Some of them are still active. However, the systematic characterization and classification of the repeats in the worm genome remains an open problem. Moreover, preliminary analysis of the recently published second worm genome, C. briggsae, revealed some new repeat families that are not present in the C. elegans genome. The goal of this project is to apply recently developed software tools, including RECON and our own program RepeatGluer, to classify the repeat families in worm genomes de novo and to compare the repeat elements in these two genomes.

Requirements: Experience with Unix and a scripting language; some experience with molecular biology.

Winter 2006 Characterization of alternative splicing in C. elegans
Contact: Haixu Tang
Project no longer available.

Although alternative splicing is recognized as an important gene regulatory mechanism for vertebrates, there is no systematic analysis of alternative splicing events in C. elegans. Using the published genomic sequence and the fast-expanding EST (Expressed Sequence Tag) data, we would like to identify the potential alternatively spliced genes in the worm.

Requirements: Experience with Unix and a scripting language; experience with BLAST and other sequence comparison tools; some experience in molecular biology.

Winter 2006 Strategies for comparing GC-biased genomic sequences
Contact: Haixu Tang
Project no longer available.

Most current genome comparison algorithms are designed for comparing DNA sequences with an even distribution over the four types of nucleotides. For example, as a typical fast seed-extension algorithm, BLAST (and other BLAST variants, e.g. BLASTZ, MEGABLAST) designs its seeds with a normal hashing function without considering any special background nucleotide distribution. However, some organisms have very biased GC content, e.g. Dictyostelium discoideum, whose genome is roughly 80% A+T (a GC content of only about 20%).

We want to devise new strategies to compare sequences from such organisms. The potential applications include: (1) the overlap detection phase in fragment assembly of such genomes; (2) comparison of EST (Expressed Sequence Tag) sequences with genomic sequences; (3) BLAST-like genomic searching; (4) genome-genome comparison.
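
As a starting point for measuring the compositional bias discussed above, the following sketch computes the GC content of a sequence and of sliding windows along it. It only quantifies the bias; it is not a comparison strategy in itself. The function names and the window and step parameters are illustrative assumptions.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of seq."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        return 0.0  # no unambiguous bases (e.g. empty or all-N sequence)
    return sum(base in "GC" for base in counted) / len(counted)

def windowed_gc(seq, window, step):
    """GC content of each full-length window, moving `step` bases at a time."""
    return [
        gc_content(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, step)
    ]

print(gc_content("ATGC"))
print(windowed_gc("AAGGCCTT", window=4, step=2))
```

A strongly skewed value means that seeds drawn from the common bases match by chance far more often than a uniform background model predicts, which is the situation this project aims to handle.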

Requirements: Experience programming in C/C++.

Winter 2006 An informatics approach to compensatory mutations
Contact: Steven Wasserman

Our understanding of the flow of information through cellular networks relies critically on knowledge of pathway architecture. A wide range of techniques can place gene products in a common process or reveal that one protein has the capability to bind to another. It is in general far more difficult, however, to prove that two particular proteins have a direct and functional interaction inside the cell. One of the few ways of providing precisely such a demonstration relies on the phenomenon of compensatory mutations. The starting point is typically an inactivating mutation in protein X. One then seeks a mutation in Y that suppresses the defect caused by the protein X mutant, but that by itself inactivates protein Y. If the wild-type proteins X and Y can physically interact and the effects of the mutations are specific for this protein pair, the existence of the compensatory mutations offers extremely strong evidence that the binding of X and Y is required in vivo for the process under study.

Although the compensatory mutation approach is straightforward in concept, it is not commonly applied. The biggest reason is probably the judgment that success in isolating such mutations requires more starting information than is generally available or more luck than one is apt to encounter. It is certainly true that it is much easier to generate compensatory alleles if the structure of both proteins is known. It is also true that multiple mutations will often have to be sampled to find a compensatory pair. However, with the ever-increasing number of conserved and well-defined structural domains and with methodologies available for querying biological function for many samples in parallel, the prospects appear promising for the development of a large-scale compensatory mutation approach.

The following project is designed to lay the foundation for the increased application of compensatory mutations to the dissection of biological pathways.

1) Catalog examples of compensatory mutations in the literature. There is no single common language used to describe compensatory mutant pairs in published articles. Terms such as "compensatory," "suppressor," "allele-specific," and "intergenic" are each used sporadically and are often absent from the title, abstract, and keyword lists. Thus the first stage of this project will require iterative full-text searching, with secondary filters and rescreening procedures developed as needed to obtain a comprehensive list of examples in the literature.

2) Define rules governing compensatory pairs. By examining the documented examples, a number of specific questions can be addressed: What types of mutations have been used as starting points and what types have been found as suppressors? Are sites that yield compensatory alleles typically well conserved? Do mutations altering charge tend to be compensated by changes of opposite character? The answers to these and related questions can be extracted from the literature and used to address more general questions: Can one derive rules for making compensatory pairs? Can one use these rules to devise an informatics-based protocol for identifying compensatory mutations in pathways of interest? Is such a protocol feasible for use in a large-scale, systems approach?

Part 1 of this project is likely to form the basis for a review article. Part 2 provides the outline for pilot projects for experimental verification of the proposed protocols. Both parts have the potential to have a substantial impact on research in biological pathways and networks.

Winter 2006 Analysis of Repetitive Sequences Near Cancer Breakpoints
Contact: Ben Raphael

Tumor cells frequently have abnormal genomes. Recent advances in sequencing technology allow us to identify locations in the human genome (breakpoints) of genome rearrangements in tumor cells. It has been suggested that repetitive DNA sequences might increase the likelihood of or even directly cause these genome rearrangements. In this project, we will analyze the repeat content of breakpoint regions, and determine if there is an association between repetitive sequences and genome rearrangements in cancer.
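
One basic quantity for this kind of analysis is the fraction of a breakpoint region that is covered by annotated repeats. The sketch below computes it for intervals on a single chromosome, using half-open coordinates and merging overlapping repeats so that no base is counted twice. The function names, the coordinate convention, and the example coordinates are assumptions for illustration, not the project's actual pipeline or data.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) overlap."""
    return a_start < b_end and b_start < a_end

def repeat_fraction(region, repeats):
    """Fraction of bases in `region` covered by at least one repeat interval.

    Assumes region = (start, end) with end > start, all on one chromosome.
    """
    start, end = region
    # Clip each overlapping repeat to the region, then sweep left to right.
    clipped = sorted(
        (max(start, r_start), min(end, r_end))
        for r_start, r_end in repeats
        if overlaps(start, end, r_start, r_end)
    )
    covered = 0
    cursor = start  # everything left of cursor has already been counted
    for c_start, c_end in clipped:
        if c_end > cursor:
            covered += c_end - max(c_start, cursor)
            cursor = c_end
    return covered / (end - start)

# Made-up coordinates: two overlapping repeats, one separate, one outside.
region = (100, 200)
repeats = [(90, 120), (110, 130), (150, 160), (300, 400)]
print(repeat_fraction(region, repeats))
```

Comparing this fraction between breakpoint regions and matched control regions is one simple way to look for an association.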

Required: Experience with a scripting language, e.g. Perl.

Winter 2006 Batch Queue Daemon
Contact: Neil Jones
Project no longer available.

Computational needs in our lab have outstripped the available resources, but only by a thin margin. The primary problem that needs to be addressed is that different people need to coordinate access to a limited supply of computational resources in an efficient way. Some of these jobs are parallel and span many nodes, while others use one particular node with a large memory. Batch queue processors have been designed and built before; we would like to reuse an existing piece of software to fulfill our needs. This project would be relatively simple: based on our requirements, find an open-source batch scheduler, test it, and provide reasonable instructions on how to use it.

Required Experience: Some experience with programming and system administration.

Winter 2006 Reconstruction of the human mitochondria
Contact: Thuy Vo

Diverse datasets, including genomic, proteomic, isotopomer, and DNA sequence variation data, are becoming available for human mitochondria. The isotopic labeling method can be used to profile carbon redistribution from input substrates (for example, glucose) to other metabolic intermediates, thereby allowing one to calculate fluxes of internal reactions. Our current mitochondria study focuses on collecting these isotopomer data and developing a mathematical method to incorporate them into the existing mitochondrial network.

Required Experience: I would like a student who has taken either genetics or biochemistry. Knowledge of Matlab and Perl is also a plus. Specifically, you will be working closely with me to collect and interpret available experimental data. Those who are interested in learning about bioinformatics and modeling can also work with me on converting these data into a mathematical formulation of the biological problem in a quality-control manner. Overall, this project will help you learn "what it means to do research". I prefer a student who can work at least 10 hrs/week.

Winter 2006 Automated incorporation of experimentally-derived rules into the existing E. coli regulatory network
Contact: Christian Barrett
Project no longer available.

The lab is currently discovering rules of transcriptional regulation in E. coli through iterative laboratory experiments using knockout strains, growth environment shifts, and expression profiling. These rules are in the form of Boolean logic statements. After each experimental iteration, the generated rules need to be incorporated with the rules that already exist in the transcriptional regulatory network reconstruction. Since the rules are in a Boolean framework, methods from logic circuit design and minimization will probably find direct applicability. Each experimental iteration generates on the order of many hundreds of regulatory rules. Manual incorporation of these rules into the reconstruction is very tedious and slow. Furthermore, the project may entail development of methods to test the reconstruction after incorporation of the regulatory rules.
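
As a small illustration of one step in such automation, the sketch below checks whether a newly generated Boolean rule is logically equivalent to an existing one by comparing their truth tables over all input assignments. This is brute-force equivalence checking, practical only for rules with few inputs; it is not Quine-McCluskey minimization and not the lab's actual tooling. The rule functions and the variable names `nutrient_present` and `repressor_active` are hypothetical.

```python
from itertools import product

def truth_table(rule, variables):
    """Evaluate a Boolean rule on every assignment of the given variables."""
    return tuple(
        rule(dict(zip(variables, values)))
        for values in product([False, True], repeat=len(variables))
    )

def equivalent(rule_a, rule_b, variables):
    """True if both rules give the same output for every input assignment."""
    return truth_table(rule_a, variables) == truth_table(rule_b, variables)

# Hypothetical rules for a single gene.
def existing_rule(state):
    return state["nutrient_present"] and not state["repressor_active"]

def new_rule(state):
    # De Morgan form of existing_rule, so it adds no new information.
    return not (not state["nutrient_present"] or state["repressor_active"])

def conflicting_rule(state):
    return state["nutrient_present"] or state["repressor_active"]

variables = ["nutrient_present", "repressor_active"]
print(equivalent(existing_rule, new_rule, variables))
print(equivalent(existing_rule, conflicting_rule, variables))
```

A rule equivalent to an existing one can be skipped, while a rule that disagrees on some assignment needs to be merged or flagged for review.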

Required Experience: A background in computer science/computer engineering is preferred. The student should have familiarity with basic ideas of logic circuit design and minimization (e.g. Karnaugh Maps, Quine-McCluskey method). Software development is expected to be done using Python, Jython, or Java (not Perl).

Winter 2006 Development of tools for 3-D visualization of macromolecular interactions
Contact: Timothy Allen

The overall goal of this project is to develop an in silico platform for graphically visualizing the spatial / volume-exclusion constraints on the assembly and interaction of the protein synthesis machinery in bacteria. Since the field of protein structural visualization is fairly advanced, it will probably be preferable to implement pre-existing software to visualize and manipulate the possible interactions of ribosomes, mRNAs, tRNAs, tRNA synthetases, and elongation factors. Thus, the student should ultimately be able to gain insights into how macromolecular volume exclusion constraints might impact translation in bacteria.

Required Experience: Preferred qualifications include knowledge of biochemistry, the molecular structure of biological macromolecules, and basic molecular biology (particularly a solid understanding of prokaryotic translation). The student must also be willing to proactively contact leaders in the field of molecular visualization (such as David Goodsell at the Scripps Research Institute) to learn what tools are currently available and to discuss how existing tools might be applied to studying translation. A strong computer science background (especially in 3-D visualization, graphics, and rendering) would also be useful. Most importantly, the student must have the desire to take on an "open ended" project which involves clear goals in methods development and tool implementation, but whose scientific value will be largely of an exploratory and educational nature. The expected time commitment is at least 10 hours/week, plus regular meetings with the project manager (at least weekly, and more often as needed).

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 



Winter 2006 Developing complex-formation reactions for E. coli
Contact: Jennifer Reed

The project is primarily a bioinformatics project. Gene products come together to form protein complexes that are capable of carrying out metabolic reactions. The genes associated with these complexes are included in the latest E. coli model; however, the stoichiometry of the enzyme complexes has not been accounted for. For example, two copies of the same protein might come together as a dimer. This information is important for the future development of our E. coli model. This project will involve taking information from an online database and some literature searches to generate complex-formation reactions (e.g., (2) geneA + (3) geneB -> ComplexAB).
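
To give a feel for the scripting side, here is a minimal Python sketch that turns subunit stoichiometry into a reaction string in the "(2) geneA + (3) geneB -> ComplexAB" notation used above. The function name and the dictionary input format are assumptions made for illustration; the actual database format and model conventions may differ.

```python
def complex_formation_reaction(complex_name, subunits):
    """Format a complex-formation reaction from a {gene: copies} mapping."""
    if not subunits:
        raise ValueError("a complex needs at least one subunit")
    terms = []
    for gene, copies in sorted(subunits.items()):
        if copies < 1:
            raise ValueError(f"invalid copy number for {gene}: {copies}")
        terms.append(f"({copies}) {gene}")
    return " + ".join(terms) + " -> " + complex_name


# Placeholder names taken from the example in the project description.
print(complex_formation_reaction("ComplexAB", {"geneA": 2, "geneB": 3}))
# (2) geneA + (3) geneB -> ComplexAB
```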

Required Experience: Preferred qualifications include knowledge of some programming language (Perl, Java, etc.) and a basic understanding of biology and biochemistry. The student will be expected to work at least 10 hrs/wk and have weekly meetings with the Project Manager.

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 



Winter 2006 Development of a compartmentalized yeast model
Contact: Natalie Duarte

Commonly known as baker's yeast, S. cerevisiae is an important model organism because of the similarity between its basic cellular functions and those of higher eukaryotes. For this reason, a model of yeast metabolism will lay the groundwork for the future development of models specific to human metabolic diseases. Our metabolic model of S. cerevisiae includes 750 genes, whose products catalyze reactions in 8 different cellular compartments, including the cytosol, mitochondria, and peroxisome. We need a well-organized student with coursework in metabolic biochemistry to help us gather evidence about the existence and location of each gene product.

Required Experience: To complete this project, the student should have a basic understanding of eukaryotic metabolism and be comfortable interpreting experimental results from published gene deletion studies and enzyme activity assays. This project is ideal for anyone who wants to "dig into the details" of one of our largest genome-scale networks. It would be a great starting point for students who are interested in continuing in the lab for more than one quarter.

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 



Winter 2006 Comparison of genome-scale localization data sets
Contact: Natalie Duarte

A defining feature of eukaryotic cells is compartmentalization. Each compartment contains its own defined set of proteins and other molecules, allowing for specialized processes to be carried out exclusively within these locations. The localization information in our current yeast metabolic model is primarily based on the literature references from the Comprehensive Yeast Genome Database (CYGD) and Saccharomyces Genome Database (SGD). However, several large-scale localization studies have been published since this reconstruction (Huh, Nature 2003; Kumar, Genes Dev 2002). The goal of this project will be to reconstruct the yeast metabolic network based on these new localization studies and compare its content and predictions to the current model.

Required Experience: Coursework in molecular biology and biochemistry is required. This project will require the SimPheny software package and therefore must be completed using a computer in Dr. Palsson's lab.

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 



Winter 2006 Evolutionary sequence comparison and practical bioinformatics support
Contact: Gerard Manning

The flood of genome sequences promises to unlock the billion-year story of how function has shaped protein and DNA sequence. We are using the protein kinase superfamily (see http://kinase.com) as a model to look at sequence evolution and what it can tell us about protein function. Several sub-projects are available, including the determination of protein kinase diversity in several recently-sequenced genomes, and the development of tools to extract functional information from deep sequence alignments.

Required Experience: Understanding of the fundamentals of molecular biology and experience in protein sequence analysis. Programming or scripting experience highly recommended.

This position is paid and, in return, will involve at least 40% support work and collaboration with other Salk labs. This aspect will give exposure to a wide range of research areas and practical bioinformatics problems, and to their efficient resolution. See http://salk.edu/career/openings/staff.php for more details.

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 



Winter 2006 Smart Vivarium
Contact: Serge Belongie

The Smart Vivarium project aims to equip lab animal cages with cameras to provide continuous monitoring of behavior and health conditions using computer vision and pattern recognition techniques. Our goal is to improve the life history of animals used in medical research and to make new scientific discoveries possible. The project team is interdisciplinary, with members from CSE, BioEng, and the Animal Care Program.

Possible projects include: creation of video clip database of mouse behaviors, assisting in algorithm development, real time implementation, data mining.

Required Experience: Experience in one or more of the following areas: computer vision, image processing, machine learning, embedded system design, distributed computation, animal behavior/animal psychology, Matlab programming, Adobe Premiere video editing.

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 



Winter 2006 Tandem mass spectrometry (MS/MS)
Contact: Vineet Bafna or Stephen Tanner

Tandem mass spectrometry (MS/MS) analysis of peptides is a high-throughput way to detect the presence of peptides. Database search tools such as Sequest and MASCOT are used in interpreting the large volumes of data. Our lab has developed InsPecT, a next-generation search tool which can effectively identify peptides in the presence of post-translational modifications (PTMs) and mutations. We are collaborating with other labs to apply InsPecT to detect phosphorylation sites, identify peptides using a database of homologous proteins, and other investigations.

A student researcher will use InsPecT to analyze data sets and summarize the relevant results. Scripts are available to help automate much of the process, but the student will be required to maintain and extend these scripts and the InsPecT webserver.

Required Experience: Experience writing Python or Perl scripts on Unix or Windows. Experience in web programming is a plus.

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 



Winter 2006 Genomic features of Dictyostelium
Contact: Sam Payne

The Dictyostelium genome has recently been finished and published, and the lab has started work on genome annotation. We are looking for students to explore novel genome features of transcriptional regulation, not limited to finding DNA binding motifs but also looking for other information signals in the genome.

Required Experience: Knowledge of Perl, programming basics, and general biology is required. Knowledge of genome features is desirable.

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 




Winter 2006 Bioinformatics course web tools
Contact: Neil Jones Project no longer available.

The undergraduate bioinformatics program is new, and many of the core courses have been taught for the first time within the last year. While every effort is being made to standardize and plan the curriculum, the administration of the classes could also be made consistent. A number of freely available software suites can provide useful features to a class, but selecting and installing one incurs too high an overhead for instructors and TAs who teach one class per year. We would like to implement a website that can be used by any faculty member at UCSD to teach bioinformatics.

The website would include courseware, filtered and annotated resources, new materials (e.g., Flash animations of algorithms), and a variety of administrative modules to ease the creation and administration of a new bioinformatics course. There are several technical challenges hidden in this project; for example, only the faculty member who teaches a class can have access to the class student list, yet a centralized service needs to be able to authenticate the students (i.e., log them on).

Required Experience: Technical savvy and an interest in bioinformatics education; this is an opportunity to learn, hands-on, how to build websites.

Compensation: Depending on availability, this project may qualify for funding.

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 



Winter 2006 Demonstrating a thymine reductive pathway in E. coli
Contact: Jennifer Reed Project no longer available.

Using our models of E. coli, we have hypothesized that a thymine reductive pathway is present in the organism. We have putatively identified the responsible genes and want to perform experiments characterizing this pathway. The first stage of the project will involve growing wild-type E. coli and a number of knockout strains on thymidine and thymine and observing whether or not the strains are able to grow. The next stage of the project will involve characterizing the gene products identified in stage one by overexpressing, purifying, and biochemically testing the associated enzymes.

Required Experience: Preferred qualifications include wet lab experience, BIBC 103 (Biochemical Techniques lab), BIMM 101 (Recombinant DNA Lab), and basic knowledge of biology and biochemistry. The student will be expected to work at least 10 hrs/wk and have weekly meetings with the Project Manager.

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 



Winter 2006 Computational Analyses of Transport Systems
Contact: Milton Saier

Our laboratory has developed the continually expanding Transporter Classification (TC) system (see http://www-biology.ucsd.edu/~msaier/transport/), which has been adopted by the International Union of Biochemistry and Molecular Biology (IUBMB) as the internationally acclaimed system for classifying transport proteins. Its availability allows students in our lab to conduct bioinformatic research on transport proteins with emphasis on understanding their evolution and structure/function relationships. For this purpose we need programmers as well as students who can apply existing programs. We study the phylogenetics of current members of families, examine the proteins for internal duplications as an indication of their routes of origin, try to determine their topologies using a variety of programs, and conduct motif/sequence analyses in order to establish superfamily relationships between families.

Required Experience: None; a basic knowledge of Biochemistry is useful but can be learned after entry into the lab. Programming skills are useful, but not essential.

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 



Winter 2006 Antibody Diversity in Leukemia
Contact: Brad Messmer or
Winter 2006 Discovering new micro RNAs in Arabidopsis
Contact: Julian Schroeder

miRNAs (microRNAs) are short (~21-nucleotide) single-stranded RNA molecules that bind to mRNA to inhibit translation or degrade the target mRNA. miRNAs have been found to play a key role in post-transcriptional gene regulation, and their discovery is likely to lead to the awarding of a Nobel Prize. The project would include advanced analyses of microarray experiments and the development of algorithms for a genome-wide search for new miRNAs and their gene targets.
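
One very simple starting point for a target search, sketched below in Python, is to slide the reverse complement of a miRNA along an mRNA and report windows with few mismatches. This is only an illustrative assumption about how such a search might begin, not the lab's algorithm: the sequences are toy strings rather than real miRNAs, the mismatch threshold is arbitrary, and G:U wobble pairing is ignored.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA string over A, C, G, U."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def find_target_sites(mirna, mrna, max_mismatches=2):
    """Return (start, mismatches) for each mRNA window the miRNA could pair with."""
    site = reverse_complement(mirna)
    hits = []
    for start in range(len(mrna) - len(site) + 1):
        window = mrna[start:start + len(site)]
        mismatches = sum(1 for x, y in zip(window, site) if x != y)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


# Toy example: a 4-nt "miRNA" and a short "mRNA" with one perfect site.
print(find_target_sites("UGGA", "GGUCCAGG", max_mismatches=0))  # [(2, 0)]
```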

Required Experience: Experience with some programming language (C, C++, Perl, Python, etc.)

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 



Winter 2006 Simulating molecular movement within the Cell: A learning tool for high school students
Contact: Scott Baden

Cells rely on molecular movement (diffusion) to communicate information inside the cell and also with the surroundings. For example, small packets of molecules are released from one neuron (nerve cell) to signal to another neuron.

The Sejnowski and Baden research groups collaborate on large-scale simulations of cell function based on the work of Tom Bartol and Joel Stiles on the MCell simulator. In this project, students will develop a simulator that will enable high school students to learn how molecular movement works in the cell.
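
To show how little code a first classroom demonstration of diffusion needs, here is a Python sketch of a random walk on a 3-D lattice. It is a deliberately simplified stand-in and does not reproduce MCell's own methods; the function names, the unit step length, and the lattice itself are assumptions made for illustration.

```python
import random

# The six unit moves available on a cubic lattice.
STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def random_walk(n_steps, rng):
    """Move one molecule n_steps times from the origin; return its final position."""
    x = y = z = 0
    for _ in range(n_steps):
        dx, dy, dz = rng.choice(STEPS)
        x += dx
        y += dy
        z += dz
    return x, y, z


def mean_squared_displacement(n_molecules, n_steps, seed=0):
    """Average squared distance from the origin over many independent walkers."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_molecules):
        x, y, z = random_walk(n_steps, rng)
        total += x * x + y * y + z * z
    return total / n_molecules


# For an unbiased walk with unit steps, this should come out close to n_steps.
print(mean_squared_displacement(n_molecules=2000, n_steps=100))
```

Plotting the mean squared displacement against the number of steps gives the roughly linear growth that characterizes diffusion, which is the kind of result a high school student could explore interactively.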

This will be a team effort involving students with diverse backgrounds in computer graphics, physics, chemistry, visual arts, human-computer interaction, and cell biology.

More information is available on the MCell-K web site at http://www.cs.ucsd.edu/groups/hpcl/scg/mcellk.



 
 
 
 
 
 
 
 
 
 
 
 
 
 
 



Winter 2006 Web services for clique detection algorithm
Contact: Philip Bourne

Clique detection algorithms have many applications in bioinformatics and chemoinformatics. An efficient implementation of a clique detection algorithm is available in the C programming language. To make it interoperable across different language environments and operating systems, it is desirable to have a web service wrapper for this software.
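
For readers unfamiliar with clique detection, the following Python sketch enumerates maximal cliques with the basic Bron-Kerbosch algorithm. It is meant only to illustrate the problem; the existing C implementation may use a different and more efficient algorithm, and the adjacency-dictionary input format is an assumption.

```python
def maximal_cliques(adjacency):
    """Enumerate maximal cliques of an undirected graph.

    adjacency maps each vertex to the set of its neighbours.
    """
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates to add, x: vertices already handled.
        if not p and not x:
            cliques.append(sorted(r))
            return
        for v in list(p):
            expand(r | {v}, p & adjacency[v], x & adjacency[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adjacency), set())
    return cliques


# Small example: a triangle a-b-c with a pendant vertex d attached to c.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
print(sorted(maximal_cliques(graph)))  # [['a', 'b', 'c'], ['c', 'd']]
```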

Required Experience: Experience with the Java programming language

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 



Winter 2006 Graph representation of the protein functional site
Contact: Philip Bourne

Protein functional site comparison is critical in predicting protein functions from structural genomics targets and in linking protein and chemical space. Applying graph algorithms to functional site comparison is a promising approach. As a first step, a functional site should be represented as a graph with evolutionary, physical, and geometric properties encoded. Furthermore, an associated graph can be constructed by linking two functional site graph representations.
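
One common way to link two site graphs, sketched below in Python, is to pair up nodes with the same label and connect two pairs when the corresponding distances in each site agree within a tolerance. This is an assumption about what the construction might look like, not the lab's method; the labels, coordinates, and tolerance are hypothetical, and only a single property per node is encoded.

```python
import math
from itertools import combinations


def association_graph(site_a, site_b, tolerance=0.5):
    """Build a graph linking two functional-site representations.

    Each site maps a node name to (label, (x, y, z)). A node of the result is
    a pair (u, v) with matching labels; an edge joins two pairs whose
    inter-node distances agree within `tolerance`.
    """
    nodes = [
        (u, v)
        for u, (label_u, _) in site_a.items()
        for v, (label_v, _) in site_b.items()
        if label_u == label_v
    ]
    edges = set()
    for (u1, v1), (u2, v2) in combinations(nodes, 2):
        if u1 == u2 or v1 == v2:
            continue  # each site node may be matched at most once
        d_a = math.dist(site_a[u1][1], site_a[u2][1])
        d_b = math.dist(site_b[v1][1], site_b[v2][1])
        if abs(d_a - d_b) <= tolerance:
            edges.add(((u1, v1), (u2, v2)))
    return nodes, edges


# Two toy sites, each with a donor and an acceptor 3 units apart.
site_a = {"a1": ("donor", (0, 0, 0)), "a2": ("acceptor", (3, 0, 0))}
site_b = {"b1": ("donor", (1, 1, 1)), "b2": ("acceptor", (1, 4, 1))}
nodes, edges = association_graph(site_a, site_b)
print(nodes)
print(edges)
```

Cliques in a graph built this way correspond to sets of node pairings that are mutually consistent, which is where clique detection becomes relevant.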

Required Experience: Experience with some programming language (C, C++, Java, etc.)

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 



Winter 2006 New algorithm for small molecule similarity searching
Contact: Philip Bourne

Small molecule similarity searching is an essential tool in virtual ligand screening for drug discovery. Conventional small-molecule searching is based on either fingerprints that encode global properties of a molecule or sub-graph detection that depends on the atomic environment. The fundamental limitation of current methods is that molecular similarity does not correlate well with protein-ligand binding activity. In this project, the student will design and implement a new algorithm that segments small molecules based on functional groups and combines fingerprint and sub-graph detection, with the aim of improving the correlation between ligand similarity and protein-ligand interaction.
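
As background on the fingerprint side, the Python sketch below scores two fingerprints with the Tanimoto coefficient (shared features divided by total distinct features) and ranks a small library against a query. Representing a fingerprint as a set of integer feature indices is an assumption made for illustration, and the molecule names and feature values are made up.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of feature indices."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


def rank_by_similarity(query, library):
    """Return (name, score) pairs sorted from most to least similar to the query."""
    scores = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


print(tanimoto({1, 2, 3}, {2, 3, 4}))  # 0.5

# Hypothetical library of three molecules.
library = {"mol_x": {1, 2, 3}, "mol_y": {2, 3, 4}, "mol_z": {7, 8}}
print(rank_by_similarity({1, 2, 3}, library))
```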

Required Experience: Experience with some programming language (C, C++, Java, etc.)