Instructor: Vineet Bafna
TAs: Ali Bashir, Max Alekseyev
Lectures: TR 5:00-6:20pm. CENTR 203
Discussion: M 3:00-3:50pm CENTR 207
Office hours:
Vineet Bafna: TR 12:45pm-2:00pm. APM 3832
W 4-6pm
TA office hours are available by request. To request a meeting, please email the TAs at least a day in advance with time frames convenient for you and a brief description of the topic you would like to discuss.
Course Information:
MIDTERM: TBD
Sample Questions
ASSIGNMENTS
Assignment | Due date | Data |
---|---|---|
A1. Note: For Problem 1, run with the following parameters: match: 1, mismatch: -3, indel: -2. | 10/16 | Problem 1: human.seq and mouse.seq; Problem 5: two sequences |
A2. Note: For Problem 2, the scoring matrix has been corrected; please re-download it. For Problem 3, the subset F' should be as large as possible. | 11/2 | Family F; Family F2; Database D; Scoring Matrix (CORRECTED) |
A3. Note: This assignment is optional and is intended for students who did not score well on A1 and A2. | 12/7 | Problem 1: Data File 1; Problem 1: Data File 2; Problem 2 (updated 11/28) |
PROJECTS
Project | Due date | Training Data | Test Data |
---|---|---|---|
Project Description | C1: 11/2, C2: 11/15, C3: 11/22, C4: 11/29 | Data File (zipped). Annotated and unannotated spectra are in the folders "Labeled Spectra" and "Unlabeled Spectra". PhosphoSpectra contains the same spectra modified by phosphorylation(s) (for Problem 8). The oracle files identify the peptides corresponding to these data sets. Some problems, such as the isotope peak calculation, do not need a data set. | |
Lectures
There is no required text for the course. We
will use Jones and Pevzner, "An Introduction To Bioinformatics
Algorithms", MIT Press, as an optional book.
Future recommended reading is subject to change with little
notice. Please note that the available manuscripts are copyright-protected
and may be used only for educational purposes. The notes
presented here are unedited and may contain errors. PowerPoint slides
are used only to illustrate examples in class and are not intended to
substitute for lecture notes.
Lecture | Topic | Slides | Suggested Reading |
---|---|---|---|
9/23 | Course outline | L1 | Perl 5 guide; Bioinformatics Algorithms web site; Chap. 3 has a brief introduction to molecular biology |
9/28 | Sequence Alignment tour | L2 | Dyn. programming notes. Also see Jones & Pevzner |
9/30 | BLAST: Alignment Scores etc. | L3 | PAM vs. BLOSUM matrices |
10/5 | BLAST: Sensitivity versus Speed; P-value computation | L4 | Blast P-value; BLAST Home; Significance of sequence search results: Distributions and p-values |
10/7 | Dictionary matching; Profiles; Psi-BLAST | L5 | Pattern Matching; Psi-BLAST; Profiles |
10/12 | Regular Expression Search; Protein Structure basics | L6 | ExPASy tools; PROSITE |
10/14 | Mass Spectrometry Basics | L7 | |
10/19 | Mass Spectrometry; De novo sequencing; Applications | L8 | Protein Prospector |
10/21 | HMMs; Introduction (comparison to profiles); Viterbi Algorithm | L9 | HMM Notes |
10/26 | HMMs; Forward-Backward Algorithm; Applications (Profile HMMs/CpG island) | L10 | Chapter 11, Jones and Pevzner |
10/28 | Gene Finding; Different approaches to gene finding; Gene Finding HMMs; Splice site prediction | L11 | Genscan: Burge & Karlin |
11/2 | Gene Finding; Lander-Waterman statistics | L12 | Lander-Waterman paper |
11/4 | Lander-Waterman statistics; Genome Assembly | L13 | Arachne |
11/9, 11/12 | Mid-term; Veterans Day | | |
11/16 | Population Genetics; Mutation, Recombination; Perfect Phylogeny | L14 | |
11/18 | Population Genetics: Population Structure | L15 | Structure; Human Population Structure |
11/23 | Non-coding RNA; RNA structure | L16 | MiRscan |
11/28 | Guest lecture on population genetics | | |
Research:
We are always looking for motivated students. If you are interested in
exploring undergraduate research opportunities in Computational
Biology, please email me.