Fall 2004: CSE/BIMM/BENG 182: Biological Data Analysis


Instructor: Vineet Bafna

TAs: Ali Bashir, Max Alekseyev

Lectures: TR 5:00-6:20pm. CENTR 203
Discussion: M 3:00-3:50pm CENTR 207

Office hours:
Vineet Bafna: TR 12:45pm-2:00pm. APM3832
W 4:00-6:00pm

TA office hours are available by request. To request a meeting, please e-mail the TAs at least a day in advance with the times convenient for you and a brief description of the topic you would like to discuss.

Course Information:

MIDTERM: TBD
Sample Questions


ASSIGNMENTS
A1
Due date: 10/16
Note: For problem 1, run with the following parameters: match: 1, mismatch: -3, indel: -2.
Data: for Problem 1: human.seq and mouse.seq; for Problem 5: two sequences

A2
Due date: 11/2
Note: For problem 2, the Scoring Matrix has been corrected. Please re-download. For problem 3, subset F' should be as large as possible.
Data: Family F; Family F2; Database D; Scoring Matrix (CORRECTED)

A3
Due date: 12/7
Note: This assignment is optional, and is for students who did not score well on A1 and A2.
Data: Problem 1: Data File 1; Problem 1: Data File 2; Problem 2 (updated 11/28)
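
As a rough illustration of how the A1 Problem 1 parameters (match: 1, mismatch: -3, indel: -2) enter an alignment score, here is a minimal Python sketch that computes a global alignment score by dynamic programming. The choice of global alignment, the function name global_alignment_score, and the example strings are assumptions made for illustration only; the assignment itself may call for a different alignment variant or tool.

```python
def global_alignment_score(a, b, match=1, mismatch=-3, indel=-2):
    """Best global alignment score of strings a and b (illustrative sketch).

    Defaults mirror the A1 Problem 1 parameters; using *global* alignment
    here is an assumption, not something the assignment note specifies.
    """
    # prev[j] = best score aligning the processed prefix of a with b[:j].
    # Row 0: aligning the empty prefix of a against b[:j] costs j indels.
    prev = [j * indel for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] against the empty prefix costs i indels.
        curr = [i * indel] + [0] * len(b)
        for j in range(1, len(b) + 1):
            # Diagonal move: align a[i-1] with b[j-1] (match or mismatch).
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Vertical / horizontal moves: a gap in one of the sequences.
            curr[j] = max(diag, prev[j] + indel, curr[j - 1] + indel)
        prev = curr
    return prev[-1]
```

For example, aligning "ACGT" with "AGT" under these parameters gives three matches and one indel, for a score of 3 - 2 = 1.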

PROJECTS
Project Due date Training Data Test Data
Project Description C1:11/2
C2:11/15
C3:11/22
C4:11/29
Data File (zipped)
Annotated and unannotated spectra are in the folders "Labeled Spectra" and "Unlabeled Spectra"
PhosphoSpectra contain the same spectra modified by phosphorylation(s) (For Problem 8)
The oracle files identify the peptides corresponding to these data-sets
Some problems, such as the isotope peak calculation, do not need a data set

Lectures
There is no required text for the course. We will use Jones and Pevzner, "An Introduction to Bioinformatics Algorithms", MIT Press, as an optional book.
Future recommended reading is subject to change with little notice. Please note that the available manuscripts are copyright protected, and may be used only for educational purposes. The notes presented here are unedited, and may contain errors. PowerPoint slides are used only to illustrate examples in class, and are not intended to substitute for lecture notes.

9/23
Topic: Course outline
Slides: L1
Suggested reading: Perl 5 guide; Bioinformatics Algorithms web-site; Chap 3 has a brief introduction to Molecular Biology

9/28
Topic: Sequence Alignment tour
Slides: L2
Suggested reading: Dynamic programming notes; also see Jones & Pevzner

9/30
Topic: BLAST: Alignment Scores etc.
Slides: L3
Suggested reading: PAM vs. BLOSUM matrices

10/5
Topic: BLAST: Sensitivity versus Speed; P-value computation
Slides: L4
Suggested reading: BLAST P-value; BLAST Home; Significance of sequence search results: Distributions and p-values

10/7
Topic: Dictionary matching; Profiles; Psi-BLAST
Slides: L5
Suggested reading: Pattern Matching; Psi-BLAST; Profiles

10/12
Topic: Regular Expression Search; Protein Structure basics
Slides: L6
Suggested reading: ExPASy tools; PROSITE

10/14
Topic: Mass Spectrometry Basics
Slides: L7

10/19
Topic: Mass Spectrometry; De novo sequencing; Applications
Slides: L8
Suggested reading: Protein Prospector

10/21
Topic: HMMs; Introduction (comparison to profiles); Viterbi Algorithm
Slides: L9
Suggested reading: HMM Notes

10/26
Topic: HMMs; Forward-Backward Algorithm; Applications (Profile HMMs/CpG island)
Slides: L10
Suggested reading: Chapter 11, Jones and Pevzner

10/28
Topic: Gene Finding; Different approaches to gene finding; Gene Finding HMMs; Splice site prediction
Slides: L11
Suggested reading: Genscan: Burge & Karlin

11/2
Topic: Gene Finding; Lander-Waterman statistics
Slides: L12
Suggested reading: Lander-Waterman paper

11/4
Topic: Lander-Waterman statistics; Genome Assembly
Slides: L13
Suggested reading: Arachne

11/9, 11/12
Topic: Mid-term; Veterans Day

11/16
Topic: Population Genetics; Mutation, Recombination; Perfect Phylogeny
Slides: L14

11/18
Topic: Population Genetics: Population Structure
Slides: L15
Suggested reading: Structure; Human Population Structure

11/23
Topic: Non-coding RNA; RNA structure
Slides: L16
Suggested reading: MiRscan

11/28
Topic: Guest lecture on population genetics

Research:
We are always looking for motivated students. If you are interested in exploring undergraduate research opportunities in Computational Biology, please email me.