Computational Complexity in the Life Sciences
Christos H. Papadimitriou
Computer Science Dept., Univ. California - Berkeley
Martha Sideri
Computer Science Dept., Univ. California - Berkeley
Abstract
The field of computational complexity has been
investigating over the past three decades the reasons why
some computational problems are so hard to compute. But it
has more fascinating lessons to teach us than this, because
many problems in science have latent algorithmic aspects.
Among those, the {em protein folding problem} is one of the
most intriguing. Proteins are polymer chains consisting of
monomers of twenty different kinds, which tend to {em
fold}, presumably by dint of attraction or repulsion forces
between monomers, to form a very specific and stable
geometric pattern, known as the protein's {em native
state}. It is this geometric pattern that determines the
macroscopic properties, behavior, and function of a protein.
This surprising stability of the native state has led to the
widespread belief that it must be the lowest-energy
configuration of the chain. Thus Nature appears to be
solving very rapidly an extremely complex combinatorial
problem (a widely-studied discretized version of the protein
folding problem has in fact recently proved NP-complete (cf.
{em Proceedings of the 1998 RECOMB Conference/}). This
conundrum is known as {em Levinthal's paradox}. However, a
simple explanation proceeds along these lines: Proteins
must cooperate in order to function in an organism, and such
cooperation often involves ``locking'' of their shapes.
Hence, there is evolutionary pressure towards protein forms
that have a unique stable native state, and this pressure
could have resulted in proteins with very {em flat energy
landscapes.} Flat landscapes feature an overwhelmingly
popular local optimum, which may not necessarily coincide
with the global optimum. Computational experiments verify
that such flat landscapes evolve very rapidly in a broad
variety of optimization problems and circumstances.
Protein folding is only one of the mysterious steps in the
map from genotype to phenotype. There have been exciting
results in recent years linking certain human diseases to
specific genes. Some geneticists envision a coming ``last
stage of the Mendelian revolution,'' in which all such
macroscopic traits will be traced to their genetic causes.
However, the vast majority of traits and diseases appear to
be {em polygenic}, in that they involve the complex
interactions, as in a many-input Boolean circuit, of many
genes. There seem to be unsurmountable obstacles of
computational complexity lying in the path of this ambitious
and important research project.