An unprecedented wealth of data is being generated by genome sequencing
projects and other experimental efforts to determine the structure and
function of biological molecules. The demands and opportunities for
interpreting these data are growing rapidly. Biotechnology,
pharmacology, and medicine will be particularly affected by the new
results and the increased understanding of life at the molecular level.
Bioinformatics is the development and application of computer methods
for analysis, interpretation, and prediction, as well as for the design
of experiments. It has emerged as a strategic frontier between biology
and computer science.

Machine learning approaches (e.g., neural networks, hidden Markov
models, and belief networks) are ideally suited to areas where data are
abundant but theory is scarce, which is exactly the situation in
molecular biology. As with its predecessor, statistical model fitting,
the goal in machine learning is to extract useful information from a
body of data by building good probabilistic models. The particular twist
behind machine learning, however, is to automate the process as much as
possible.

In this book, Pierre Baldi and Søren Brunak present the key
machine learning approaches and apply them to the computational problems
encountered in the analysis of biological data. The book is aimed at two
types of researchers and students. First are the biologists and
biochemists who need to understand new data-driven algorithms, such as
neural networks and hidden Markov models, in the context of biological
sequences and their molecular structure and function. Second are those
with a primary background in physics, mathematics, statistics, or
computer science who need to know more about specific applications in
molecular biology.

Pierre Baldi is Chairman of the Board, Net-ID, Inc. Søren Brunak
is Director, Center for Biological Sequence Analysis, The Technical
University of Denmark.