Speaker: Dan Gusfield
University of California, Davis
Monday, October 9, 2006
11:00 am - 12:00 pm
EBU3b 1202
ABSTRACT
A ``haplotype" is a DNA sequence that has been inherited from one
parent; each person typically possesses two haplotypes in any genomic
region. A mixed description of the two haplotypes is called a
``genotype", which usually does not uniquely determine the two
originating haplotypes. A major current focus in population genomics
is the collection of large-scale genotypic information in order to
detect genetic variations in the population. Currently, it is very
difficult and costly to collect large-scale haplotype data, but
relatively easy and cheap to collect genotype data. However, most of
the questions that we want to answer using population data are most
naturally framed in terms of haplotypes, not genotypes. This has lead
to the computational problem (the HI problem) of inferring a pair of
unobserved haplotypes that likely gave rise to the observed genotype of
each person in a sampled population, or to find partial information
about the underlying haplotypes. There is a large literature that
addresses the HI problem by using detailed statistical models of
haplotype evolution, but the semantics of the programs used in those
approaches are not always well defined, and the methods are often quite
slow. In contrast, in the combinatorial approach, one casts the HI
problem as an optimization problem with a relatively simple, precise
objective function. The challenges then are to find objective
functions leading to combinatorial optimization problems whose solution
gives the desired haplotype information, and to find efficient
algorithms (either in worst case or in practice) to solve these
optimization problems. In the past several years, my group at UC Davis
(and in partial collaboration with people elsewhere, including UCSD)
have studied roughly ten such objective functions which either have
polynomial-time solutions, or can be solved in practice (usually by
integer linear programming) on problems of sizes of current interest in
biology. These evolving formulations have sought to increasingly
capture biological complexity. Thus, the HI problem has been, and
continues to be, a rich source of many attractive combinatorial
optimization problems. In this talk I will discuss several of these,
and in particular, formulations that relate haplotype evolution to
questions about recombination.
BIO
Professor Gusfield's background is in Combinatorial Optimization, and
various applications of Combinatorial Optimization. He has worked
extensively on problems of network flow, matroid optimization,
statistical data security, stable marriage and matching, string
algorithms and sequence analysis, phylogenetic tree inference,
haplotype inference, and inference of phylogenetic networks with
homoplasy and recombination. He received his Ph.D. in 1980 from UC
Berkeley, working with Richard Karp, and was an Assistant Professor at
Yale University from 1980 to 1986.
Professor Gusfield moved to UC Davis in July 1986. Since then, he has
mostly addressed problems in Computational Biology and Bioinformatics.
He first addressed questions about building evolutionary trees, and
then problems in molecular sequence analysis. He presently focuses
mostly on optimization problems related to population genetics and
population-scale genomics. Two particular problems are haplotype
inference and inferences about historical recombination. His main
support for work on computational biology and bioinformatics came
initially from the Department of Energy Human Genome Project through
the Lawrence Berkeley Labs Human Genome Center, then directly from DOE,
Human Genome Project, but since then, his work in computational biology
has been funded by the NSF. His book, ``Algorithms on Strings, Trees
and Sequences: Computer Science and Computational Biology" has helped
to define the intersection of computer science and bioinformatics. It
has been translated into Russian, and a South Asian edition has been
published. Professor Gusfield serves on the editorial board of the
Journal of Computational Biology, and is the founding Editor-in-Chief
of The IEEE/ACM Transactions on Computational Biology and
Bioinformatics. The journal was presented the ``honorable mention" for
Best New Journal in 2004 by the American Association of Publishers.
Other notable service to the Computational Biology community consists
of serving as Program Chair for the 2004 RECOMB conference.
At UCD, Professor Gusfield was chair of the Computer Science
Department for four years, and wrote the bioinformatics section (one of
three) of the Genomics/Bioinformatics initiative proposal that resulted
in the creation of the UCD Genomics Center (which has hired 17 new
faculty), and continues to serve on its internal Steering committee. He
is currently co-chair of the UCD campus initiative on ``Computational
Characterization and Exploitation of Biological Networks" (see
cnb.ucdavis.edu), which will hire seven new faculty in this area over the
next three years.