|
It is well known that phylogenetic trees can vary between genes.
Even within regions having the same tree topology, the mutation rates
often vary. This motivates the study of phylogenetic reconstruction
in heterogeneous settings. We study the (im)possibility of reconstructing
the underlying phylogeny when data is generated from a mixture of trees
(same topology, different branch lengths). We first show the pitfalls
of popular methods, including maximum likelihood and BMCMC algorithms.
We then determine for which evolutionary models reconstructing the
tree topology under a mixture distribution is (im)possible. We
prove that for every model, either there are ambiguous distributions, in which
case reconstruction is impossible in general, or there exist linear
tests which identify the topology. This duality theorem relies on
our notion of linear tests and uses ideas from linear programming
duality. Linear tests are closely related to linear invariants,
which were first introduced by Lake. Joint work with Eric Vigoda.
|