On the Consistency of Maximum Likelihood Estimation of Phylogenetic Trees from Nucleotide Sequences

Abstract

Felsenstein (1973, 1978, 1981) and other workers have advocated the method of maximum likelihood (ML) for estimating phylogenetic trees from discrete character data, particularly nucleotide sequences, in part because it is thought to be consistent, i.e., it will converge on the true tree as more and more data are accumulated. In contrast, the method of maximum parsimony is known to converge, under certain conditions, on the wrong tree as more data are added (Felsenstein, 1978). Felsenstein's (1973) argument for the consistency of the ML method was based on earlier work (Wald, 1949) that demonstrated that maximum likelihood estimation of statistical parameters, such as means, variances, etc., is consistent under a wide variety of conditions. This work also guarantees the consistency of the ML estimate of the branch lengths of a phylogenetic tree, given the correct tree topology and nucleotide substitution model. However, as pointed out by several workers (e.g., Nei, 1987:325; Saitou, 1988; Yang, 1994, 1996; Yang et al., 1995; Russo et al., 1996), these attributes do not guarantee the consistency of the ML method for estimating the tree topology. The usual ML method for estimating a tree involves finding the ML branch lengths for a given tree topology and substitution model, repeating the process for several to many other topologies, and then selecting the topology with the highest ML value as the best estimate of the true tree (Felsenstein, 1981). Yang (1994:329-330) wrote that
