Results of tests exploring the influence of tree shape on reconstructed tree distances.

shapeEffect

Format

A list of length 21. Each entry of the list is named according to the abbreviation of the corresponding method (see 'Methods tested' below).

Each entry is itself a list of ten elements. Each element is a numeric vector listing the distances between pairs of trees, one with shape x and the other with shape y, where:

x = 1, 1, 1, 1, 2, 2, 2, 3, 3, 4 and y = 1, 2, 3, 4, 2, 3, 4, 3, 4, 4.

As trees are not compared with themselves (to avoid zero distances), elements where x = y contain 4 950 distances, whereas other elements contain 5 050 distances.
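The ordering of the ten elements can be reproduced with a few lines of base R. This is a minimal sketch rather than code from the package: the names `nShapes`, `pairings` and `PairIndex` are hypothetical, and the final block assumes that `shapeEffect` has already been loaded into the session.

```r
# Enumerate the ten shape pairings in the documented order.
nShapes <- 4
x <- rep(seq_len(nShapes), times = rev(seq_len(nShapes)))
y <- unlist(lapply(seq_len(nShapes), function(i) i:nShapes))
pairings <- data.frame(element = seq_along(x), x = x, y = y)
print(pairings)

# Hypothetical helper: position of the element that compares shapes a and b.
PairIndex <- function(a, b) {
  which(x == min(a, b) & y == max(a, b))
}

# With 100 trees per shape, an element with x = y holds one distance per
# unordered pair of distinct trees.
choose(100, 2)

# Assumes `shapeEffect` is already available in the workspace.
if (exists("shapeEffect")) {
  print(summary(shapeEffect$cid[[PairIndex(2, 3)]]))
}
```

Looking elements up by shape pair, rather than by hard-coded position, keeps downstream code readable when several pairings are compared.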

Source

Scripts used to generate data objects are housed in the data-raw directory.

Details

For each of the four binary unrooted tree shapes on eight leaves, I labelled leaves at random until I had generated 100 distinct trees.

I measured the distance from each tree to each of the other 399 trees.
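The random labelling step can be sketched in base R as rejection sampling. This is an illustration under stated assumptions, not the script from data-raw: the shape template `shapeSplits` (a caterpillar tree written as its nontrivial splits), the helper `TreeKey` and the seed are all hypothetical. Two labelled unrooted trees are treated as identical when they contain the same set of splits.

```r
set.seed(1)  # arbitrary seed, used only to make the sketch reproducible
nLeaves <- 8

# Hypothetical shape template: the five nontrivial splits of a caterpillar
# tree, each given as the leaf positions on one side of an internal edge.
shapeSplits <- list(1:2, 1:3, 1:4, 1:5, 1:6)

# Canonical text key for a labelled tree: write each split as the side
# that excludes leaf 1, then sort the splits and concatenate them.
TreeKey <- function(labels) {
  splits <- vapply(shapeSplits, function(side) {
    leaves <- labels[side]
    if (1 %in% leaves) leaves <- setdiff(seq_len(nLeaves), leaves)
    paste(sort(leaves), collapse = ",")
  }, character(1))
  paste(sort(splits), collapse = "|")
}

# Draw random labellings, keeping only trees not seen before,
# until 100 distinct trees have been collected.
keys <- character(0)
while (length(keys) < 100) {
  candidate <- TreeKey(sample(nLeaves))
  if (!candidate %in% keys) keys <- c(keys, candidate)
}
length(keys)
```

The canonical key is needed because different permutations of the labels can yield the same tree; comparing the permutations themselves would overcount distinct trees.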

For analysis of this data, see the accompanying vignette.

Methods tested

  • pid: Phylogenetic Information Distance (Smith 2020)

  • msid: Matching Split Information Distance (Smith 2020)

  • cid: Clustering Information Distance (Smith 2020)

  • qd: Quartet divergence (Smith 2019)

  • nye: Nye et al. tree distance (Nye et al. 2006)

  • jnc2, jnc4: Jaccard-Robinson-Foulds distances with k = 2, 4, conflicting pairings prohibited ('no-conflict')

  • jco2, jco4: Jaccard-Robinson-Foulds distances with k = 2, 4, conflicting pairings permitted ('conflict-ok')

  • ms: Matching Split Distance (Bogdanowicz & Giaro 2012)

  • mast: Size of Maximum Agreement Subtree (Valiente 2009)

  • masti: Information content of Maximum Agreement Subtree

  • nni_l, nni_t, nni_u: Lower bound, tight upper bound, and upper bound for nearest-neighbour interchange distance (Li et al. 1996)

  • spr: Approximate subtree prune and regraft (SPR) distance

  • tbr_l, tbr_u: Lower and upper bound for tree bisection and reconnection (TBR) distance, calculated using TBRDist

  • rf: Robinson-Foulds distance (Robinson & Foulds 1981)

  • icrf: Information-corrected Robinson-Foulds distance (Smith 2020)

  • path: Path distance (Steel & Penny 1993)
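As a concrete instance of one listed method, the Robinson-Foulds distance counts the splits that occur in exactly one of the two trees. The base R sketch below is illustrative only and is not the implementation used to build the dataset: `SplitKey`, `RFDist` and the two example trees (named `caterpillar` and `balanced` here for convenience) are hypothetical.

```r
# Canonical key for a split of leaves 1..nLeaves: the side excluding leaf 1.
SplitKey <- function(side, nLeaves) {
  if (1 %in% side) side <- setdiff(seq_len(nLeaves), side)
  paste(sort(side), collapse = ",")
}

# Robinson-Foulds distance: number of splits present in exactly one tree.
RFDist <- function(splits1, splits2, nLeaves) {
  k1 <- vapply(splits1, SplitKey, character(1), nLeaves = nLeaves)
  k2 <- vapply(splits2, SplitKey, character(1), nLeaves = nLeaves)
  length(setdiff(k1, k2)) + length(setdiff(k2, k1))
}

# Two eight-leaf trees, each written as its five nontrivial splits.
caterpillar <- list(1:2, 1:3, 1:4, 1:5, 1:6)
balanced <- list(1:2, 3:4, 5:6, 7:8, 1:4)

# The trees share three splits, so two splits are unique to each tree.
RFDist(caterpillar, balanced, 8)  # 4
```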

References

Bogdanowicz D, Giaro K (2012). “Matching split distance for unrooted binary phylogenetic trees.” IEEE/ACM Transactions on Computational Biology and Bioinformatics, 9(1), 150--160. doi: 10.1109/TCBB.2011.48 .

Li M, Tromp J, Zhang L (1996). “Some notes on the nearest neighbour interchange distance.” In Goos G, Hartmanis J, Leeuwen J, Cai J, Wong CK (eds.), Computing and Combinatorics, volume 1090, 343--351. Springer, Berlin, Heidelberg. ISBN 978-3-540-61332-9 978-3-540-68461-9, doi: 10.1007/3-540-61332-3_168 .

Kendall M, Colijn C (2016). “Mapping phylogenetic trees to reveal distinct patterns of evolution.” Molecular Biology and Evolution, 33(10), 2735--2743. doi: 10.1093/molbev/msw124 .

Nye TMW, Liò P, Gilks WR (2006). “A novel algorithm and web-based tool for comparing two alternative phylogenetic trees.” Bioinformatics, 22(1), 117--119. doi: 10.1093/bioinformatics/bti720 .

Robinson DF, Foulds LR (1981). “Comparison of phylogenetic trees.” Mathematical Biosciences, 53(1-2), 131--147. doi: 10.1016/0025-5564(81)90043-2 .

Smith MR (2019). “Bayesian and parsimony approaches reconstruct informative trees from simulated morphological datasets.” Biology Letters, 15, 20180632. doi: 10.1098/rsbl.2018.0632 .

Smith MR (2020). “Information theoretic Generalized Robinson-Foulds metrics for comparing phylogenetic trees.” Bioinformatics, online ahead of print. doi: 10.1093/bioinformatics/btaa614 .

Steel MA, Penny D (1993). “Distributions of tree comparison metrics---some new results.” Systematic Biology, 42(2), 126--141. doi: 10.1093/sysbio/42.2.126 .

Valiente G (2009). Combinatorial Pattern Matching Algorithms in Computational Biology using Perl and R, CRC Mathematical and Computing Biology Series. CRC Press, Boca Raton.