distanceDistribution25 and distanceDistribution50 are two-dimensional matrices listing the normalized distances between random pairs of bifurcating trees with 25 and 50 leaves respectively, drawn from the uniform distribution using TreeTools::RandomTree() (data objects randomTreePairs25 and randomTreePairs50).
pectinateDistances11 reports distances between a pectinate 11-leaf tree and 100 000 random binary trees.
distanceDistribution25 distanceDistribution50 pectinateDistances11
Objects of class matrix (inherits from array) with 23 rows, each corresponding to a tree distance method and named with its abbreviation (listed in 'Methods tested' below), and 10 000 (distanceDistribution25/50) or 100 000 (pectinateDistances11) columns, listing the calculated distances between each pair of trees.
An object of class matrix (inherits from array) with 23 rows and 10000 columns.
An object of class matrix (inherits from array) with 23 rows and 10000 columns.
An object of class matrix (inherits from array) with 22 rows and 100000 columns.
Scripts used to generate data objects are housed in the data-raw directory.
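The row-per-method, column-per-tree-pair layout described above lends itself to row-wise summaries. The following is a minimal sketch in base R. It uses a small stand-in matrix (`mockDistances`, a hypothetical name filled with random numbers) rather than the packaged data, so the values are meaningless; only the indexing pattern is of interest. With the package installed, the real objects could presumably be substituted for the stand-in.

```r
# Stand-in with the documented layout: one row per method, one column per
# tree pair. Only three method abbreviations and 1 000 pairs are mocked here;
# the values are random and purely illustrative.
set.seed(1)
methods <- c("pid", "cid", "rf")
mockDistances <- matrix(runif(length(methods) * 1000),
                        nrow = length(methods),
                        dimnames = list(methods, NULL))

# Select the distances for one method by its abbreviation (row name)
cidValues <- mockDistances["cid", ]

# Summarise every method at once by applying over rows (MARGIN = 1)
quartiles <- apply(mockDistances, 1, quantile, probs = c(0.25, 0.5, 0.75))
```

`quartiles` has one column per method and one row per requested quantile, so `quartiles["50%", "cid"]` gives the median of the mocked `cid` row.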
pid: Phylogenetic Information Distance (Smith 2020), normalized against the phylogenetic information content of the splits in the trees being compared.
msid: Matching Split Information Distance (Smith 2020), normalized against the phylogenetic information content of the splits in the trees being compared.
cid: Clustering Information Distance (Smith 2020), normalized against the entropy of the splits in the trees being compared.
qd: Quartet divergence (Smith 2019), normalized against its maximum possible value for n-leaf trees.
nye: Nye et al. tree distance (Nye et al. 2006), normalized against the total number of splits in the trees being compared.
jnc2, jnc4: Jaccard-Robinson-Foulds distances with k = 2, 4, conflicting pairings prohibited ('no-conflict'), normalized against the total number of splits in the trees being compared.
jco2, jco4: Jaccard-Robinson-Foulds distances with k = 2, 4, conflicting pairings permitted ('conflict-ok'), normalized against the total number of splits in the trees being compared.
ms: Matching Split Distance (Bogdanowicz & Giaro 2012), unnormalized.
mast: Size of Maximum Agreement Subtree (Valiente 2009), unnormalized.
masti: Information content of Maximum Agreement Subtree, unnormalized.
nni_l, nni_L, nni_t, nni_U, nni_u: Lower, best lower, tight upper, best upper, and upper bounds for nearest-neighbour interchange distance (Li et al. 1996), unnormalized. 'Best' lower bounds jump sharply when mismatched regions of a tree become large enough that a tight upper bound cannot be exactly calculated, so they are discontinuous and cannot readily be compared between trees.
spr: Approximate subtree prune and regraft (SPR) distance, unnormalized.
tbr_l, tbr_u: Lower and upper bounds for tree bisection and reconnection (TBR) distance, calculated using TBRDist; unnormalized.
rf: Robinson-Foulds distance (Robinson & Foulds 1981), unnormalized.
icrf: Robinson-Foulds distance, splits weighted by phylogenetic information content (Smith 2020), unnormalized.
path: Path distance (Steel & Penny 1993), unnormalized.
mafi (pectinateDistances11 only): Information content of the maximum agreement forest (Smith 2020).
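Several of the measures listed above are stored unnormalized, so their values are not directly comparable between the 25-leaf and 50-leaf datasets. As a rough sketch of one way to rescale them, the example below normalizes Robinson-Foulds distances against their maximum. It assumes the trees are treated as unrooted and binary, in which case each n-leaf tree has n - 3 non-trivial splits and the maximum distance is 2(n - 3). The input values (`rfRaw`) are invented for illustration and are not taken from the data objects.

```r
# Assumption: unrooted binary trees, so each tree has (nTip - 3) non-trivial
# splits and two trees can differ in at most 2 * (nTip - 3) of them.
nTip <- 25
maxRF <- 2 * (nTip - 3)

# Invented raw Robinson-Foulds distances, standing in for values from an
# 'rf' row; not real data.
rfRaw <- c(44, 40, 42, 38)

# Rescale to the range [0, 1]
rfNorm <- rfRaw / maxRF
```

Other unnormalized measures would need their own maxima, which are not given here.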
Bogdanowicz D, Giaro K (2012). “Matching split distance for unrooted binary phylogenetic trees.” IEEE/ACM Transactions on Computational Biology and Bioinformatics, 9(1), 150–160. doi: 10.1109/TCBB.2011.48.
Li M, Tromp J, Zhang L (1996). “Some notes on the nearest neighbour interchange distance.” In Goos G, Hartmanis J, Leeuwen J, Cai J, Wong CK (eds.), Computing and Combinatorics, volume 1090, 343–351. Springer, Berlin, Heidelberg. ISBN 978-3-540-61332-9 978-3-540-68461-9, doi: 10.1007/3-540-61332-3_168.
Kendall M, Colijn C (2016). “Mapping phylogenetic trees to reveal distinct patterns of evolution.” Molecular Biology and Evolution, 33(10), 2735–2743. doi: 10.1093/molbev/msw124.
Nye TMW, Liò P, Gilks WR (2006). “A novel algorithm and web-based tool for comparing two alternative phylogenetic trees.” Bioinformatics, 22(1), 117–119. doi: 10.1093/bioinformatics/bti720.
Robinson DF, Foulds LR (1981). “Comparison of phylogenetic trees.” Mathematical Biosciences, 53(1-2), 131–147. doi: 10.1016/0025-5564(81)90043-2.
Smith MR (2019). “Bayesian and parsimony approaches reconstruct informative trees from simulated morphological datasets.” Biology Letters, 15, 20180632. doi: 10.1098/rsbl.2018.0632.
Smith MR (2020). “Information theoretic Generalized Robinson-Foulds metrics for comparing phylogenetic trees.” Bioinformatics, online ahead of print. doi: 10.1093/bioinformatics/btaa614.
Steel MA, Penny D (1993). “Distributions of tree comparison metrics---some new results.” Systematic Biology, 42(2), 126–141. doi: 10.1093/sysbio/42.2.126.
Valiente G (2009). Combinatorial Pattern Matching Algorithms in Computational Biology using Perl and R, CRC Mathematical and Computing Biology Series. CRC Press, Boca Raton.
Tree pairs between which distances were calculated are available in data objects randomTreePairs25 and randomTreePairs50.