Calculate the Kendall–Colijn tree distance, a measure related to the path difference.
KendallColijn(tree1, tree2 = NULL, Vector = KCVector) KCVector(tree) PathVector(tree) SplitVector(tree) KCDiameter(tree)
Trees of class
phylo, with leaves labelled identically,
or lists of such trees to undergo pairwise comparison. Where implemented,
tree2 = NULL will compute distances between each pair of trees in the list
tree1 using a fast algorithm based on Day (1985).
Function converting a tree to a numeric vector.
KCVector, the default, returns the number of edges between the common
ancestor of each pair of leaves and the root of the tree (per
Kendall & Colijn 2016).
PathVector returns the number of edges between each pair of leaves (per
Steel & Penny 1993).
SplitVector returns the size of the smallest split that contains each
pair of leaves (per Smith, forthcoming).
A tree of class
KendallColijn() returns an array of numerics providing the
distances between each pair of trees in
KCDiameter() returns the value of the Kendall & Colijn's (2016)
metric distance between two pectinate trees with n leaves ordered in
the opposite direction, which I suggest (without any attempt at a proof) may
be a useful proxy for the diameter (i.e. maximum value) of the K–C
The Kendall–Colijn distance works by measuring, for each pair of leaves, the distance from the most recent common ancestor of those leaves and the root node. For a given tree, this produces a vector of values recording the distance-from-the-root of each most recent common ancestor of each pair of leaves.
Two trees are compared by taking the Euclidian distance between the respective vectors. This is calculated by taking the square root of the sum of the squares of the differences between the vectors.
This metric emphasizes the position of the root; the path difference instead measures the distance of the last common ancestor of each pair of leaves from the leaves themselves, i.e. the length of the path from one leaf to another.
KCVector: Creates a vector that characterises a rooted tree,
as described in Kendall & Colijn (2016).
PathVector: Creates a vector reporting the path length between
each pair of leaves, per the path metric of Steel & Penny (1993).
SplitVector: Creates a vector reporting the smallest split
containing each pair of leaves, per the metric proposed in Smith
Kendall M, Colijn C (2016). “Mapping phylogenetic trees to reveal distinct patterns of evolution.” Molecular Biology and Evolution, 33(10), 2735--2743. doi: 10.1093/molbev/msw124 .
Smith MR (2022). “Robust analysis of phylogenetic tree space.” Systematic Biology, syab100. doi: 10.1093/sysbio/syab100 .
is a more sophisticated, if more cumbersome, implementation that supports
lambda > 0, i.e. use of edge lengths in tree comparison.
KendallColijn(TreeTools::BalancedTree(8), TreeTools::PectinateTree(8)) #>  11.48913 set.seed(0) KendallColijn(TreeTools::BalancedTree(8), lapply(rep(8, 3), ape::rtree)) #>  9.591663 5.567764 9.949874 KendallColijn(lapply(rep(8, 4), ape::rtree)) #> 1 2 3 #> 2 7.280110 #> 3 7.874008 8.185353 #> 4 4.795832 7.071068 7.681146 KendallColijn(lapply(rep(8, 4), ape::rtree), Vector = SplitVector) #> 1 2 3 #> 2 10.862780 #> 3 10.148892 12.124356 #> 4 8.000000 11.489125 8.185353 # Notice that changing tree shape close to the root results in much # larger differences tree1 <- ape::read.tree(text = "(a, (b, (c, (d, (e, (f, (g, h)))))));") tree2 <- ape::read.tree(text = "(a, ((b, c), (d, (e, (f, (g, h))))));") tree3 <- ape::read.tree(text = "(a, (b, (c, (d, (e, ((f, g), h))))));") trees <- c(tree1, tree2, tree3) KendallColijn(trees) #> 1 2 #> 2 4.000000 #> 3 1.414214 4.242641 KendallColijn(trees, Vector = SplitVector) #> 1 2 #> 2 2.449490 #> 3 2.449490 3.162278 KCDiameter(trees) #> Warning: first element used of 'length.out' argument #> Warning: longer object length is not a multiple of shorter object length #>  15.87451 KCDiameter(4) #>  3.162278