Consistency / retention

Consistency() calculates the consistency "index" (Kluge and Farris 1969), retention index and rescaled consistency index (Farris 1989) for each character in a dataset, given a bifurcating tree. Although these indices have no straightforward interpretation, they are sometimes taken as an indicator of the fit of a character to a tree. Values correlate with the number of species sampled and the distribution of taxa between character states, so they are not strictly comparable between characters in which these factors differ.

Usage

Consistency(dataset, tree, compress = FALSE)

Arguments

dataset: A phylogenetic data matrix of phangorn class phyDat, whose names correspond to the labels of any accompanying tree. Such a matrix can be loaded into R using ReadAsPhyDat. Additive (ordered) characters can be handled using Decompose.
tree: A tree of class phylo.
compress: Logical specifying whether to retain the compression of a phyDat object, or to return one entry for each individual character, decompressed using the dataset's index attribute.

Value

Consistency() returns a matrix with named columns specifying the consistency index (ci), retention index (ri), and rescaled consistency index (rc).

Details

The consistency "index" (Kluge and Farris 1969) is defined as the number of steps observed on the shortest possible tree for a character (its minimum length), divided by the number of steps observed in the most parsimonious mapping of that character to the given tree. A value of one indicates that a character's fit to the tree is optimal; lower values indicate additional steps. Note that as the possible values of the consistency index do not range from zero to one (its lower bound is greater than zero and differs between characters), it is not an index in the mathematical sense of the term.

The maximum length of a character (see MaximumLength()) is the number of steps in a parsimonious reconstruction on the longest possible tree for a character; its minimum length is the number of steps on the shortest possible tree. The retention index is the maximum length of a character minus the number of steps observed on a given tree, divided by the maximum length minus the minimum length. It is interpreted as the complement of the ratio between the observed homoplasy and the maximum possible homoplasy, and scales from zero (worst fit that can be reconstructed under parsimony) to one (perfect fit).

The rescaled consistency index is the product of the consistency and retention indices; it rescales the consistency index such that its range of possible values runs from zero (least consistent) to one (perfectly consistent).
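
The three definitions above can be illustrated with a minimal base R sketch. It is not the implementation used by Consistency(): CharacterIndices() is a hypothetical helper, and it assumes a single unordered character whose tokens are all unambiguous and applicable, so that the minimum length is the number of states minus one and the maximum length is the number of taxa minus the size of the most common state. The observed step count is supplied by hand rather than computed from a tree.

```r
# Hypothetical helper (not part of TreeSearch): indices for one unordered
# character with unambiguous, applicable tokens.
# `tokens`: the state of each taxon; `observed`: steps required on the tree.
CharacterIndices <- function(tokens, observed) {
  counts <- table(tokens)
  minimum <- length(counts) - 1L           # steps on the shortest possible tree
  maximum <- length(tokens) - max(counts)  # steps on the longest possible tree
  ci <- minimum / observed
  ri <- (maximum - observed) / (maximum - minimum)
  c(ci = ci, ri = ri, rc = ci * ri)
}

# Eight taxa, four in each of two states; suppose the tree requires two steps.
# Minimum length is 1 and maximum length is 4.
indices <- CharacterIndices(c(0, 0, 0, 0, 1, 1, 1, 1), observed = 2)
indices
```

Here the consistency index is 1/2, the retention index is (4 - 2) / (4 - 1) = 2/3, and the rescaled consistency index is their product, 1/3. In this sketch the retention index is undefined (NaN) when the maximum and minimum lengths are equal, as in a character where only one taxon differs from the rest.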

The lengths of characters including inapplicable tokens are calculated following Brazeau et al. (2019), matching their default treatment in TreeLength().

References

Brazeau MD, Guillerme T, Smith MR (2019). “An algorithm for morphological phylogenetic analysis with inapplicable data.” Systematic Biology, 68(4), 619–631. doi:10.1093/sysbio/syy083.

Farris JS (1989). “The Retention Index and the Rescaled Consistency Index.” Cladistics, 5(4), 417–419. doi:10.1111/j.1096-0031.1989.tb00573.x.

Kluge AG, Farris JS (1969). “Quantitative Phyletics and the Evolution of Anurans.” Systematic Zoology, 18(1), 1–32. doi:10.1093/sysbio/18.1.1.

Author

Martin R. Smith (martin.smith@durham.ac.uk)

Examples

data(inapplicable.datasets)
dataset <- inapplicable.phyData[[4]]
head(Consistency(dataset, TreeTools::NJTree(dataset)))
#>             ci        ri        rc
#> [1,] 0.2500000 0.6250000 0.1562500
#> [2,] 0.3333333 0.3333333 0.1111111
#> [3,] 0.3333333 0.6000000 0.2000000
#> [4,] 0.2500000 0.2500000 0.0625000
#> [5,] 0.5000000 0.8333333 0.4166667
#> [6,] 0.2500000 0.2500000 0.0625000