Skip to contents

The Hamming distance between a pair of taxa is the number of characters with a different coding, i.e. the smallest number of evolutionary steps that must have occurred since their common ancestor.

Usage

Hamming(
  dataset,
  ratio = TRUE,
  ambig = c("median", "mean", "zero", "one", "na", "nan")
)

Arguments

dataset

Object of class phyDat.

ratio

Logical specifying whether to weight distance against maximum possible, given that a token that is ambiguous in either of two taxa cannot contribute to the total distance between the pair.

ambig

Character specifying value to return when a pair of taxa have a zero maximum distance (perhaps due to a preponderance of ambiguous tokens). "median", the default, take the median of all other distance values; "mean", the mean; "zero" sets to zero; "one" to one; "NA" to NA_integer_; and "NaN" to NaN.

Value

Hamming() returns an object of class dist listing the Hamming distance between each pair of taxa.

Details

Tokens that contain the inapplicable state are treated as requiring no steps to transform into any applicable token.

See also

Used to construct neighbour joining trees in NJTree().

dist.hamming() in the phangorn package provides an alternative implementation.

Examples

tokens <- matrix(c(0, 0, "0", 0, "?",
                   0, 0, "1", 0, 1,
                   0, 0, "1", 0, 1,
                   0, 0, "2", 0, 1,
                   1, 1, "-", "?", 0,
                   1, 1, "2", 1, "{01}"),
                   nrow = 6, ncol = 5, byrow = TRUE,
                   dimnames = list(
                     paste0("Taxon_", LETTERS[1:6]),
                     paste0("Char_", 1:5)))

dataset <- MatrixToPhyDat(tokens)
Hamming(dataset)
#>         Taxon_A Taxon_B Taxon_C Taxon_D Taxon_E
#> Taxon_B    0.25                                
#> Taxon_C    0.25    0.00                        
#> Taxon_D    0.25    0.20    0.20                
#> Taxon_E    1.00    1.00    1.00    1.00        
#> Taxon_F    1.00    0.80    0.80    0.60    0.00