Hierarchical Mutual Information
Source: R/hierarchical_mutual_information.R
HierarchicalMutualInfo.Rd
Calculate the Hierarchical Mutual Information (HMI) between two trees, following the recursive algorithm of Perotti et al. (2020).
This function was written during a code sprint: its documentation and test cases have not yet been carefully scrutinized, and its implementation may change without notice. Please alert the maintainer to any issues you encounter.
Usage
HierarchicalMutualInfo(tree1, tree2 = NULL, normalize = FALSE)
HMI(tree1, tree2 = NULL, normalize = FALSE)
SelfHMI(tree)
EHMI(tree1, tree2, precision = 0.01, minResample = 36)
AHMI(tree1, tree2, Mean = max, precision = 0.01, minResample = 36)
Arguments
- normalize
If FALSE, return the raw HMI, in bits. If TRUE, normalize to the range [0, 1] by dividing by max(SelfHMI(tree1), SelfHMI(tree2)). If a function, divide by normalize(SelfHMI(tree1), SelfHMI(tree2)).
- tree, tree1, tree2
An object that can be coerced to an HPart object.
- precision
Numeric; Monte Carlo sampling will terminate once the relative standard error falls below this value.
- minResample
Integer specifying the minimum number of Monte Carlo samples to conduct. Avoids early termination when the sample size is too small to reliably estimate the standard error of the mean.
- Mean
Function by which to combine the self-information of the two input hierarchies in order to normalize the HMI.
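The normalize argument can thus take three forms. A minimal base-R sketch of how such an argument can be dispatched (illustrative only, not the package's implementation; self1 and self2 stand in for SelfHMI(tree1) and SelfHMI(tree2)):

```r
# Illustrative sketch: interpreting a normalize argument of FALSE / TRUE /
# function. NormalizeHMI is a hypothetical helper, not a package function.
NormalizeHMI <- function(hmi, self1, self2, normalize = FALSE) {
  if (isFALSE(normalize)) {
    hmi                            # raw HMI, in bits
  } else if (isTRUE(normalize)) {
    hmi / max(self1, self2)        # bounded to the range [0, 1]
  } else {
    hmi / normalize(self1, self2)  # user-supplied combining function
  }
}
NormalizeHMI(1, 2, 4)        # 1
NormalizeHMI(1, 2, 4, TRUE)  # 0.25
NormalizeHMI(1, 2, 4, min)   # 0.5
```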
Value
HierarchicalMutualInfo()
returns a numeric value representing the
hierarchical mutual information between the input trees, in bits,
normalized as specified.
Higher values indicate more shared hierarchical structure.
SelfHMI()
returns the hierarchical mutual information of a tree
compared with itself, i.e. its hierarchical entropy (H).
EHMI() returns the expected HMI against a uniform shuffling of element labels, estimated by performing Monte Carlo resampling on the same hierarchical structure until the relative standard error of the estimate falls below precision. The attributes of the returned object list the variance (var), standard deviation (sd), standard error of the mean (sem) and relative error (relativeError) of the estimate, and the number of Monte Carlo samples used to obtain it (samples).
AHMI()
returns the adjusted HMI, normalized such that
zero corresponds to the expected HMI given a random shuffling
of elements on the same hierarchical structure. The attribute sem
gives
the standard error of the estimate.
Details
HierarchicalMutualInfo() computes the hierarchical mutual information of trees (Perotti et al. 2015; Perotti et al. 2020), which accounts for the non-independence of information represented by nested splits.
tree
is converted to a set of hierarchical partitions, and the mutual
information (in bits) is computed recursively; the contribution of a node is
given by:
$$I(t,s) = \log_2(n_{ts}) - \dfrac{H_{us} + H_{tv} - H_{uv}}{n_{ts}} + \text{mean}(I_{uv})$$
Where:
- \(n_{ts}\) is the number of common elements between partitions;
- \(H_{us}, H_{tv}, H_{uv}\) are entropy terms from child comparisons;
- \(I_{uv}\) is the recursive HMI for child pairs.
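The per-node contribution above translates directly into a small helper (a sketch only; node_hmi is an illustrative name, and the entropy terms are supplied as plain numbers rather than computed from partitions):

```r
# Per-node HMI contribution, term for term with the displayed equation:
# log2(n_ts) - (H_us + H_tv - H_uv) / n_ts + mean(I_uv)
node_hmi <- function(n_ts, H_us, H_tv, H_uv, I_uv) {
  log2(n_ts) - (H_us + H_tv - H_uv) / n_ts + mean(I_uv)
}
# Two shared elements, zero child entropies, no deeper structure:
node_hmi(2, 0, 0, 0, 0)  # 1 bit
```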
AHMI()
calculates the adjusted hierarchical mutual information:
$$\text{AHMI}(t, s) = \dfrac{I(t, s) - \hat{I}(t, s)}{
\text{mean}(H(t), H(s)) - \hat{I}(t, s)}$$
Where:
- \(I(t, s)\) is the hierarchical mutual information between tree1 and tree2;
- \(\hat{I}(t, s)\) is the expected HMI between tree1 and tree2, estimated by Monte Carlo sampling;
- \(H(t), H(s)\) are the entropies (self-mutual information) of each tree.
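The AHMI arithmetic can be traced with plain numbers (all values below are made up for illustration, not computed by the package):

```r
# Illustrative AHMI arithmetic with made-up values:
I    <- 0.48  # observed HMI between the trees
Ehat <- 0.30  # expected HMI under label shuffling (EHMI estimate)
Ht   <- 2.0   # entropy (SelfHMI) of tree1
Hs   <- 2.0   # entropy (SelfHMI) of tree2
Mean <- max   # the default `Mean` in AHMI()
(I - Ehat) / (Mean(Ht, Hs) - Ehat)  # 0.18 / 1.7, ~0.1059
```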
References
Perotti JI, Almeira N, Saracco F (2020).
“Towards a Generalization of Information Theory for Hierarchical Partitions.”
Physical Review E, 101(6), 062148.
doi:10.1103/PhysRevE.101.062148
.
Perotti JI, Tessone CJ, Caldarelli G (2015).
“Hierarchical Mutual Information for the Comparison of Hierarchical Community Structures in Complex Networks.”
Physical Review E - Statistical, Nonlinear, and Soft Matter Physics, 92(6), 062825-1–062825-13.
doi:10.1103/PhysRevE.92.062825
.
See also
Other tree distances:
JaccardRobinsonFoulds()
,
KendallColijn()
,
MASTSize()
,
MatchingSplitDistance()
,
NNIDist()
,
NyeSimilarity()
,
PathDist()
,
Robinson-Foulds
,
SPRDist()
,
TreeDistance()
Examples
library("TreeTools", quietly = TRUE)
tree1 <- BalancedTree(8)
tree2 <- PectinateTree(8)
# Calculate HMI between two trees
HierarchicalMutualInfo(tree1, tree2)
#> [1] 0.4822863
# HMI normalized against the mean information content of tree1 and tree2
HierarchicalMutualInfo(tree1, tree2, normalize = mean)
#> [1] 0.2411432
# Normalized HMI above is equivalent to:
HMI(tree1, tree2) / mean(SelfHMI(tree1), SelfHMI(tree2))
#> [1] 0.2411432
# Expected mutual info for this pair of hierarchies
EHMI(tree1, tree2, precision = 0.1)
#> [1] 0.3027541
#> attr(,"var")
#> [1] 0.002918266
#> attr(,"sd")
#> [1] 0.05402098
#> attr(,"sem")
#> [1] 0.009003497
#> attr(,"samples")
#> [1] 36
#> attr(,"relativeError")
#> [1] 0.04290379
# The adjusted HMI normalizes against this expectation
AHMI(tree1, tree2, precision = 0.1)
#> [1] 0.07400499
#> attr(,"sem")
#> [1] 0.004966383