phylo2vec.stats.robinson_foulds


phylo2vec.stats.robinson_foulds(tree1: ndarray, tree2: ndarray, normalize: bool = False) → float

Compute the Robinson-Foulds distance between two trees.

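RF distance counts the bipartitions (splits) that appear in one tree topology but not in the other, i.e. the size of the symmetric difference of the two trees' split sets. Lower values indicate more similar trees, and identical topologies have distance 0.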
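To make the definition concrete, the following is a minimal pure-Python sketch of unrooted RF distance computed as the symmetric difference of split sets. It does not call phylo2vec. Trees are assumed to be nested tuples of leaf labels (a hypothetical representation chosen only for illustration), both trees are assumed to share the same leaf set, and the helpers `leaves`, `bipartitions`, and `rf_distance` are illustrative names, not part of the phylo2vec API.

```python
def leaves(tree):
    """Return the leaf labels of a nested-tuple tree as a frozenset."""
    if isinstance(tree, tuple):
        return frozenset().union(*(leaves(child) for child in tree))
    return frozenset([tree])


def bipartitions(tree):
    """Collect the non-trivial splits of a nested-tuple tree.

    Assumption: trees are nested tuples of hashable, sortable leaf labels.
    Each split is stored as the side that does not contain the smallest
    leaf label, so a split maps to the same frozenset wherever the tree
    happens to be rooted.
    """
    all_leaves = leaves(tree)
    ref = min(all_leaves)
    splits = set()

    def walk(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        clade = frozenset().union(*(walk(child) for child in node))
        # Non-trivial split: both sides contain at least two leaves.
        if 2 <= len(clade) <= len(all_leaves) - 2:
            splits.add(clade if ref not in clade else all_leaves - clade)
        return clade

    walk(tree)
    return splits


def rf_distance(tree1, tree2):
    """Number of splits found in exactly one of the two trees."""
    return len(bipartitions(tree1) ^ bipartitions(tree2))


t1 = (((0, 1), 2), (3, 4))
t2 = (((0, 2), 1), (3, 4))
print(rf_distance(t1, t1))  # 0
print(rf_distance(t1, t2))  # 2
```

Here `t1` and `t2` share the split {3, 4} | {0, 1, 2} but differ in whether leaf 0 pairs with leaf 1 or leaf 2, so each tree contributes one unshared split and the distance is 2.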

Parameters:
  • tree1 (np.ndarray) – First tree as Phylo2Vec vector (1D) or matrix (2D). Only topology is used; branch lengths are ignored.

  • tree2 (np.ndarray) – Second tree as Phylo2Vec vector (1D) or matrix (2D). Only topology is used; branch lengths are ignored.

  • normalize (bool, default=False) – If True, return normalized distance in range [0.0, 1.0].

Returns:

RF distance. An integer-valued float if normalize=False; a float in [0.0, 1.0] otherwise.
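A common way to map a raw RF distance into [0.0, 1.0] is to divide by its maximum possible value. Each unrooted binary tree on n leaves has n − 3 non-trivial splits, so two such trees can differ by at most 2(n − 3). The sketch below assumes that convention; the denominator phylo2vec actually uses for normalize=True is not stated here (rooted conventions, for example, use 2(n − 2)), and `normalized_rf` is a hypothetical helper, not a phylo2vec function.

```python
def normalized_rf(rf, n_leaves):
    """Scale a raw RF distance by 2 * (n_leaves - 3).

    Assumption: 2 * (n_leaves - 3) is the maximum RF distance between two
    unrooted binary trees on the same n_leaves leaves. phylo2vec may use a
    different denominator.
    """
    if n_leaves < 4:
        # Fewer than 4 leaves: no non-trivial splits, so trees cannot differ.
        return 0.0
    return rf / (2 * (n_leaves - 3))


print(normalized_rf(2, 5))  # 0.5
```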

Return type:

float

Raises:

AssertionError – If trees have different numbers of leaves.

Examples

>>> import numpy as np
>>> from phylo2vec.stats import robinson_foulds
>>> v1 = np.array([0, 1, 2, 3], dtype=np.int16)
>>> v2 = np.array([0, 0, 1, 2], dtype=np.int16)
>>> robinson_foulds(v1, v1)  # Identical trees
0.0
>>> robinson_foulds(v1, v2)  # Different trees
2.0

See also

ete3.Tree.robinson_foulds – Reference implementation in ete3.

ape::dist.topo – Reference implementation in R’s ape package.