phylo2vec.utils.vector.queue_shuffle

Contents

phylo2vec.utils.vector.queue_shuffle#

phylo2vec.utils.vector.queue_shuffle(v, shuffle_cherries=False) Tuple[ndarray, List[int]][source]#

Produce an ordered version (i.e., birth-death process version) of a Phylo2Vec vector using the Queue Shuffle algorithm.

Queue Shuffle ensures that the output tree is ordered, while also ensuring a smooth path through the space of orderings

For more details, see https://doi.org/10.1093/gbe/evad213 Illustration of the algorithm:

////-3

////6|

////7| \-2 | |

-8| \-1

////-4
\5|

\-0

The ancestry array of this tree is: [8, 7, 5] [7, 6, 1] [6, 3, 2] [5, 4, 0]

Unrolled, it becomes: 8 7 5 6 1 3 2 4 0

We encode the nodes as it: Start by encoding the first two non-root nodes as 0, 1 For the next pairs:

  • The left member takes the label was the previous parent node

  • The right member increments the previous right member by 1

Ex: 8 7 5 6 1 3 2 4 0

0 1 0 2

then

8 7 5 6 1 3 2 4 0

0 1 0 2 1 3

then

8 7 5 6 1 3 2 4 0

0 1 0 2 1 3 0 4

The code for the leaf nodes (0, 1, 2, 3, 4) is their new label

Note that the full algorithm also features a queue of internal nodes which could switch the processing order of rows in the ancestry array.

Parameters:
  • v (numpy.ndarray) – Phylo2Vec vector

  • shuffle_cherries (bool, optional) – If True, shuffle at random the order of the cherries in the ancestry matrix (i.e., the first two columns of the ancestry matrix).

Returns:

  • v_new (numpy.ndarray) – Reordered Phylo2Vec vector

  • vec_mapping (List[int]) – Mapping of the original vector to the new vector index: leaf value: new leaf index in the reordered vector