GET_PHYLOMARKERS MANUAL

Brief presentation and graphical overview of the pipeline

This manual provides the usage details for GET_PHYLOMARKERS, a software package designed to select “well-behaved” phylogenetic markers for estimating a maximum likelihood (ML) species tree from the supermatrix of concatenated, top-scoring alignments. These markers are identified through a series of sequential filters that operate on the orthologous gene/transcript/protein clusters computed by GET_HOMOLOGUES, and that exclude:

  1. alignments with evidence for recombinant sequences
  2. sequences that yield “outlier gene trees” in the context of the distributions of topologies and tree-lengths expected under the multispecies coalescent
  3. poorly resolved gene trees
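The sequential logic of these three filters can be illustrated with a short sketch. This is a minimal Python example under stated assumptions, not GET_PHYLOMARKERS code: it assumes that, for every cluster, the recombination test result, the outlier-gene-tree flag and the mean branch support of its gene tree have already been computed. The class MarkerCandidate, the function select_markers and the default min_support cutoff of 0.7 are hypothetical names and values chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class MarkerCandidate:
    """Hypothetical per-cluster summary; all fields are assumed precomputed."""
    name: str
    has_recombination: bool  # True if the alignment showed evidence of recombination
    is_outlier_tree: bool    # True if its gene tree was flagged as an outlier
    mean_support: float      # mean branch support of its gene tree (0..1, assumed scale)


def select_markers(candidates, min_support=0.7):
    """Apply the three filters in order; each stage only sees the previous stage's survivors."""
    # Filter 1: drop alignments with evidence for recombinant sequences.
    no_recombination = [c for c in candidates if not c.has_recombination]
    # Filter 2: drop clusters whose gene trees were flagged as outliers.
    no_outliers = [c for c in no_recombination if not c.is_outlier_tree]
    # Filter 3: drop poorly resolved gene trees (hypothetical support cutoff).
    well_resolved = [c for c in no_outliers if c.mean_support >= min_support]
    return well_resolved


# Usage example with made-up clusters.
clusters = [
    MarkerCandidate("cluster_1", False, False, 0.90),
    MarkerCandidate("cluster_2", True, False, 0.95),
    MarkerCandidate("cluster_3", False, True, 0.80),
    MarkerCandidate("cluster_4", False, False, 0.50),
]
selected = [c.name for c in select_markers(clusters)]
print(selected)  # ['cluster_1']
```

Because the flags in this sketch are fixed per cluster, the final selection is simply the set of clusters that pass all three tests. In a real run, a filter that is evaluated against the distribution of gene trees may depend on which clusters survived the earlier stages, so the order of the filters can matter.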

Figure 1 provides a graphical overview of the GET_PHYLOMARKERS pipeline. This manual describes each of these steps in detail, along with the options available to control the pipeline’s behaviour: the stringency of the filters, the number of substitution models evaluated and the thoroughness of the tree searches. In addition, the script estimate_pangenome_phylogenies.sh can search for ML and parsimony pan-genome phylogenies using the pan-genome matrix computed by compare_clusters.pl from the GET_HOMOLOGUES suite, as shown in the pipeline’s flowchart below.