topiary.raxml

Interface to raxml-ng.

topiary.raxml.ancestors

Generate ancestors and various summary outputs.

topiary.raxml.ancestors.generate_ancestors(prev_calculation=None, df=None, model=None, gene_tree=None, reconciled_tree=None, alt_cutoff=0.25, calc_dir='ancestors', overwrite=False, num_threads=-1, raxml_binary='raxml-ng')

Generate ancestors and various summary outputs. Creates a fasta file and a csv file with ancestral sequences, a set of ancestor plots, and a tree with ancestral names and posterior probabilities. Note: this always reconstructs on the reconciled tree if one is present (either loaded from prev_calculation or passed via the reconciled_tree argument).

Parameters:
  • prev_calculation (str or Supervisor, optional) – previously completed calculation. Should either be a directory containing the calculation (e.g. the directory with run_parameters.json, input, working, output) or a Supervisor instance with a calculation loaded. Function will load dataframe, model, gene_tree, and reconciled_tree from the previous run. If this is not specified, df, model, and gene_tree or reconciled_tree arguments must be specified.

  • df (pandas.DataFrame or str, optional) – topiary data frame or csv written out from topiary df. Will override dataframe from prev_calculation if specified.

  • model (str, optional) – model (e.g. “LG+G8”). Will override model from prev_calculation if specified.

  • gene_tree (str or ete3.Tree or dendropy.Tree) – gene_tree. Reconstruct ancestors on this tree. Will override gene_tree from prev_calculation if specified. Should be newick with only leaf names and branch lengths. If this is an ete3 or dendropy tree, it will be written out with leaf names and branch lengths; all other data will be dropped. NOTE: if reconciled_tree is specified OR is present in the prev_calculation, the reconciled tree will take precedence over this gene tree.

  • reconciled_tree (str or ete3.Tree or dendropy.Tree) – reconciled_tree. Reconstruct ancestors on this tree. Will override reconciled_tree from prev_calculation if specified. Should be newick with only leaf names and branch lengths. If this is an ete3 or dendropy tree, it will be written out with leaf names and branch lengths; all other data will be dropped.

  • alt_cutoff (float, default=0.25) – cutoff to use for altAll calculation. Should be between 0 and 1.

  • calc_dir (str, default="ancestors") – calculation directory.

  • overwrite (bool, default=False) – whether or not to overwrite existing calc_dir

  • num_threads (int, default=-1) – number of threads to use. If -1, use all available.

  • raxml_binary (str, optional) – what raxml binary to use

Returns:

plot – if running in jupyter notebook, return toyplot.canvas; otherwise, return None.

Return type:

toyplot.canvas or None
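
The tree-precedence rules described above can be sketched in plain Python. This illustrates the documented behavior only and is not topiary's implementation; the function select_tree and its argument names are hypothetical.

```python
def select_tree(prev_gene=None, prev_reconciled=None,
                gene_tree=None, reconciled_tree=None):
    """Return (kind, tree) following the documented precedence (hypothetical helper)."""
    # Explicit arguments override whatever the previous calculation held.
    gene = gene_tree if gene_tree is not None else prev_gene
    reconciled = reconciled_tree if reconciled_tree is not None else prev_reconciled

    # A reconciled tree, from either source, wins over any gene tree.
    if reconciled is not None:
        return "reconciled", reconciled
    if gene is not None:
        return "gene", gene
    raise ValueError("a gene_tree or reconciled_tree must be available")


# A gene tree passed explicitly does not displace a reconciled tree
# loaded from the previous calculation.
kind, tree = select_tree(prev_reconciled="((A:1,B:1):1,C:2);",
                         gene_tree="((A:1,C:1):1,B:2);")
print(kind, tree)
```

To reconstruct on a gene tree, neither the reconciled_tree argument nor the previous calculation may supply a reconciled tree.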

topiary.raxml.bootstrap

Generate bootstrap replicates for an existing tree and then calculate bootstrap supports.

topiary.raxml.bootstrap.generate_bootstraps(prev_calculation=None, df=None, model=None, gene_tree=None, calc_dir='ml_bootstrap', overwrite=False, num_bootstraps=None, num_threads=-1, raxml_binary='raxml-ng')

Generate bootstrap replicates for an existing tree and then calculate bootstrap supports.

Parameters:
  • prev_calculation (str or Supervisor, optional) – previously completed calculation. Should either be a directory containing the calculation (e.g. the directory with run_parameters.json, input, working, output) or a Supervisor instance with a calculation loaded. Function will load dataframe, model and gene_tree from the previous run. If this is not specified, df, model, and gene_tree arguments must be specified.

  • df (pandas.DataFrame or str, optional) – topiary data frame or csv written out from topiary df. Will override dataframe from prev_calculation if specified.

  • model (str, optional) – model (e.g. “LG+G8”). Will override model from prev_calculation if specified.

  • gene_tree (str or ete3.Tree or dendropy.Tree) – gene_tree. Used as starting point for calculation. Will override tree from prev_calculation if specified. Should be newick with only leaf names and branch lengths. If this is an ete3 or dendropy tree, it will be written out with leaf names and branch lengths; all other data will be dropped.

  • calc_dir (str, default="ml_bootstrap") – directory in which to do calculation.

  • overwrite (bool, default=False) – whether or not to overwrite existing calc_dir

  • num_bootstraps (int, optional) – how many bootstrap replicates to generate. If None, use autoMRE to automatically infer the number of replicates given the data.

  • num_threads (int, default=-1) – number of threads to use. If -1, use all available.

  • raxml_binary (str, optional) – what raxml binary to use

Returns:

plot – if running in jupyter notebook, return toyplot.canvas; otherwise, return None.

Return type:

toyplot.canvas or None
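
A call to generate_bootstraps might be set up as in the following minimal sketch. The helpers bootstrap_kwargs and run_bootstraps are hypothetical wrappers written for this example, and "ml_tree" is an assumed directory name for a previously completed calculation; only the keyword names come from the signature above.

```python
def bootstrap_kwargs(prev_calculation, num_bootstraps=None, calc_dir="ml_bootstrap"):
    """Collect keyword arguments for generate_bootstraps (hypothetical helper)."""
    return {
        "prev_calculation": prev_calculation,  # directory or Supervisor
        "calc_dir": calc_dir,
        "num_bootstraps": num_bootstraps,      # None -> autoMRE picks the count
        "num_threads": -1,                     # -1 -> use all available threads
    }


def run_bootstraps(prev_calculation, **overrides):
    # Imported lazily so bootstrap_kwargs can be used without topiary installed.
    from topiary.raxml.bootstrap import generate_bootstraps
    return generate_bootstraps(**bootstrap_kwargs(prev_calculation, **overrides))


# Inspect the arguments for a fixed number of replicates; "ml_tree" is an
# assumed path to an earlier calculation directory.
kwargs = bootstrap_kwargs("ml_tree", num_bootstraps=100)
print(kwargs)
```

Leaving num_bootstraps as None defers the choice of replicate count to autoMRE, as described in the parameter list.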

topiary.raxml.convergence

Check a newick file containing bootstrap replicates for convergence using a specified convergence cutoff.

topiary.raxml.convergence.check_convergence(bs_newick, converge_cutoff=0.03, seed=True, calc_dir=None, num_threads=1, raxml_binary='raxml-ng')

Check a newick file containing bootstrap replicates for convergence using a specified convergence cutoff.

Parameters:
  • bs_newick (str) – newick file containing bootstrap replicate trees

  • converge_cutoff (float, default=0.03) – convergence cutoff. Permutations with a WRF below this cutoff count toward stopping bootstrapping.

  • seed (bool, int, or str) – If True, pass a randomly generated seed to raxml. If int or str, use that as the seed. (passed via --seed)

  • calc_dir (str, optional) – write output to this directory. If None, create a temporary directory, parse the output, and then delete the temporary directory.

  • num_threads (int, default=1) – run calculation on num_threads threads

  • raxml_binary (str, optional) – what raxml binary to use

Returns:

  • converged (bool) – whether or not the bootstraps are converged

  • df (pandas.DataFrame) – dataframe holding convergence results
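
The seed rule described for check_convergence can be illustrated with a small helper. This is a sketch under stated assumptions, not topiary's code: resolve_seed is a hypothetical name, the range of the random seed is arbitrary, and treating False or None as "no seed" is an assumption the documentation does not confirm.

```python
import random


def resolve_seed(seed):
    """Turn a seed argument into a string for --seed, or None for no seed.

    Hypothetical helper; the False/None handling is an assumption.
    """
    # bool is a subclass of int in Python, so it must be tested first.
    if isinstance(seed, bool):
        return str(random.randint(1, 2**31 - 1)) if seed else None
    if isinstance(seed, (int, str)):
        return str(seed)
    if seed is None:
        return None
    raise TypeError("seed must be bool, int, or str")


print(resolve_seed(42))    # fixed seed, reproducible runs
print(resolve_seed(True))  # randomly generated seed
```

Passing an explicit int or str is the way to make a convergence check reproducible between runs.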

topiary.raxml.model

Find the best phylogenetic model to use for tree and ancestor reconstruction given an alignment and (possibly) a tree.

topiary.raxml.model.find_best_model(df, gene_tree=None, model_matrices=['cpREV', 'Dayhoff', 'DCMut', 'DEN', 'Blosum62', 'FLU', 'HIVb', 'HIVw', 'JTT', 'JTT-DCMut', 'LG', 'mtART', 'mtMAM', 'mtREV', 'mtZOA', 'PMB', 'rtREV', 'stmtREV', 'VT', 'WAG'], model_rates=['', 'G8'], model_freqs=['', 'FC', 'FO'], model_invariant=['', 'IC', 'IO'], seed=None, calc_dir='find_best_model', overwrite=False, supervisor=None, num_threads=-1, raxml_binary='raxml-ng')

Find the best phylogenetic model to use for tree and ancestor reconstruction given an alignment and (possibly) a tree.

Parameters:
  • df (pandas.DataFrame or str) – topiary data frame or csv written out from topiary df.

  • gene_tree (str) – gene_tree in newick format. If not specified, a parsimony tree is generated and used.

  • model_matrices (list, default=["cpREV","Dayhoff","DCMut","DEN","Blosum62","FLU","HIVb","HIVw","JTT","JTT-DCMut","LG","mtART","mtMAM","mtREV","mtZOA","PMB","rtREV","stmtREV","VT","WAG"]) – list of model matrices to check

  • model_rates (list, default=["","G8"]) – ways to treat model rates. If None, do not include a model rate param.

  • model_freqs (list, default=["","FC","FO"]) – ways to treat model freqs. If None, use the matrix frequencies.

  • model_invariant (list, default=["","IC","IO"]) – ways to treat invariant alignment columns. If None, do not have an invariant class.

  • seed (bool, int, or str) – If True, pass a randomly generated seed to raxml. If int or str, use that as the seed. (passed via --seed)

  • calc_dir (str, default="find_best_model") – calculation directory.

  • overwrite (bool, default=False) – whether or not to overwrite existing output

  • supervisor (Supervisor, optional) – instance of Supervisor for managing calculation inputs and outputs

  • num_threads (int, default=-1) – number of threads to use. If -1, use all available.

  • raxml_binary (str, optional) – raxml binary to use

Returns:

Return type:

None
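
To get a feel for the size of the search space, the four lists can be combined as in the sketch below. It assumes that every combination of matrix, rate, frequency, and invariant setting is a candidate and that components are joined with "+" in that order; both are assumptions for illustration, and model_strings is a hypothetical helper rather than a topiary function.

```python
from itertools import product


def model_strings(matrices, rates, freqs, invariant):
    """Build every '+'-joined combination, skipping empty components."""
    models = []
    for parts in product(matrices, rates, freqs, invariant):
        # An empty string means "leave this component out of the model".
        models.append("+".join(p for p in parts if p))
    return models


# Two matrices with the default rate, frequency, and invariant options.
models = model_strings(
    matrices=["LG", "JTT"],
    rates=["", "G8"],
    freqs=["", "FC", "FO"],
    invariant=["", "IC", "IO"],
)
print(len(models))  # 2 * 2 * 3 * 3 combinations
print(models[0], models[-1])
```

Under the same assumption, the 20 default matrices would give 20 * 2 * 3 * 3 = 360 candidate models, so trimming model_matrices is the most direct way to shorten the search.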

topiary.raxml.tree

Generate maximum likelihood tree from an alignment given an evolutionary model.

topiary.raxml.tree.generate_ml_tree(prev_calculation=None, df=None, model=None, gene_tree=None, calc_dir='ml_tree', overwrite=False, bootstrap=False, num_threads=-1, raxml_binary='raxml-ng')

Generate maximum likelihood tree from an alignment given an evolutionary model.

Parameters:
  • prev_calculation (str or Supervisor, optional) – previously completed calculation. Should either be a directory containing the calculation (e.g. the directory with run_parameters.json, input, working, output) or a Supervisor instance with a calculation loaded. Function will load dataframe and model from the previous run. If this is not specified, df and model arguments must be specified.

  • df (pandas.DataFrame or str, optional) – topiary data frame or csv written out from topiary df. Will override dataframe from prev_calculation if specified.

  • model (str, optional) – model (e.g. “LG+G8”). Will override model from prev_calculation if specified.

  • gene_tree (str or ete3.Tree or dendropy.Tree) – gene_tree. Used as starting point for calculation. Will override tree from prev_calculation if specified. Should be newick with only leaf names and branch lengths. If this is an ete3 or dendropy tree, it will be written out with leaf names and branch lengths; all other data will be dropped.

  • calc_dir (str, default="ml_tree") – calculation directory. Will be created.

  • overwrite (bool, default=False) – whether or not to overwrite existing output

  • bootstrap (bool, default=False) – whether or not to do bootstrap replicates

  • supervisor (Supervisor, optional) – instance of Supervisor for managing calculation inputs and outputs

  • num_threads (int, default=-1) – number of threads to use. If -1, use all available.

  • raxml_binary (str, optional) – what raxml binary to use

Returns:

plot – if running in jupyter notebook, return toyplot.canvas; otherwise, return None.

Return type:

toyplot.canvas or None
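
Because each function accepts a prev_calculation directory, steps can be chained by pointing one step at the calc_dir of the step before it. The sketch below shows one possible arrangement. The STEPS table and run_steps are hypothetical, "seqs.csv" is an assumed file name, and using the ml_tree directory as input to generate_bootstraps is an assumption about how the calculations connect.

```python
# (module, function, keyword arguments) for each step. The second step reads
# the first step's calc_dir through prev_calculation.
STEPS = [
    ("topiary.raxml.tree", "generate_ml_tree",
     {"df": "seqs.csv", "model": "LG+G8", "calc_dir": "ml_tree"}),
    ("topiary.raxml.bootstrap", "generate_bootstraps",
     {"prev_calculation": "ml_tree", "calc_dir": "ml_bootstrap"}),
]


def run_steps(steps=STEPS):
    """Import each function lazily and call it with its keyword arguments."""
    from importlib import import_module
    for module_name, func_name, kwargs in steps:
        func = getattr(import_module(module_name), func_name)
        func(**kwargs)


# Print the plan without running anything; call run_steps() to execute it
# in an environment where topiary and raxml-ng are installed.
for module_name, func_name, kwargs in STEPS:
    print(f"{module_name}.{func_name}", kwargs)
```

Keeping the steps in a table makes it easy to confirm that every prev_calculation matches an earlier calc_dir before any long-running calculation starts.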