topiary.raxml

Interface to raxml-ng.

topiary.raxml.ancestors

Generate ancestors and various summary outputs.

topiary.raxml.ancestors.generate_ancestors(prev_calculation=None, df=None, model=None, gene_tree=None, reconciled_tree=None, alt_cutoff=0.25, calc_dir='ancestors', overwrite=False, num_threads=-1, raxml_binary='raxml-ng')

Generate ancestors and various summary outputs. Creates a fasta file and a csv file with the ancestral sequences, a set of ancestor plots, and a tree with ancestor names and posterior probabilities. Note: this will always reconstruct on the reconciled tree if one is present (passed in either via prev_calculation or via the reconciled_tree argument).
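The tree-selection rule in the note above, together with the gene_tree and reconciled_tree parameter descriptions, amounts to a fixed precedence order. The helper below is a hypothetical sketch of that order, not topiary's implementation; it treats trees as newick strings and uses None to mean "not supplied".

```python
from typing import Optional


def select_reconstruction_tree(
    gene_tree: Optional[str] = None,
    reconciled_tree: Optional[str] = None,
    prev_gene_tree: Optional[str] = None,
    prev_reconciled_tree: Optional[str] = None,
) -> str:
    """Pick the tree to reconstruct ancestors on (hypothetical helper).

    A reconciled tree always wins over a gene tree, and an explicitly
    passed argument overrides the matching tree from prev_calculation.
    """
    # Reconciled trees first: explicit argument, then prev_calculation.
    for candidate in (reconciled_tree, prev_reconciled_tree):
        if candidate is not None:
            return candidate
    # Otherwise fall back to gene trees in the same order.
    for candidate in (gene_tree, prev_gene_tree):
        if candidate is not None:
            return candidate
    raise ValueError("gene_tree or reconciled_tree must be specified")


# A reconciled tree from a previous run beats an explicitly passed gene tree.
chosen = select_reconstruction_tree(
    gene_tree="(A,B,C);", prev_reconciled_tree="((A,B),C);"
)
```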

Parameters:

prev_calculation (str or Supervisor, optional) – previously completed calculation. Should be either a directory containing the calculation (e.g. the directory with run_parameters.json, input, working, output) or a Supervisor instance with a calculation loaded. The function will load the dataframe, model, gene_tree, and reconciled_tree from the previous run. If this is not specified, df, model, and either gene_tree or reconciled_tree must be specified.
df (pandas.DataFrame or str, optional) – topiary data frame or csv written out from topiary df. Will override dataframe from prev_calculation if specified.
model (str, optional) – model (e.g. “LG+G8”). Will override model from prev_calculation if specified.
gene_tree (str or ete3.Tree or dendropy.Tree, optional) – gene tree on which to reconstruct ancestors. Will override gene_tree from prev_calculation if specified. Should be newick with only leaf names and branch lengths. If this is an ete3 or dendropy tree, it will be written out with leaf names and branch lengths; all other data will be dropped. NOTE: if reconciled_tree is specified OR is present in prev_calculation, the reconciled tree will take precedence over this gene tree.
reconciled_tree (str or ete3.Tree or dendropy.Tree, optional) – reconciled tree on which to reconstruct ancestors. Will override reconciled_tree from prev_calculation if specified. Should be newick with only leaf names and branch lengths. If this is an ete3 or dendropy tree, it will be written out with leaf names and branch lengths; all other data will be dropped.
alt_cutoff (float, default=0.25) – cutoff to use for altAll calculation. Should be between 0 and 1.
calc_dir (str, default="ancestors") – calculation directory.
overwrite (bool, default=False) – whether or not to overwrite an existing calc_dir.
num_threads (int, default=-1) – number of threads to use. If -1, use all available.
raxml_binary (str, optional) – what raxml binary to use
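The alt_cutoff parameter controls the altAll sequence. A common way to build an altAll ancestor is to start from the maximum-likelihood (ML) state at each site and substitute the second-most-probable state wherever its posterior probability reaches the cutoff. The sketch below assumes that rule, a "greater than or equal" comparison, and a simple list-of-dicts layout for per-site posteriors; topiary's exact handling (for example of gaps or ties) may differ.

```python
from typing import Dict, List, Tuple


def ml_and_altall(
    site_posteriors: List[Dict[str, float]], alt_cutoff: float = 0.25
) -> Tuple[str, str]:
    """Return (ML sequence, altAll sequence) from per-site posteriors.

    Assumed rule: use the second-best state at a site whenever its
    posterior probability is >= alt_cutoff; otherwise keep the ML state.
    """
    if not 0 <= alt_cutoff <= 1:
        raise ValueError("alt_cutoff should be between 0 and 1")
    ml_seq, alt_seq = [], []
    for posteriors in site_posteriors:
        # Rank states from most to least probable.
        ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
        best_state = ranked[0][0]
        ml_seq.append(best_state)
        if len(ranked) > 1 and ranked[1][1] >= alt_cutoff:
            alt_seq.append(ranked[1][0])
        else:
            alt_seq.append(best_state)
    return "".join(ml_seq), "".join(alt_seq)


sites = [
    {"A": 0.90, "S": 0.10},
    {"L": 0.60, "I": 0.35, "V": 0.05},
    {"K": 0.70, "R": 0.30},
]
ml_seq, alt_seq = ml_and_altall(sites)  # ("ALK", "AIR")
```

Raising alt_cutoff makes the altAll sequence converge toward the ML sequence, since fewer alternative states clear the threshold.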

Returns:

plot – if running in jupyter notebook, return toyplot.canvas; otherwise, return None.

Return type:

toyplot.canvas or None

topiary.raxml.bootstrap

Generate bootstrap replicates for an existing tree and then calculate bootstrap supports.

topiary.raxml.bootstrap.generate_bootstraps(prev_calculation=None, df=None, model=None, gene_tree=None, calc_dir='ml_bootstrap', overwrite=False, num_bootstraps=None, num_threads=-1, raxml_binary='raxml-ng')

Generate bootstrap replicates for an existing tree and then calculate bootstrap supports.
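Conceptually, the support for a branch is the fraction of bootstrap replicate trees that contain the same bipartition (split) of the leaves. The sketch below illustrates that calculation in plain Python; it is not how topiary or raxml-ng compute supports internally. It assumes each tree has already been reduced to a set of non-trivial splits, each stored canonically as the side that excludes a fixed anchor leaf, so the same split always has the same representation.

```python
from typing import Dict, FrozenSet, Iterable, List, Set

Split = FrozenSet[str]


def canonical_split(side: Iterable[str], all_leaves: Set[str], anchor: str) -> Split:
    """Represent a bipartition by the side that does not contain `anchor`."""
    side = frozenset(side)
    return frozenset(all_leaves) - side if anchor in side else side


def bootstrap_supports(
    reference: Set[Split], replicates: List[Set[Split]]
) -> Dict[Split, float]:
    """Fraction of replicate trees that contain each split of the reference tree."""
    if not replicates:
        raise ValueError("need at least one bootstrap replicate")
    return {
        split: sum(split in rep for rep in replicates) / len(replicates)
        for split in reference
    }


# Example: unrooted trees on leaves A-E, with "A" as the anchor leaf.
leaves = {"A", "B", "C", "D", "E"}


def splits(*sides: Set[str]) -> Set[Split]:
    return {canonical_split(s, leaves, "A") for s in sides}


reference = splits({"A", "B"}, {"D", "E"})  # ((A,B),C,(D,E))
replicates = [
    splits({"A", "B"}, {"D", "E"}),  # ((A,B),C,(D,E))
    splits({"A", "B"}, {"C", "D"}),  # ((A,B),E,(C,D))
    splits({"A", "B"}, {"C", "E"}),  # ((A,B),D,(C,E))
]
supports = bootstrap_supports(reference, replicates)
# The (A,B) split is in all three replicates; (D,E) is in only one.
```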

Parameters:

prev_calculation (str or Supervisor, optional) – previously completed calculation. Should either be a directory containing the calculation (e.g. the directory with run_parameters.json, input, working, output) or a Supervisor instance with a calculation loaded. Function will load dataframe, model and gene_tree from the previous run. If this is not specified, df, model, and gene_tree arguments must be specified.
df (pandas.DataFrame or str, optional) – topiary data frame or csv written out from topiary df. Will override dataframe from prev_calculation if specified.
model (str, optional) – model (e.g. “LG+G8”). Will override model from prev_calculation if specified.
gene_tree (str or ete3.Tree or dendropy.Tree, optional) – gene tree used as the starting point for the calculation. Will override tree from prev_calculation if specified. Should be newick with only leaf names and branch lengths. If this is an ete3 or dendropy tree, it will be written out with leaf names and branch lengths; all other data will be dropped.
calc_dir (str, default="ml_bootstrap") – directory in which to do calculation.
overwrite (bool, default=False) – whether or not to overwrite an existing calc_dir.
num_bootstraps (int, optional) – how many bootstrap replicates to generate. If None, use autoMRE to automatically infer the number of replicates given the data.
num_threads (int, default=-1) – number of threads to use. If -1, use all available.
raxml_binary (str, optional) – what raxml binary to use
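The num_bootstraps and num_threads options map naturally onto raxml-ng's --bs-trees and --threads flags. The hypothetical helper below sketches one such mapping, assuming None becomes autoMRE and -1 becomes the CPU count reported by os.cpu_count(); how topiary actually assembles its raxml-ng command line is not documented here.

```python
import os
from typing import List, Optional


def bootstrap_cli_args(
    num_bootstraps: Optional[int] = None, num_threads: int = -1
) -> List[str]:
    """Hypothetical translation of generate_bootstraps options into raxml-ng flags."""
    # None -> let raxml-ng's autoMRE criterion decide how many replicates to run.
    if num_bootstraps is None:
        bs_trees = "autoMRE"
    elif num_bootstraps > 0:
        bs_trees = str(num_bootstraps)
    else:
        raise ValueError("num_bootstraps must be None or a positive integer")
    # -1 -> every CPU the OS reports (os.cpu_count() may return None).
    if num_threads == -1:
        num_threads = os.cpu_count() or 1
    elif num_threads < 1:
        raise ValueError("num_threads must be -1 or a positive integer")
    return ["--bs-trees", bs_trees, "--threads", str(num_threads)]


args = bootstrap_cli_args(200, 4)  # ["--bs-trees", "200", "--threads", "4"]
```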

Returns:

plot – if running in jupyter notebook, return toyplot.canvas; otherwise, return None.

Return type:

toyplot.canvas or None

topiary.raxml.convergence

Check a newick file containing bootstrap replicates for convergence using a specified convergence cutoff.

topiary.raxml.convergence.check_convergence(bs_newick, converge_cutoff=0.03, seed=True, calc_dir=None, num_threads=1, raxml_binary='raxml-ng')

Check a newick file containing bootstrap replicates for convergence using a specified convergence cutoff.

Parameters:

bs_newick (str) – newick file containing bootstrap replicate trees.
converge_cutoff (float, default=0.03) – convergence cutoff. Permutations whose weighted Robinson-Foulds (WRF) distance falls below this cutoff count toward stopping bootstrapping.
seed (bool, int, or str, default=True) – If True, pass a randomly generated seed to raxml. If int or str, use that as the seed (passed via --seed).
calc_dir (str, optional) – write output to this directory. If None, create a temporary directory, parse the output, and then delete the temporary directory.
num_threads (int, default=1) – number of threads on which to run the calculation.
raxml_binary (str, optional) – what raxml binary to use
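The seed parameter accepts three kinds of value. A minimal sketch of turning them into an argument for raxml-ng's --seed flag follows; the helper name is hypothetical, and treating False or None as "omit --seed and leave seeding to raxml-ng's default" is an assumption.

```python
import random
from typing import Optional, Union


def resolve_seed(seed: Union[bool, int, str, None]) -> Optional[str]:
    """Hypothetical helper: map a seed option to a --seed value (or None)."""
    # Check bool before int: in Python, bool is a subclass of int.
    if isinstance(seed, bool):
        return str(random.randint(1, 2**31 - 1)) if seed else None
    if seed is None:
        return None
    # int or str: use it verbatim as the seed.
    return str(seed)


seed_value = resolve_seed(True)
args = ["--seed", seed_value] if seed_value is not None else []
```

Testing for bool first matters: without it, True would be treated as the integer seed 1 rather than as a request for a random seed.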

Returns:

converged (bool) – whether or not the bootstraps are converged
df (pandas.DataFrame) – dataframe holding convergence results
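To make converge_cutoff concrete, the sketch below shows a simplified weighted-RF bootstopping test: repeatedly split the replicates into two random halves, compare split frequencies between the halves, and call the run converged if nearly all permutations fall under the cutoff. This illustrates the general idea only. raxml-ng's autoMRE test works on majority-rule extended consensus trees with its own normalization, and the values of num_permutations and pass_fraction here are assumptions, so results will not match check_convergence numerically. Trees are again assumed to be pre-reduced to sets of canonical splits.

```python
import random
from collections import Counter
from typing import Dict, FrozenSet, List, Set

Split = FrozenSet[str]


def split_frequencies(trees: List[Set[Split]]) -> Dict[Split, float]:
    """Fraction of trees in which each split appears."""
    counts = Counter(split for tree in trees for split in tree)
    return {split: n / len(trees) for split, n in counts.items()}


def weighted_rf(freq_a: Dict[Split, float], freq_b: Dict[Split, float]) -> float:
    """Support-weighted RF distance between two split tables, scaled to [0, 1]."""
    splits = set(freq_a) | set(freq_b)
    diff = sum(abs(freq_a.get(s, 0.0) - freq_b.get(s, 0.0)) for s in splits)
    total = sum(freq_a.values()) + sum(freq_b.values())
    return diff / total if total else 0.0


def bootstraps_converged(
    trees: List[Set[Split]],
    converge_cutoff: float = 0.03,
    num_permutations: int = 1000,
    pass_fraction: float = 0.99,
    seed: int = 0,
) -> bool:
    """Converged if enough random half/half comparisons have WRF <= cutoff."""
    if len(trees) < 2:
        raise ValueError("need at least two bootstrap replicates")
    rng = random.Random(seed)
    pool = list(trees)
    half = len(pool) // 2
    passed = 0
    for _ in range(num_permutations):
        rng.shuffle(pool)
        wrf = weighted_rf(
            split_frequencies(pool[:half]), split_frequencies(pool[half : 2 * half])
        )
        if wrf <= converge_cutoff:
            passed += 1
    return passed / num_permutations >= pass_fraction


# Ten identical replicates agree perfectly, so every permutation passes.
identical = [{frozenset({"C", "D"})}] * 10
converged = bootstraps_converged(identical)
```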

topiary.raxml.model

Find the best phylogenetic model to use for tree and ancestor reconstruction given an alignment and (possibly) a tree.

topiary.raxml.model.find_best_model(df, gene_tree=None, model_matrices=['cpREV', 'Dayhoff', 'DCMut', 'DEN', 'Blosum62', 'FLU', 'HIVb', 'HIVw', 'JTT', 'JTT-DCMut', 'LG', 'mtART', 'mtMAM', 'mtREV', 'mtZOA', 'PMB', 'rtREV', 'stmtREV', 'VT', 'WAG'], model_rates=['', 'G8'], model_freqs=['', 'FC', 'FO'], model_invariant=['', 'IC', 'IO'], seed=None, calc_dir='find_best_model', overwrite=False, supervisor=None, num_threads=-1, raxml_binary='raxml-ng')

Find the best phylogenetic model to use for tree and ancestor reconstruction given an alignment and (possibly) a tree.

Parameters:

df (pandas.DataFrame or str) – topiary data frame or csv written out from topiary df.
gene_tree (str, optional) – gene tree in newick format. If not specified, a parsimony tree is generated and used.
model_matrices (list, default=["cpREV","Dayhoff","DCMut","DEN","Blosum62","FLU","HIVb","HIVw","JTT","JTT-DCMut","LG","mtART","mtMAM","mtREV","mtZOA","PMB","rtREV","stmtREV","VT","WAG"]) – list of model matrices to check
model_rates (list, default=["","G8"]) – ways to treat model rates. If None, do not include a model rate param.
model_freqs (list, default=["","FC","FO"]) – ways to treat model freqs. If None, use the matrix frequencies.
model_invariant (list, default=["","IC","IO"]) – ways to treat invariant alignment columns. If None, do not have an invariant class.
seed (bool, int, or str, optional) – If True, pass a randomly generated seed to raxml. If int or str, use that as the seed (passed via --seed).
calc_dir (str, default="find_best_model") – calculation directory.
overwrite (bool, default=False) – whether or not to overwrite existing output.
supervisor (Supervisor, optional) – instance of Supervisor for managing calculation inputs and outputs
num_threads (int, default=-1) – number of threads to use. If -1, use all available.
raxml_binary (str, optional) – raxml binary to use
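The four model_* lists define a grid of candidate models. The sketch below enumerates that grid as raxml-ng-style model strings, assuming that an empty string means "omit this component" and that components are joined with "+" in the order matrix, rates, frequencies, invariant sites; topiary's internal ordering is not documented here. With the 20 default matrices the grid holds 20 × 2 × 3 × 3 = 360 candidate models.

```python
from itertools import product
from typing import List, Sequence


def enumerate_models(
    matrices: Sequence[str],
    rates: Sequence[str] = ("", "G8"),
    freqs: Sequence[str] = ("", "FC", "FO"),
    invariant: Sequence[str] = ("", "IC", "IO"),
) -> List[str]:
    """Build every model string in the grid, e.g. 'LG+G8+FC+IO'."""
    models = []
    for combo in product(matrices, rates, freqs, invariant):
        # Empty strings mean "leave this component out".
        models.append("+".join(part for part in combo if part))
    return models


candidates = enumerate_models(["LG", "JTT"])  # 2 x 2 x 3 x 3 = 36 strings
```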

Return type:

None

topiary.raxml.tree

Generate maximum likelihood tree from an alignment given an evolutionary model.

topiary.raxml.tree.generate_ml_tree(prev_calculation=None, df=None, model=None, gene_tree=None, calc_dir='ml_tree', overwrite=False, bootstrap=False, num_threads=-1, raxml_binary='raxml-ng')

Generate maximum likelihood tree from an alignment given an evolutionary model.

Parameters:

prev_calculation (str or Supervisor, optional) – previously completed calculation. Should either be a directory containing the calculation (e.g. the directory with run_parameters.json, input, working, output) or a Supervisor instance with a calculation loaded. Function will load dataframe and model from the previous run. If this is not specified, df and model arguments must be specified.
df (pandas.DataFrame or str, optional) – topiary data frame or csv written out from topiary df. Will override dataframe from prev_calculation if specified.
model (str, optional) – model (e.g. “LG+G8”). Will override model from prev_calculation if specified.
gene_tree (str or ete3.Tree or dendropy.Tree, optional) – gene tree used as the starting point for the calculation. Will override tree from prev_calculation if specified. Should be newick with only leaf names and branch lengths. If this is an ete3 or dendropy tree, it will be written out with leaf names and branch lengths; all other data will be dropped.
calc_dir (str, default="ml_tree") – calculation directory. Will be created.
overwrite (bool, default=False) – whether or not to overwrite existing output.
bootstrap (bool, default=False) – whether or not to do bootstrap replicates
supervisor (Supervisor, optional) – instance of Supervisor for managing calculation inputs and outputs
num_threads (int, default=-1) – number of threads to use. If -1, use all available.
raxml_binary (str, optional) – what raxml binary to use
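gene_tree should be newick with only leaf names and branch lengths. If you start from a newick string that also carries internal-node labels, support values, or bracketed comments, a regular-expression pass like the sketch below can strip them. This is an illustrative helper, not part of topiary, and it assumes unquoted labels; quoted labels containing characters such as ( ) [ ] , : ; are not handled.

```python
import re


def strip_to_leaf_names_and_lengths(newick: str) -> str:
    """Drop bracketed comments and internal-node labels from a newick string."""
    # Remove bracketed comments such as [&&NHX:...] or [&R].
    newick = re.sub(r"\[[^\]]*\]", "", newick)
    # Remove any label (node name or support value) directly after ')'.
    newick = re.sub(r"\)[^:,;()]+", ")", newick)
    return newick.strip()


clean = strip_to_leaf_names_and_lengths("((A:0.1,B:0.2)95:0.3,C:0.4)root;")
# clean == "((A:0.1,B:0.2):0.3,C:0.4);"
```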

Returns:

plot – if running in jupyter notebook, return toyplot.canvas; otherwise, return None.

Return type:

toyplot.canvas or None