topiary.opentree

Interface to the Open Tree of Life database.

topiary.opentree.ott

Get OTT ids given a topiary dataframe.

topiary.opentree.ott.get_df_ott(df, verbose=True, keep_anyway=False)

Return a copy of df with an ott column holding the Open Tree of Life taxon id for each species. It also adds a “resolvable” column indicating whether each species can be resolved on the Open Tree of Life synthetic tree.

Parameters:
  • df (pandas.DataFrame) – topiary dataframe with a species column holding the species names to look up

  • verbose (bool, default=True) – whether or not to print out unresolvable taxa, etc.

  • keep_anyway (bool, default=False) – if True, do not set keep = False for species that cannot be found or resolved on the Open Tree of Life.

Returns:

topiary_df – Copy of df with added ott, orig_species, and resolvable columns. The ott column holds the ott id for each species, and orig_species holds the original contents of the species column. The species column is replaced by the cleaned species name used by Open Tree of Life. The resolvable column indicates whether the species can be resolved on the synthetic tree. Rows whose species have no ott or are not resolvable have keep set to False. This overrides the always_keep column, and the function warns the user when it does so. If keep_anyway is True, the columns are populated as described above, but keep is not set to False for bad values.

Return type:

pandas.DataFrame
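
The interaction between keep and keep_anyway described above can be sketched without topiary or a network connection. This is a minimal illustration, not the library's implementation: the helper flag_unresolved is hypothetical, plain dictionaries stand in for dataframe rows, and the species names and ott values are placeholders rather than real Open Tree of Life ids.

```python
def flag_unresolved(rows, keep_anyway=False):
    """Return copies of rows with keep set to False where the ott is
    missing or the species is not resolvable, unless keep_anyway is True.

    Hypothetical helper that mimics the documented behavior only.
    """
    out = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        bad = new_row["ott"] is None or not new_row["resolvable"]
        if bad and not keep_anyway:
            new_row["keep"] = False
        out.append(new_row)
    return out


# Placeholder rows; the ott strings are made up for illustration.
rows = [
    {"species": "Species one", "ott": "ott111", "resolvable": True, "keep": True},
    {"species": "Species two", "ott": None, "resolvable": False, "keep": True},
    {"species": "Species three", "ott": "ott333", "resolvable": False, "keep": True},
]

flagged = flag_unresolved(rows)
kept_anyway = flag_unresolved(rows, keep_anyway=True)

print([r["keep"] for r in flagged])      # [True, False, False]
print([r["keep"] for r in kept_anyway])  # [True, True, True]
```

With keep_anyway=True the ott and resolvable values are still recorded, so problem rows can be reviewed later without having been dropped.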

topiary.opentree.tree

Get a species tree given a topiary dataframe.

topiary.opentree.tree.df_to_species_tree(df, strict=False)

Return an ete3 cladogram of the species in the dataframe. The leaves on the tree will have the following features:

  • leaf.name: ott as string

  • leaf.ott: ott as string

  • leaf.species: binomial species name as string

  • leaf.uid: list of all uid that have this species

Parameters:
  • df (pandas.DataFrame) – topiary dataframe that has an ott column with Open Tree of Life taxon ids

  • strict (bool, default=False) – if True, raise a ValueError if a species cannot be found on opentree

Returns:

  • species_tree (ete3.Tree) – An ete3 tree with branch lengths of 1, supports of 1, and only tip labels. Note: any polytomies are arbitrarily resolved.

  • dropped (list) – list of ott corresponding to dropped sequences
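
To make "arbitrarily resolved" concrete, the sketch below turns a multifurcating node into a series of bifurcations. It is only an illustration of the idea, assuming nested tuples as a stand-in for an ete3.Tree; the function resolve_polytomies is hypothetical and is not how topiary or ete3 performs this step.

```python
def resolve_polytomies(tree):
    """Return a copy of a nested-tuple tree in which no node has more
    than two children. Leaves are strings; internal nodes are tuples."""
    if isinstance(tree, str):
        return tree
    children = [resolve_polytomies(child) for child in tree]
    while len(children) > 2:
        # Arbitrary choice: join the first two children under a new node.
        children = [(children[0], children[1])] + children[2:]
    return tuple(children)


polytomy = ("A", "B", "C", "D")          # one node with four children
resolved = resolve_polytomies(polytomy)
print(resolved)                           # ((('A', 'B'), 'C'), 'D')

nested = ("A", ("B", "C", "D"))          # polytomy below the root
print(resolve_polytomies(nested))         # ('A', (('B', 'C'), 'D'))
```

Because the grouping is arbitrary, the relationships inside a resolved polytomy carry no phylogenetic information.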

topiary.opentree.util

Functions to interact directly with the opentree database.

topiary.opentree.util.ott_to_mrca(ott_list=None, species_list=None, move_up_by=0, avoid_all_life=True, microbial_to_domain=True)

Get the most recent common ancestor given a list of ott. Unrecognized ott are dropped with a warning.

Parameters:
  • ott_list (list, optional) – list of ott ids (integers). Either this or species_list must be specified.

  • species_list (list, optional) – list of binomial species (str). Either this or ott_list must be specified.

  • move_up_by (int, default=0) – starting at the actual MRCA, move up by this number of ranks. For example, if the MRCA for a set of ott is a kingdom and move_up_by = 1, this yields the relevant domain.

  • avoid_all_life (bool, default=True) – if possible, avoid the jump to all cellular organisms. This takes precedence over move_up_by.

  • microbial_to_domain (bool, default=True) – if all ott are from Bacteria or all ott are from Archaea, return the domain as the mrca.

Returns:

out – dictionary with keys ott_name, ott_id, ott_rank, lineage, taxid, and is_microbial

Return type:

dict
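
The way move_up_by and avoid_all_life combine can be sketched as follows. This is an illustration under stated assumptions, not topiary's code: the helper move_up is hypothetical, and the lineage is assumed to be a list ordered from the MRCA (first) up to the root of all cellular life (last), which may not match the ordering of the lineage value the function returns.

```python
def move_up(lineage, move_up_by=0, avoid_all_life=True):
    """Pick a taxon from a lineage ordered from the MRCA (index 0) up to
    the root (last entry). Hypothetical helper for illustration."""
    # Never step past the root of the lineage.
    index = min(move_up_by, len(lineage) - 1)
    # avoid_all_life takes precedence: stop one step below the root
    # whenever the lineage has a step below the root to stop at.
    if avoid_all_life and len(lineage) > 1:
        index = min(index, len(lineage) - 2)
    return lineage[index]


# Abbreviated lineage for a set of taxa whose MRCA is a kingdom.
lineage = ["Metazoa", "Eukaryota", "cellular organisms"]

print(move_up(lineage))                                       # Metazoa
print(move_up(lineage, move_up_by=1))                         # Eukaryota
print(move_up(lineage, move_up_by=2))                         # Eukaryota
print(move_up(lineage, move_up_by=2, avoid_all_life=False))   # cellular organisms
```

The third call shows the precedence rule: the request to move up two ranks is capped at the domain because avoid_all_life is True.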

topiary.opentree.util.ott_to_resolvable(ott_list=None, species_list=None)

Get whether or not taxa are resolvable on the synthetic ott tree.

Parameters:
  • ott_list (list, optional) – list of ott ids (integers). Either this or species_list must be specified.

  • species_list (list, optional) – list of binomial species (str). Either this or ott_list must be specified.

Returns:

resolvable – list of True/False for each ott in ott_list

Return type:

list

topiary.opentree.util.ott_to_species_tree(ott_list=None, species_list=None)

Get a species tree from a list of ott.

Parameters:
  • ott_list (list, optional) – list of ott ids (integers). Either this or species_list must be specified.

  • species_list (list, optional) – list of binomial species (str). Either this or ott_list must be specified.

Returns:

  • species_tree (ete3.Tree or None) – species tree. None if tree cannot be pulled down.

  • results (dict) – dictionary with resolved, missing, not_resolved, and not_monophyletic ott.

topiary.opentree.util.sort_df_by_taxa(df, paralog_column=None, ref_ott=None, only_keepers=False)

Sort a dataframe according to paralog call and then species phylogeny. If a dataframe has two paralogs A and B, this will return the dataframe with all A first and all B second. Within each paralog, the proteins are sorted by distance from the species given by ref_ott. The ref_ott sets the first species in the list; all other species are sorted relative to their distance from that species.

Parameters:
  • df (pandas.DataFrame) – topiary dataframe to sort

  • paralog_column (str, optional) – column holding paralogs. If not specified, tries recip_paralog first, then nickname, then settles on name.

  • ref_ott (str, optional) – ott (in ottINTEGER format). If not specified, first tries to get the ott for the first key species in the dataframe. If there is no key species, it chooses the reference species arbitrarily.

  • only_keepers (bool, default=False) – only include proteins with keep=True.
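
The two-level sort described above (paralog first, then distance from the reference species) can be sketched with a composite sort key. This is a simplified stand-in, not topiary's implementation: rows are plain dictionaries, the ott values are placeholders, paralogs are assumed to sort alphabetically, and species_order is a hypothetical precomputed list with the reference species first and the others in order of increasing distance from it.

```python
# Placeholder rows; in topiary these would be dataframe rows.
rows = [
    {"recip_paralog": "B", "ott": "ott111"},
    {"recip_paralog": "A", "ott": "ott333"},
    {"recip_paralog": "A", "ott": "ott111"},
    {"recip_paralog": "B", "ott": "ott222"},
    {"recip_paralog": "A", "ott": "ott222"},
]

# Hypothetical species order: reference species (ref_ott) first, then
# the remaining species sorted by distance from it on the species tree.
species_order = ["ott111", "ott222", "ott333"]
rank = {ott: position for position, ott in enumerate(species_order)}

# Sort by paralog, then by position in the species order.
sorted_rows = sorted(rows, key=lambda r: (r["recip_paralog"], rank[r["ott"]]))

for row in sorted_rows:
    print(row["recip_paralog"], row["ott"])
# A ott111
# A ott222
# A ott333
# B ott111
# B ott222
```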

topiary.opentree.util.species_to_ott(species)

Return ott ids (and other information) given a list of species.

Parameters:

species (list) – list of species in binomial format

Returns:

  • ott_list (list) – list of ott as integer for all species in the order they were passed. If species not found, set to None.

  • species_list (list) – list of species found for each species. If not found, return the input species name.

  • results (dict) – dictionary of information about species keyed to input species name

Notes

For all species, whether matched or not, the results dictionary will have the following fields keyed to input species name.

  • matched: whether or not this gave an unambiguous match

  • num_matches: number of matched hits

  • msg: message describing information about match

  • ret: match tuple from opentree.OT.tnrs_match

  • ott_id: integer ott id

  • ott_name: found name corresponding to ott_id

  • taxid: NCBI taxid or None if no NCBI taxid associated with record

  • resolved: bool. whether or not species is resolved on synthetic tree

The opentree tnrs_match API infers context from the species you pass in. If you pass in an ambiguous species name, it selects the species based on the other species in the list. If the species cannot be inferred from that context, this function returns no ott for the ambiguous species.
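
A short sketch of how the three return values might be consumed. Nothing here calls opentree: the values below are hand-written placeholders shaped like the documented output, showing only a subset of the fields, and the None values for the unmatched species are an assumption.

```python
# Input species and placeholder return values; the ott ids are made up.
species = ["Species one", "Species two", "Species three"]
ott_list = [111, None, 333]
results = {
    "Species one": {"matched": True, "num_matches": 1,
                    "ott_id": 111, "resolved": True},
    "Species two": {"matched": False, "num_matches": 0,
                    "ott_id": None, "resolved": False},
    "Species three": {"matched": True, "num_matches": 1,
                      "ott_id": 333, "resolved": False},
}

# ott_list is in input order, so zip pairs each species with its ott.
unmatched = [name for name, ott in zip(species, ott_list) if ott is None]

# Species that matched unambiguously and sit on the synthetic tree.
usable = [name for name in species
          if results[name]["matched"] and results[name]["resolved"]]

print(unmatched)  # ['Species two']
print(usable)     # ['Species one']
```

Checking both matched and resolved matters because a species can have an ott id and still be absent from the synthetic tree.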

topiary.opentree.util.tree_to_taxa_order(T, ref_name=None)

Get taxa in a stereotypical order given a tree.

Parameters:
  • T (ete3.Tree) – ete3.Tree whose leaves have a meaningful .name attribute. (This function does not check, but the output only makes sense if the names are unique.)

  • ref_name (str, optional) – use the leaf with this name as the first element in the output list. Other leaf names will come out sorted relative to their distance from this leaf name. If not specified (or no leaf in tree matches), choose the first node in the tree as the reference.

Returns:

node_names – list of node names sorted by a flattened taxonomic grouping relative to ref_name.

Return type:

list
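
One way to read "sorted relative to their distance from this leaf" is to walk outward from the reference leaf, emitting its closest relatives first. The sketch below does that on nested tuples rather than an ete3.Tree. It is an assumption about the intended ordering, not topiary's algorithm; the helpers leaves and taxa_order are hypothetical, and the fallback picks the first leaf when ref_name is missing.

```python
def leaves(tree):
    """Return leaf names of a nested-tuple tree in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    out = []
    for child in tree:
        out.extend(leaves(child))
    return out


def taxa_order(tree, ref_name=None):
    """List leaf names with ref_name first, then leaves in the clades
    that join the reference lineage, nearest clade first."""
    all_leaves = leaves(tree)
    if ref_name not in all_leaves:
        ref_name = all_leaves[0]  # fall back to the first leaf
    if isinstance(tree, str):
        return [tree]
    for i, child in enumerate(tree):
        if ref_name in leaves(child):
            # Order the clade holding the reference first...
            order = taxa_order(child, ref_name)
            # ...then append its sister clades.
            for j, other in enumerate(tree):
                if j != i:
                    order.extend(leaves(other))
            return order


tree = ((("A", "B"), "C"), ("D", "E"))
print(taxa_order(tree, ref_name="B"))  # ['B', 'A', 'C', 'D', 'E']
print(taxa_order(tree, ref_name="D"))  # ['D', 'E', 'A', 'B', 'C']
print(taxa_order(tree))                # ['A', 'B', 'C', 'D', 'E']
```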