topiary.util
Public utility functions for the topiary package.
topiary.util.create_nicknames
Create a nickname column that has a friendly nickname for each sequence.
- topiary.util.create_nicknames.create_nicknames(df, paralog_patterns, source_column='name', output_column='nickname', separator='/', unassigned_name='unassigned', overwrite_output=False, ignorecase=True)
Create a nickname column that has a friendly nickname for each sequence, generated by looking for patterns defined in the paralog_patterns dictionary in the source_column column of the dataframe.
- Parameters:
df (pandas.DataFrame) – topiary dataframe
paralog_patterns (dict) –
dictionary for creating standardized nicknames from input names. Each key specifies the nickname to output; its value is a list of patterns that map back to that key. For example:
{"S100A9":["S100-A9","S100 A9","S-100 A9","MRP14"], "S100A8":["S100-A8","S100 A8","S-100 A8","MRP8"]}
would assign “S100A9” to any sequence matching patterns only from the first list; “S100A8” to any sequence matching patterns only from the second list; and “S100A9/S100A8” to any sequence matching patterns from both lists.
source_column (str, default="name") – source column in dataframe to use to generate a nickname
output_column (str, default="nickname") – column in which to store newly constructed nicknames
separator (str, default="/") – string to place between nicknames if patterns for more than one nickname match.
unassigned_name (str, default="unassigned") – nickname to give sequences that do not match any of the patterns.
overwrite_output (bool, default=False) – overwrite an existing output column
ignorecase (bool, default=True) – whether to ignore case when matching patterns.
- Returns:
topiary_dataframe – Copy of dataframe with new nickname column
- Return type:
pandas.DataFrame
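
The matching rule described above can be illustrated with a small standard-library sketch. This is not topiary's implementation: the helper name `sketch_nickname` is hypothetical, the patterns are assumed to be literal substrings (topiary may treat them differently, for example as regular expressions), and joining matched nicknames in dictionary order is also an assumption. It only shows how one name could map to a single nickname, a joined nickname, or the unassigned name.

```python
import re


def sketch_nickname(name, paralog_patterns, separator="/",
                    unassigned_name="unassigned", ignorecase=True):
    """Hypothetical stand-in for the per-sequence matching rule.

    Not topiary's code; it mirrors only the behavior the parameter
    descriptions state.
    """
    flags = re.IGNORECASE if ignorecase else 0
    matched = []
    for nickname, patterns in paralog_patterns.items():
        # Assumption: each pattern is a literal substring, so it is escaped
        # before searching.
        if any(re.search(re.escape(p), name, flags) for p in patterns):
            matched.append(nickname)
    # Join every nickname whose patterns matched; fall back to the
    # unassigned name when nothing matched.
    return separator.join(matched) if matched else unassigned_name


# A reduced version of the example dictionary from the parameter description.
paralog_patterns = {
    "S100A9": ["S100-A9", "MRP14"],
    "S100A8": ["S100-A8", "MRP8"],
}

# Made-up sequence names, for illustration only.
names = [
    "protein S100-A9 [Homo sapiens]",
    "mrp8 calcium-binding protein",
    "S100-A8/S100-A9 heterodimer",
    "hypothetical protein",
]

nicknames = [sketch_nickname(n, paralog_patterns) for n in names]
print(nicknames)
```

With topiary installed, the corresponding call would take the form create_nicknames(df, paralog_patterns), which returns a copy of the dataframe with the nickname column added.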