synthaser.grouping

This module contains functions for grouping Synthase objects by their classifications.

It is used primarily to group sequences for annotation in the plot, i.e. grouping Synthases of like classification at each level of the classification hierarchy. Because annotations must be drawn from more specific to less specific, this module generates groups in reverse order.

Given a collection of classified Synthase objects, a basic workflow using this module might be:

  1. Build a dictionary of synthase headers grouped by classification:
>>> levels = group_synthases(synthases)
>>> levels
defaultdict(<class 'list'>, {'PKS': ['seq1', 'seq2', ...], 'HR-PKS': ['seq1', ...]})
  2. Determine the hierarchy of synthase classifications in your synthases:
>>> hierarchy = get_classification_paths(synthases)
>>> hierarchy
{'PKS': {'Type I': {'Non-reducing': {}, 'Highly-reducing': {}, 'Partially-reducing': {}}}, 'Hybrid': {}}

Note that this is agnostic to the rule files: the hierarchy here is built solely from what is stored in each Synthase object. This also means there should be no redundant classifications.

  3. Build an array of annotation groups, each in drawing (i.e. reverse) order:
>>> groups = get_annotation_groups(hierarchy)
>>> groups
[
  [
    {'classification': 'Partially-reducing', 'depth': 2},
    {'classification': 'Highly-reducing', 'depth': 2},
    {'classification': 'Non-reducing', 'depth': 2},
    {'classification': 'Type I', 'depth': 1},
    {'classification': 'PKS', 'depth': 0}
  ],
  [{'classification': 'Hybrid', 'depth': 0}],
]

Annotations are drawn from more specific to less specific, and each classification is drawn at an offset from the previous one, so every entry needs to record its level in the hierarchy. This is the purpose of the depth property.

synthaser.grouping.build_dict(path, d=None)

Recursively generates a dictionary of dictionaries from a list.
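
The nesting can be illustrated with a minimal sketch. This is not the library's implementation: build_dict_sketch is a hypothetical stand-in, and the role of the optional d argument (an existing dictionary to extend) is an assumption based only on the signature above.

```python
def build_dict_sketch(path, d=None):
    """Nest each element of `path` inside the one before it."""
    if d is None:
        d = {}
    if path:
        head, *tail = path
        # Assumption: reuse an existing sub-dictionary for `head`, if any,
        # so that extending a dictionary does not discard sibling keys.
        d[head] = build_dict_sketch(tail, d.get(head))
    return d


nested = build_dict_sketch(["PKS", "Type I", "Non-reducing"])
print(nested)
```

Here each classification label becomes a key whose value is the dictionary for the next, more specific label, and the most specific label maps to an empty dictionary.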

synthaser.grouping.get_classification_paths(synthases)

Determines the hierarchy of synthase classifications as a nested dictionary.

This hierarchy is used when annotating the plot with classification bars. It should be used in conjunction with the per-classification synthase dictionary generated using group_synthases().
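
One way such a hierarchy could be assembled is sketched below. It assumes each Synthase exposes a classification attribute holding an ordered list of labels, from least to most specific; that attribute name, the SimpleNamespace stand-ins, and classification_paths_sketch itself are illustrative assumptions rather than the library's code.

```python
from types import SimpleNamespace


def classification_paths_sketch(synthases):
    """Fold every synthase's classification path into one nested dict."""
    hierarchy = {}
    for synthase in synthases:
        node = hierarchy
        for label in synthase.classification:
            # Descend into the entry for this label, creating it if needed.
            node = node.setdefault(label, {})
    return hierarchy


# Stand-ins for classified Synthase objects (hypothetical data).
synthases = [
    SimpleNamespace(header="seq1", classification=["PKS", "Type I", "Non-reducing"]),
    SimpleNamespace(header="seq2", classification=["PKS", "Type I", "Highly-reducing"]),
    SimpleNamespace(header="seq3", classification=["Hybrid"]),
]
hierarchy = classification_paths_sketch(synthases)
print(hierarchy)
```

Because the dictionary is built only from the paths stored on the objects, labels that no synthase carries never appear in it.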

synthaser.grouping.group_synthases(synthases)

Group synthases by their classifications.
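
A minimal sketch of this grouping follows. It assumes each Synthase has header and classification attributes (names not confirmed by this page), and group_synthases_sketch is a hypothetical stand-in for the real function.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_synthases_sketch(synthases):
    """Map each classification label to the headers that carry it."""
    levels = defaultdict(list)
    for synthase in synthases:
        # A synthase is listed under every level of its classification.
        for label in synthase.classification:
            levels[label].append(synthase.header)
    return levels


# Stand-ins for classified Synthase objects (hypothetical data).
synthases = [
    SimpleNamespace(header="seq1", classification=["PKS", "Type I"]),
    SimpleNamespace(header="seq2", classification=["PKS"]),
]
levels = group_synthases_sketch(synthases)
print(dict(levels))
```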

synthaser.grouping.iter_annotation_groups(hierarchy)

Traverses the hierarchy and yields classification groups.

Groups are reverse sorted by depth, such that annotations are drawn from more specific to less specific.
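
The ordering can be sketched as follows, assuming one group per top-level classification and entries shaped like those in the module example above. iter_annotation_groups_sketch is a hypothetical illustration; the real traversal may differ.

```python
def iter_annotation_groups_sketch(hierarchy):
    """Yield one list of {'classification', 'depth'} dicts per top-level key."""
    for root, children in hierarchy.items():
        group = []
        stack = [(root, children, 0)]
        while stack:
            label, subtree, depth = stack.pop()
            group.append({"classification": label, "depth": depth})
            stack.extend((key, value, depth + 1) for key, value in subtree.items())
        # Deepest (most specific) entries first, matching the drawing order.
        group.sort(key=lambda entry: entry["depth"], reverse=True)
        yield group


hierarchy = {
    "PKS": {
        "Type I": {
            "Non-reducing": {},
            "Highly-reducing": {},
            "Partially-reducing": {},
        }
    },
    "Hybrid": {},
}
groups = list(iter_annotation_groups_sketch(hierarchy))
for group in groups:
    print(group)
```

Sorting on depth alone leaves the relative order of entries at the same depth to the traversal, so only the depth ordering should be relied upon.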

synthaser.grouping.iter_nested_keys(d, depth=0)

Iterates over all keys in a nested dictionary, reporting their depth.

The depth indicates how deeply nested the yielded key is in the dictionary. It is used when annotating the plot to determine the position of the classification bars.
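
As a rough illustration, a generator of this kind might look like the sketch below. The yielded (key, depth) tuple shape and the name iter_nested_keys_sketch are assumptions; this page does not specify what the real function yields.

```python
def iter_nested_keys_sketch(d, depth=0):
    """Yield (key, depth) pairs for every key in a nested dictionary."""
    for key, value in d.items():
        yield key, depth
        # Keys of the nested dictionary sit one level deeper.
        yield from iter_nested_keys_sketch(value, depth + 1)


hierarchy = {"PKS": {"Type I": {"Non-reducing": {}, "Highly-reducing": {}}}, "Hybrid": {}}
pairs = list(iter_nested_keys_sketch(hierarchy))
for key, depth in pairs:
    print("  " * depth + key)
```

Top-level keys are reported at depth 0, and each level of nesting adds one.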

synthaser.grouping.merge_dicts(a, b)

Recursively merges two dictionaries, allowing overlapping keys.
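
A minimal sketch of such a merge is shown below. Whether the real function modifies a in place or returns a new dictionary is not stated here; this hypothetical merge_dicts_sketch returns a new dictionary and leaves its inputs untouched.

```python
def merge_dicts_sketch(a, b):
    """Return a new dict combining `a` and `b`, merging shared keys recursively."""
    merged = dict(a)
    for key, value in b.items():
        if isinstance(merged.get(key), dict) and isinstance(value, dict):
            # Both sides hold a dictionary under this key: merge them.
            merged[key] = merge_dicts_sketch(merged[key], value)
        else:
            merged[key] = value
    return merged


a = {"PKS": {"Type I": {"Non-reducing": {}}}}
b = {"PKS": {"Type I": {"Highly-reducing": {}}}, "Hybrid": {}}
merged = merge_dicts_sketch(a, b)
print(merged)
```

Overlapping keys such as PKS and Type I are combined rather than overwritten, which is what allows several classification paths to share a common prefix.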