synthaser.models

This module stores the classes used throughout synthaser.

The Domain class represents a conserved domain hit. It stores the broader domain type, the specific conserved domain profile name (from CDD), the hit's position in its parent synthase sequence, and its scores from the search. It also provides methods for slicing out the corresponding sequence and for serialisation. We can instantiate a Domain object like so:

>>> from synthaser.models import Domain
>>> domain = Domain(
...     type='KS',
...     domain='PKS_KS',
...     start=756,
...     end=1178,
...     evalue=0.0,
...     bitscore=300
... )

and get its sequence from the parent Synthase object's sequence:

>>> domain.slice(synthase.sequence)
'MPIAVGM...'

Likewise, the Synthase class stores information about a synthase, including its name, amino acid sequence, Domain instances and classification. It also contains methods for generating the domain architecture, extracting domain sequences and more. For example, we can instantiate a new Synthase object like so:

>>> from synthaser.models import Synthase
>>> synthase = Synthase(
...     header='SEQ001.1',
...     sequence='MASGTC...',
...     domains=[
...         Domain(type='KS'),
...         Domain(type='AT'),
...         Domain(type='DH'),
...         Domain(type='ER'),
...         Domain(type='KR'),
...         Domain(type='ACP'),
...     ],
... )

Then, we can generate the domain architecture:

>>> synthase.architecture
'KS-AT-DH-ER-KR-ACP'
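An architecture string like this is the domain types joined by hyphens, in order along the sequence. The following is a minimal sketch under that reading; `DomainHit` and `architecture()` are hypothetical stand-ins, not synthaser's own classes, and ordering hits by their `start` position is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DomainHit:
    """Hypothetical stand-in for a domain hit (not synthaser's Domain)."""
    type: str
    start: Optional[int] = None


def architecture(domains: List[DomainHit]) -> str:
    """Join domain types with hyphens to form an architecture string."""
    # Assumption: order by start position when every hit has one;
    # otherwise keep the order in which the hits were given.
    if all(d.start is not None for d in domains):
        domains = sorted(domains, key=lambda d: d.start)
    return "-".join(d.type for d in domains)


hits = [DomainHit("AT", 1200), DomainHit("KS", 756), DomainHit("ACP", 2400)]
print(architecture(hits))  # KS-AT-ACP
```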

Or extract all of the domain sequences:

>>> synthase.extract_domains()
{
    "KS_0": "MPIAVGM...",
    "AT_0": "VFTGQGA...",
    "DH_0": "DLLGVPV...",
    "ER_0": "DVEIQVS...",
    "KR_0": "IAENMCS...",
    "ACP_0": "ASTTVAQ..."
}
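The keys in this output pair each domain type with a zero-based counter, so repeated domains of one type would get distinct keys. A minimal sketch of that naming scheme follows; `extract_named_domains()` is a hypothetical helper working on `(type, start, end)` tuples rather than synthaser objects, and treating the coordinates as 1-based and inclusive is an assumption.

```python
from collections import Counter
from typing import Dict, List, Tuple


def extract_named_domains(
    sequence: str, hits: List[Tuple[str, int, int]]
) -> Dict[str, str]:
    """Hypothetical helper: key each domain sequence as '<type>_<n>'.

    Coordinates are assumed to be 1-based and inclusive, so start is
    shifted down by one to form a Python slice.
    """
    seen: Counter = Counter()
    named = {}
    for domain_type, start, end in hits:
        key = f"{domain_type}_{seen[domain_type]}"  # zero-based count per type
        seen[domain_type] += 1
        named[key] = sequence[start - 1 : end]
    return named


seq = "MPIAVGMVFTGQGAMPIAVGM"
print(extract_named_domains(seq, [("KS", 1, 7), ("AT", 8, 14), ("KS", 15, 21)]))
# {'KS_0': 'MPIAVGM', 'AT_0': 'VFTGQGA', 'KS_1': 'MPIAVGM'}
```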

The object can also be serialised to JSON (Domain objects work the same way):

>>> js = synthase.to_json()
>>> with open('synthase.json', 'w') as handle:
...     handle.write(js)

and subsequently loaded from JSON:

>>> with open('synthase.json') as handle:
...     synthase = Synthase.from_json(handle)

Internally, this converts the Synthase object, as well as any Domain objects it contains, to dictionaries, which are then serialised to JSON using the built-in json library and written to file. When loading from JSON, the process is reversed: the entries in the file are converted back to Python objects.
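The dictionary-then-JSON round trip described above can be sketched with the standard library alone. This is not synthaser's implementation: `DomainRecord` and `SynthaseRecord` are hypothetical dataclasses with only a few of the real attributes, used to show the conversion and its reversal.

```python
import io
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class DomainRecord:
    """Hypothetical stand-in for Domain."""
    type: str
    domain: Optional[str] = None
    start: Optional[int] = None
    end: Optional[int] = None


@dataclass
class SynthaseRecord:
    """Hypothetical stand-in for Synthase."""
    header: str
    sequence: str = ""
    domains: List[DomainRecord] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recurses into the nested DomainRecord objects, turning
        # everything into plain dicts and lists before json.dumps() runs.
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, handle) -> "SynthaseRecord":
        # Reverse the process: parse JSON, then rebuild the nested objects.
        data = json.load(handle)
        data["domains"] = [DomainRecord(**d) for d in data["domains"]]
        return cls(**data)


original = SynthaseRecord("SEQ001.1", "MASGTC", [DomainRecord("KS", "PKS_KS", 1, 4)])
# io.StringIO stands in for an open file handle here.
restored = SynthaseRecord.from_json(io.StringIO(original.to_json()))
print(restored == original)  # True
```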

class synthaser.models.Domain(pssm=None, type=None, domain=None, start=None, end=None, evalue=None, bitscore=None, accession=None, superfamily=None)

A conserved domain hit.

type

Broader domain type (e.g. KS)

Type: str

domain

Specific CDD family (e.g. PKS_KS)

Type: str

start

Start of domain hit in parent sequence

Type: int

end

End of domain hit in parent sequence

Type: int

evalue

Domain hit E-value

Type: float

bitscore

Domain hit bitscore

Type: float

accession

CDD accession of domain family

Type: str

superfamily

CDD accession of domain superfamily

Type: str

slice(sequence)

Slices segment of sequence using the position of this Domain.

Given a Domain:

>>> domain = Domain(type='KS', domain='PKS_KS', start=10, end=20)

And its corresponding Synthase sequence:

>>> synthase.sequence
'ACGTACGTACACGTACGTACACGTACGTAC'

We can extract the Domain:

>>> domain.slice(synthase.sequence)
'CGTACGTACA'
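The result of a slice depends on the coordinate convention of the hit. Assuming `start` and `end` are 1-based and inclusive (the convention used for residue positions in CD-Search hit tables), slicing means converting them to Python's 0-based, half-open indexing. `DomainSpan` below is a hypothetical stand-in, and this convention is an assumption rather than confirmed synthaser behaviour.

```python
from dataclasses import dataclass


@dataclass
class DomainSpan:
    """Hypothetical stand-in for a positioned domain hit."""
    start: int  # assumed 1-based, inclusive
    end: int    # assumed 1-based, inclusive

    def slice(self, sequence: str) -> str:
        # Shift start down by one; Python's exclusive stop already equals end.
        return sequence[self.start - 1 : self.end]


span = DomainSpan(start=3, end=6)
print(span.slice("MASGTCLK"))  # SGTC
```

Under this convention a hit covers `end - start + 1` residues.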
class synthaser.models.Synthase(header=None, sequence=None, domains=None, classification=None)

The Synthase class stores a query protein sequence and its domain hits, and provides methods for filtering and classifying them.

header

Synthase name.

Type: str

sequence

Amino acid sequence of this Synthase.

Type: str

domains

Conserved domain hits in this Synthase.

Type: list

classification

All classification rules satisfied by this Synthase.

Type: list

contains(classes=None, types=None, families=None)

Checks whether this Synthase contains the given classifications, domain types or domain families.
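One way to read this check is that every requested value must appear among the synthase's classifications, domain types or domain families. Whether synthaser requires all values or any value is not stated here, so the sketch below assumes *all* and ignores filters left as `None`; `contains()` as a free function taking `(type, family)` pairs is hypothetical.

```python
from typing import Iterable, Optional, Sequence, Tuple


def contains(
    classification: Sequence[str],
    domains: Sequence[Tuple[str, str]],
    classes: Optional[Iterable[str]] = None,
    types: Optional[Iterable[str]] = None,
    families: Optional[Iterable[str]] = None,
) -> bool:
    """Hypothetical check; `domains` holds (type, family) pairs.

    Assumption: a filter left as None is ignored, and every value in a
    given filter must be present for the check to pass.
    """
    present_types = {t for t, _ in domains}
    present_families = {f for _, f in domains}
    checks = [
        (classes, set(classification)),
        (types, present_types),
        (families, present_families),
    ]
    return all(wanted is None or set(wanted) <= present for wanted, present in checks)


doms = [("KS", "PKS_KS"), ("AT", "PKS_AT")]
print(contains(["PKS", "Type I"], doms, classes=["PKS"], types=["KS", "AT"]))  # True
print(contains(["PKS", "Type I"], doms, families=["PKS_ER"]))  # False
```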

extract_all_domains()

Extracts all domain sequences from this synthase.

For example, given a Synthase:

>>> synthase = Synthase(
...     header='synthase',
...     sequence='ACGT...',  # length 100
...     domains=[
...         Domain(type='KS', domain='PKS_KS', start=1, end=20),
...         Domain(type='AT', domain='PKS_AT', start=50, end=70)
...     ]
... )

Then, we can call this function to extract the domain sequences:

>>> synthase.extract_all_domains()
{'KS': ['ACGT...'], 'AT': ['ACGT...']}

Returns:

Sequences for each domain in this synthase keyed on domain type.

Return type:

dict

Raises:
  • ValueError – If the Synthase has no Domain objects.
  • ValueError – If the sequence attribute is empty.
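The documented behaviour (group sequences by domain type; raise ValueError when there are no domains or no sequence) can be sketched as a standalone function. `extract_all_domains()` here is a hypothetical helper operating on `(type, start, end)` tuples, and reading the coordinates as 1-based and inclusive is an assumption.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def extract_all_domains(
    sequence: str, hits: List[Tuple[str, int, int]]
) -> Dict[str, List[str]]:
    """Hypothetical helper: group domain sequences by domain type."""
    if not hits:
        raise ValueError("Synthase has no domains")
    if not sequence:
        raise ValueError("Synthase has no sequence")
    grouped: Dict[str, List[str]] = defaultdict(list)
    for domain_type, start, end in hits:
        # Coordinates assumed 1-based and inclusive.
        grouped[domain_type].append(sequence[start - 1 : end])
    return dict(grouped)


print(extract_all_domains("MASGTCLKVF", [("KS", 1, 4), ("AT", 7, 10)]))
# {'KS': ['MASG'], 'AT': ['LKVF']}
```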
extract_domains(types=None, families=None)

Extract specific domain type/family sequences from this Synthase.
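Filtering by type or family could work as sketched below: a domain is kept if its type is among the requested types or its family is among the requested families. That union semantics, the behaviour when no filter is given, the hypothetical `filter_domains()` helper and its `(type, family, start, end)` tuples are all assumptions.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Hit = Tuple[str, str, int, int]  # (type, family, start, end); hypothetical


def filter_domains(
    sequence: str,
    hits: List[Hit],
    types: Optional[Iterable[str]] = None,
    families: Optional[Iterable[str]] = None,
) -> Dict[str, List[str]]:
    """Hypothetical helper: sequences of domains matching a type or family.

    Assumption: a domain is kept if its type is in `types` OR its family
    is in `families`; with neither given, nothing is selected.
    """
    wanted_types = set(types or ())
    wanted_families = set(families or ())
    selected: Dict[str, List[str]] = {}
    for domain_type, family, start, end in hits:
        if domain_type in wanted_types or family in wanted_families:
            # Coordinates assumed 1-based and inclusive.
            selected.setdefault(domain_type, []).append(sequence[start - 1 : end])
    return selected


hits = [("KS", "PKS_KS", 1, 4), ("AT", "PKS_AT", 5, 8), ("KR", "KR", 9, 10)]
print(filter_domains("MASGTCLKVF", hits, types=["KS"], families=["KR"]))
# {'KS': ['MASG'], 'KR': ['VF']}
```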

class synthaser.models.SynthaseContainer(synthases)

Simple container class for Synthase objects.

The purpose of this class is to facilitate batch actions on Synthase objects, e.g. serialisation, extraction of domain sequences, iteration over type/subtype, and printing summaries.

add_sequences(sequences)

Add amino acid sequences to the Synthase objects in this container.
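Assuming `sequences` is a dictionary mapping each synthase header to its amino acid sequence (as the dictionary of query sequences taken by `from_sequences` suggests), adding sequences amounts to a lookup by header. `Record` and `add_sequences()` are hypothetical, and leaving headers that are absent from the dictionary unchanged, rather than raising an error, is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Record:
    """Hypothetical stand-in for a Synthase with a header and sequence."""
    header: str
    sequence: str = ""


def add_sequences(records: List[Record], sequences: Dict[str, str]) -> None:
    """Fill in each record's sequence by looking up its header (in place)."""
    for record in records:
        # Assumption: headers missing from `sequences` are left unchanged.
        if record.header in sequences:
            record.sequence = sequences[record.header]


records = [Record("SEQ001.1"), Record("SEQ002.1")]
add_sequences(records, {"SEQ001.1": "MASGTC", "SEQ003.1": "MKLV"})
print([r.sequence for r in records])  # ['MASGTC', '']
```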

append(synthase)

Append a Synthase object to the end of this container.

extend(synthases)

Extend this container by appending Synthase objects from an iterable.

extract_domains(classes=None, types=None, families=None, by='sequence')

Extract domain sequences from Synthase objects in this container.

For example, given a SynthaseContainer containing Synthase objects:

>>> synthases = [Synthase(header='one', ...), Synthase(header='two', ...)]
>>> container = SynthaseContainer(synthases)

Then, the output of this function may resemble:

>>> container.extract_domains()
{'KS': [('one_KS_1', 'IAIA...'), ('two_KS_1', 'IAIE...')], 'AT': [...]}
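The labels in this output combine the synthase header, the domain type and a 1-based counter. A minimal sketch of building such labelled groups across several synthases follows; `group_domains()` is hypothetical, and it assumes each synthase is given as a header plus a list of `(domain type, domain sequence)` pairs, with numbering restarting for each synthase.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def group_domains(
    synthases: List[Tuple[str, List[Tuple[str, str]]]]
) -> Dict[str, List[Tuple[str, str]]]:
    """Hypothetical helper: group labelled domain sequences by domain type.

    Labels follow the '<header>_<type>_<n>' pattern with n starting at 1.
    """
    grouped: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for header, domains in synthases:
        counts: Counter = Counter()  # restart numbering for each synthase
        for domain_type, seq in domains:
            counts[domain_type] += 1
            label = f"{header}_{domain_type}_{counts[domain_type]}"
            grouped[domain_type].append((label, seq))
    return dict(grouped)


batch = [
    ("one", [("KS", "IAIA"), ("AT", "VFTG")]),
    ("two", [("KS", "IAIE")]),
]
print(group_domains(batch))
# {'KS': [('one_KS_1', 'IAIA'), ('two_KS_1', 'IAIE')], 'AT': [('one_AT_1', 'VFTG')]}
```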
extract_synthases(classes=None, types=None, families=None)

Bin entire synthase sequences.

classmethod from_sequences(sequences)

Build a SynthaseContainer from a dictionary of query sequences.
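The inherited append/extend wording on this class suggests a sequence-like container. The sketch below simply subclasses `list` and builds one object per dictionary entry, using the key as the header; `SynthaseStub`, `StubContainer`, the list base class and the header-to-sequence layout of the dictionary are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SynthaseStub:
    """Hypothetical stand-in for a Synthase."""
    header: str
    sequence: str = ""
    domains: List[str] = field(default_factory=list)


class StubContainer(list):
    """Hypothetical list-based container for batch actions."""

    @classmethod
    def from_sequences(cls, sequences: Dict[str, str]) -> "StubContainer":
        # One object per query; the dictionary key becomes the header.
        return cls(SynthaseStub(header=h, sequence=s) for h, s in sequences.items())


container = StubContainer.from_sequences({"SEQ001.1": "MASGTC", "SEQ002.1": "MKLV"})
print([s.header for s in container])  # ['SEQ001.1', 'SEQ002.1']
```

Because the container is a list here, `append` and `extend` come for free.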

to_long(delimiter=', ', headers=True)

Generate summary of the container in long data format.

For example:

Synthase  Length (aa)  Architecture        Classification
SEQ001.1  1000         KS-AT-DH-ER-KR-ACP  PKS, Type I, Highly-reducing

NOTE: the actual output is separated by the given delimiter (', ' by default) rather than aligned into columns as shown above.
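The standard `csv` module only accepts a one-character delimiter, so it cannot write the default `', '` directly; `str.join` can. A minimal sketch, assuming one row per synthase with the four columns shown above; `to_long_rows()` and its row tuples are hypothetical. Since the classification column itself contains commas, the usage passes a tab delimiter to keep columns unambiguous.

```python
from typing import Iterable, List, Tuple

# (header, length in aa, architecture, classification labels); hypothetical
Row = Tuple[str, int, str, List[str]]
HEADERS = ("Synthase", "Length (aa)", "Architecture", "Classification")


def to_long_rows(
    rows: Iterable[Row], delimiter: str = ", ", headers: bool = True
) -> str:
    """Hypothetical long-format summary: one delimited line per synthase."""
    lines = [delimiter.join(HEADERS)] if headers else []
    for header, length, architecture, classes in rows:
        # Assumption: multiple classification labels are joined with ", ".
        fields = [header, str(length), architecture, ", ".join(classes)]
        lines.append(delimiter.join(fields))
    return "\n".join(lines)


row = ("SEQ001.1", 1000, "KS-AT-DH-ER-KR-ACP", ["PKS", "Type I", "Highly-reducing"])
print(to_long_rows([row], delimiter="\t"))
```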