synthaser.models
¶
This module stores the classes used throughout synthaser.
The Domain class represents a conserved domain hit. It stores the broader domain type, the specific conserved domain profile name (from CDD), as well as its position in its parent synthase sequence and score from the search. It also provides methods for slicing the corresponding sequence and serialisation. We can instantiate a Domain object like so:
>>> from synthaser.models import Domain
>>> domain = Domain(
... type='KS',
... domain='PKS_KS',
... start=756,
... end=1178,
... evalue=0.0,
... bitscore=300
... )
and get its sequence given the parent Synthase object sequence:
>>> domain.slice(synthase.sequence)
'MPIAVGM..'
Likewise, the Synthase class stores information about a synthase, including its name, amino acid sequence, Domain instances and its classification. It also contains methods for generating the domain architecture, extraction of domain sequences and more. For example, we can instantiate a new Synthase object like so:
>>> from synthaser.models import Synthase
>>> synthase = Synthase(
... header='SEQ001.1',
... sequence='MASGTC...',
... domains=[
... Domain(type='KS'),
... Domain(type='AT'),
... Domain(type='DH'),
... Domain(type='ER'),
... Domain(type='KR'),
... Domain(type='ACP'),
... ],
... )
Then, we can generate the domain architecture:
>>> synthase.architecture
'KS-AT-DH-ER-KR-ACP'
Or extract all of the domain sequences:
>>> synthase.extract_domains()
{
"KS_0": "MPIAVGM...",
"AT_0": "VFTGQGA...",
"DH_0": "DLLGVPV...",
"ER_0": "DVEIQVS...",
"KR_0": "IAENMCS...",
"ACP_0": "ASTTVAQ..."
}
The object can also be serialised to JSON (note the Domain object works the same way):
>>> js = synthase.to_json()
>>> with open('synthase.json', 'w') as handle:
... handle.write(js)
and subsequently loaded from JSON:
>>> with open('synthase.json') as handle:
... synthase = Synthase.from_json(handle)
This will internally convert the Synthase object, as well as any Domain objects it contains, to dictionaries, before converting to JSON using the builtin json library and writing to file. When loading up from JSON, this process is reversed, and the entries in the file are converted back to Python objects.
-
class
synthaser.models.
Domain
(pssm=None, type=None, domain=None, start=None, end=None, evalue=None, bitscore=None, accession=None, superfamily=None)¶ A conserved domain hit.
-
type
¶ Broader domain type (e.g. KS)
Type: str
-
domain
¶ Specific CDD family (e.g. PKS_KS)
Type: str
-
start
¶ Start of domain hit in parent sequence
Type: int
-
end
¶ End of domain hit in parent sequence
Type: int
-
evalue
¶ Domain hit E-value
Type: float
-
bitscore
¶ Domain hit bitscore
Type: float
-
accession
¶ CDD accession of domain family
Type: str
-
superfamily
¶ CDD accession of domain superfamily
Type: str
-
slice
(sequence)¶ Slices segment of sequence using the position of this Domain.
Given a Domain:
>>> domain = Domain(type='KS', subtype='PKS_KS', start=10, end=20)
And its corresponding Synthase sequence:
>>> synthase.sequence 'ACGTACGTACACGTACGTACACGTACGTAC'
We can extract the Domain:
>>> domain.slice(synthase.sequence) 'CGTACGTACA'
-
-
class
synthaser.models.
Synthase
(header=None, sequence=None, domains=None, classification=None)¶ The Synthase class stores a query protein sequence, its hit domains, and the methods for filtering and classifying.
-
header
¶ Synthase name.
Type: str
-
sequence
¶ Amino acid sequence of this Synthase.
Type: str
-
domains
¶ Conserved domain hits in this Synthase.
Type: list
-
classification
¶ All classification rules satisfied.
Type: list
-
contains
(classes=None, types=None, families=None)¶ Checks if Synthase contains given classifications, domain families or types.
-
extract_all_domains
()¶ Extracts all domain sequences from this synthase.
For example, given a Synthase:
>>> synthase = Synthase( ... header='synthase', ... sequence='ACGT...', # length 100 ... domains=[ ... Domain(type='KS', domain='PKS_KS', start=1, end=20), ... Domain(type='AT', domain='PKS_AT', start=50, end=70) ... ] ... )
Then, we can call this function to extract the domain sequences:
>>> synthase.extract_all_domains() {'KS':['ACGT...'], 'AT':['ACGT...']}
Returns: Sequences for each domain in this synthase keyed on domain type.
Return type: dict
Raises: ValueError
– If the Synthase has no Domain objects.ValueError
– If the sequence attribute is empty.
-
extract_domains
(types=None, families=None)¶ Extract specific domain type/family sequences from this Synthase.
-
-
class
synthaser.models.
SynthaseContainer
(synthases)¶ Simple container class for Synthase objects.
The purpose of this class is to facilitate batch actions on Synthase objects, i.e. serialisation, extraction of domain sequences, iteration over type/subtype, and printing summaries.
-
add_sequences
(sequences)¶ Add amino acid sequence to Synthase objects in this container.
-
append
(synthase)¶ S.append(value) – append value to the end of the sequence
-
extend
(synthases)¶ S.extend(iterable) – extend sequence by appending elements from the iterable
-
extract_domains
(classes=None, types=None, families=None, by='sequence')¶ Extract domain sequences from Synthase objects in this container.
For example, given a SynthaseContainer containing Synthase objects:
>>> synthases = [Synthase(header='one', ...), Synthase(header='two', ...)] >>> container = SynthaseContainer(synthases)
Then, the output of this function may resemble:
>>> container.extract_domains() {'KS': [('one_KS_1', 'IAIA...'), ('two_KS_1', 'IAIE...')], 'AT': [...]}
-
extract_synthases
(classes=None, types=None, families=None)¶ Bin entire synthase sequences.
-
classmethod
from_sequences
(sequences)¶ Build a SynthaseContainer from a dictionary of query sequences.
-
to_long
(delimiter=', ', headers=True)¶ Generate summary of the container in long data format.
For example:
Synthase Length (aa) Architecture Classification SEQ001.1 1000 KS-AT-DH-ER-KR-ACP PKS, Type I, Highly-reducing
NOTE: actual output is character delimited, not human readable.
-