Miscellaneous modules
synthaser also provides a few other modules to help you generate certain files.
getdb
The getdb module can be used to download pre-formatted RPS-BLAST databases for local
searches. It connects to the NCBI FTP server and downloads whichever database(s) you
specify. For example, to download the CDD to a folder named databases:
synthaser getdb Cdd databases/
usage: synthaser getdb [-h] {cdd_families,Cdd,Cdd_NCBI,Cog,Kog,Pfam,Prk,Smart,Tigr} folder

Download a pre-formatted rpsblast database.

For full description of the available databases, see:
https://www.ncbi.nlm.nih.gov/Structure/cdd/cdd_help.shtml#CDSource

Note that 'cdd_families' will download a file containing a summary of
all families in the CDD for rule building - not a searchable database.

positional arguments:
  {cdd_families,Cdd,Cdd_NCBI,Cog,Kog,Pfam,Prk,Smart,Tigr}
                        Database to be downloaded
  folder                Folder where database is to be saved. Will save a .tar.gz file, and extract its
                        contents to a folder of the same name.

optional arguments:
  -h, --help            show this help message and exit
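To fetch several databases in one go, the getdb command can be driven from a short script. The following is a minimal sketch, assuming synthaser is installed and on your PATH; the helper names getdb_command and download_databases are hypothetical and are not part of synthaser. The database names are copied from the usage text above, and creating the target folder beforehand is a precaution rather than a documented requirement.

```python
import subprocess
from pathlib import Path

# Database names copied from the `synthaser getdb` usage text.
DATABASES = (
    "cdd_families", "Cdd", "Cdd_NCBI", "Cog", "Kog",
    "Pfam", "Prk", "Smart", "Tigr",
)


def getdb_command(database, folder):
    """Build the argument list for one `synthaser getdb` call (hypothetical helper)."""
    if database not in DATABASES:
        raise ValueError(f"unknown database {database!r}; choose from {DATABASES}")
    return ["synthaser", "getdb", database, str(folder)]


def download_databases(names, folder="databases"):
    """Run getdb once per database; needs synthaser on PATH and network access."""
    # Creating the folder first is an assumption, not a documented requirement.
    Path(folder).mkdir(parents=True, exist_ok=True)
    for name in names:
        subprocess.run(getdb_command(name, folder), check=True)


if __name__ == "__main__":
    # Only prints the command; call download_databases(["Cdd"]) to actually run it.
    print(" ".join(getdb_command("Cdd", "databases/")))
```

Validating the name before launching the subprocess surfaces a typo such as "cdd" (the choices are case-sensitive) before any download is attempted.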
getseq
The getseq module can be used to download sequences, in FASTA format, from the NCBI.
You can provide the NCBI identifiers either in a text file, one per line, or
directly on the command line, separated by spaces. For example:
synthaser getseq KAF4294870.1 KAF4294328.1 KAF4293514.1 -o sequences.fasta
usage: synthaser getseq [-h] [-o [OUTPUT]] sequence_ids [sequence_ids ...]

Download sequences from NCBI in FASTA format. This utility will accept either a file containing newline
separated sequence identifiers, or directly on the command line separated by spaces.

positional arguments:
  sequence_ids          Collection of NCBI sequence identifiers to retrieve

optional arguments:
  -h, --help            show this help message and exit
  -o [OUTPUT], --output [OUTPUT]
                        Where to print output (def. stdout)
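When the identifiers come from another program, it can help to tidy them before handing them to getseq. The sketch below assumes synthaser is on your PATH; clean_ids, write_id_file and getseq_command are hypothetical helper names, not synthaser functions. It also assumes that an identifier file is passed in the sequence_ids position, which is how the help text above reads but is not shown in an example.

```python
import subprocess
from pathlib import Path


def clean_ids(ids):
    """Strip whitespace, drop blanks and duplicates, keep first-seen order."""
    cleaned = []
    for raw in ids:
        accession = raw.strip()
        if accession and accession not in cleaned:
            cleaned.append(accession)
    return cleaned


def write_id_file(ids, path):
    """Write one identifier per line, the file layout getseq is described as accepting."""
    cleaned = clean_ids(ids)
    Path(path).write_text("\n".join(cleaned) + "\n")
    return cleaned


def getseq_command(sequence_ids, output=None):
    """Build the `synthaser getseq` argument list (hypothetical helper)."""
    cmd = ["synthaser", "getseq", *sequence_ids]
    if output is not None:
        cmd += ["-o", str(output)]  # without -o, output goes to stdout
    return cmd


if __name__ == "__main__":
    ids = clean_ids(["KAF4294870.1", " KAF4294328.1", "", "KAF4294870.1"])
    print(" ".join(getseq_command(ids, "sequences.fasta")))
    # To run it for real:
    # subprocess.run(getseq_command(ids, "sequences.fasta"), check=True)
```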
extract
The extract module can be used to extract domain/query sequences from synthaser
search results. It takes a JSON file (generated by -json/--json_file), a prefix
string which is used for the generated output files, and several filters.
For example, to extract KS, A and TE domain sequences:
$ synthaser extract session.json out_ --types KS A TE -m domain
Output: out_KS.faa out_A.faa out_TE.faa
To extract full NRPS and non-reducing PKS sequences:
$ synthaser extract session.json out_ \
--mode synthase \
--classes Non-reducing NRPS
Output: out_Non-reducing.faa out_NRPS.faa
Or to extract PKS_KS domains (CDD) only from highly-reducing PKSs:
$ synthaser extract session.json out_ \
--families PKS_KS \
--classes Highly-reducing
Output: out_PKS_KS.faa
usage: synthaser extract [-h] [-m {domain,synthase}] [--types TYPES [TYPES ...]]
                         [--classes CLASSES [CLASSES ...]] [--families FAMILIES [FAMILIES ...]]
                         session prefix

Extract domain/synthase sequences from search results.

positional arguments:
  session               Synthaser session file
  prefix                Output file prefix

optional arguments:
  -h, --help            show this help message and exit
  -m {domain,synthase}, --mode {domain,synthase}
                        Extract domain sequences or whole synthases from a session file
  --types TYPES [TYPES ...]
                        Domain types
  --classes CLASSES [CLASSES ...]
                        Sequence classifications
  --families FAMILIES [FAMILIES ...]
                        CDD families
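The filters above combine freely, so a small wrapper can make repeated extractions less error-prone. This is a minimal sketch, assuming synthaser is on your PATH; extract_command and expected_outputs are hypothetical helpers, not part of synthaser. The output naming rule (prefix, then the filter value, then .faa) is only inferred from the three examples above and may not hold for every combination of filters.

```python
import subprocess

MODES = ("domain", "synthase")


def extract_command(session, prefix, mode=None, types=(), classes=(), families=()):
    """Build the `synthaser extract` argument list (hypothetical helper)."""
    # Positionals go first so the multi-value options cannot swallow them.
    cmd = ["synthaser", "extract", str(session), prefix]
    if mode is not None:
        if mode not in MODES:
            raise ValueError(f"mode must be one of {MODES}, got {mode!r}")
        cmd += ["--mode", mode]
    for flag, values in (
        ("--types", types),
        ("--classes", classes),
        ("--families", families),
    ):
        if values:
            cmd += [flag, *values]
    return cmd


def expected_outputs(prefix, names):
    """Guess output file names; the pattern is inferred from the examples, not documented."""
    return [f"{prefix}{name}.faa" for name in names]


if __name__ == "__main__":
    cmd = extract_command("session.json", "out_", mode="domain", types=["KS", "A", "TE"])
    print(" ".join(cmd))
    print(expected_outputs("out_", ["KS", "A", "TE"]))
    # To run it for real: subprocess.run(cmd, check=True)
```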
genbank
The genbank module allows you to extract protein sequences from a given GenBank format
file. For example:
synthaser genbank myfile.gbk
will extract all identified protein sequences and print them to the terminal.
As a convenience for fungal megasynthase analysis, we provide the --antismash flag,
which allows you to extract PKS/NRPS sequences directly from a GenBank file generated by
antiSMASH.
usage: synthaser genbank [-h] [--antismash] genbank

Extract protein sequences from GenBank files. To extract PKS or NRPS sequences from antiSMASH GenBank
files, use the --antismash option.

positional arguments:
  genbank               GenBank file

optional arguments:
  -h, --help            show this help message and exit
  --antismash           Extract PKS/NRPS sequences from an antiSMASH file
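Because the genbank module prints to the terminal and the usage text lists no output option, saving the sequences means redirecting standard output. The sketch below shows one way to do that from Python, assuming synthaser is on your PATH; genbank_command and save_proteins are hypothetical helpers, and the output file name is only an example.

```python
import subprocess


def genbank_command(genbank, antismash=False):
    """Build the `synthaser genbank` argument list (hypothetical helper)."""
    cmd = ["synthaser", "genbank", str(genbank)]
    if antismash:
        cmd.append("--antismash")  # restrict output to PKS/NRPS sequences
    return cmd


def save_proteins(genbank, output, antismash=False):
    """Run the command and write whatever it prints to `output`."""
    with open(output, "w") as handle:
        subprocess.run(genbank_command(genbank, antismash), stdout=handle, check=True)


if __name__ == "__main__":
    print(" ".join(genbank_command("myfile.gbk", antismash=True)))
    # To run it for real (the output name is an arbitrary example):
    # save_proteins("myfile.gbk", "proteins.txt", antismash=True)
```

With check=True, a non-zero exit status from synthaser raises subprocess.CalledProcessError instead of silently leaving an empty or partial file.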