Miscellaneous modules

synthaser also provides several utility modules for downloading databases and sequences, and for extracting sequences from search results and GenBank files.

getdb

The getdb module downloads pre-formatted RPS-BLAST databases for local searches. It connects to the NCBI FTP server and downloads the database you specify. For example, to download the CDD to a folder named databases:

synthaser getdb Cdd databases/

usage: synthaser getdb [-h] {cdd_families,Cdd,Cdd_NCBI,Cog,Kog,Pfam,Prk,Smart,Tigr} folder

Download a pre-formatted rpsblast database.

For full description of the available databases, see:
 https://www.ncbi.nlm.nih.gov/Structure/cdd/cdd_help.shtml#CDSource

Note that 'cdd_families' will download a file containing a summary of
all families in the CDD for rule building - not a searchable database.

positional arguments:
  {cdd_families,Cdd,Cdd_NCBI,Cog,Kog,Pfam,Prk,Smart,Tigr}
                        Database to be downloaded
  folder                Folder where database is to be saved. Will save a .tar.gz file, and extract its
                        contents to a folder of the same name.

optional arguments:
  -h, --help            show this help message and exit
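
Since getdb takes one database name per call, downloading several databases means repeating the command. The following is a minimal sketch of a wrapper script, assuming synthaser is installed and on your PATH. The helper names getdb_command and download are hypothetical and not part of synthaser; the database names are copied from the usage message above.

```python
import subprocess

# Database names accepted by `synthaser getdb`, copied from its usage message.
DATABASES = (
    "cdd_families", "Cdd", "Cdd_NCBI", "Cog", "Kog", "Pfam", "Prk", "Smart", "Tigr",
)


def getdb_command(database, folder):
    """Build the argument list for one `synthaser getdb` call (hypothetical helper)."""
    if database not in DATABASES:
        raise ValueError(f"unknown database {database!r}; choose from {DATABASES}")
    return ["synthaser", "getdb", database, folder]


def download(databases, folder="databases/"):
    """Run `synthaser getdb` once per database; assumes synthaser is on PATH."""
    for database in databases:
        subprocess.run(getdb_command(database, folder), check=True)
```

Calling download(["Cdd", "Pfam"]) would then run synthaser getdb Cdd databases/ followed by synthaser getdb Pfam databases/. Validating the name first avoids starting a download with a misspelled database.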

getseq

The getseq module downloads sequences, in FASTA format, from the NCBI. You can provide the NCBI identifiers either in a text file, one per line, or directly on the command line, separated by spaces. For example:

synthaser getseq KAF4294870.1 KAF4294328.1 KAF4293514.1 -o sequences.fasta

usage: synthaser getseq [-h] [-o [OUTPUT]] sequence_ids [sequence_ids ...]

Download sequences from NCBI in FASTA format. This utility will accept either a file containing newline
separated sequence identifiers, or directly on the command line separated by spaces.

positional arguments:
  sequence_ids          Collection of NCBI sequence identifiers to retrieve

optional arguments:
  -h, --help            show this help message and exit
  -o [OUTPUT], --output [OUTPUT]
                        Where to print output (def. stdout)
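
For longer lists of identifiers, the file input is usually more convenient than typing them on the command line. The sketch below writes such a file with one identifier per line. The helper write_id_file and the file name ids.txt are hypothetical, and the getseq call in the trailing comment assumes the file is passed in place of the identifiers, as the help text above describes.

```python
from pathlib import Path


def write_id_file(sequence_ids, path):
    """Write one identifier per line, dropping blanks and duplicates (order kept)."""
    cleaned = (seq_id.strip() for seq_id in sequence_ids)
    unique = list(dict.fromkeys(seq_id for seq_id in cleaned if seq_id))
    Path(path).write_text("\n".join(unique) + "\n")
    return unique


ids = write_id_file(["KAF4294870.1", "KAF4294328.1", "KAF4293514.1"], "ids.txt")

# The file can then be given to getseq instead of the identifiers themselves:
#   synthaser getseq ids.txt -o sequences.fasta
```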

extract

The extract module extracts domain or whole-synthase sequences from synthaser search results. It takes a JSON session file (generated by -json/--json_file), a prefix string used to name the output files, and several optional filters.

For example, to extract KS, A and TE domain sequences:

$ synthaser extract session.json out_ --types KS A TE -m domain
Output: out_KS.faa out_A.faa out_TE.faa

To extract full NRPS and non-reducing PKS sequences:

$ synthaser extract session.json out_ \
    --mode synthase \
    --classes Non-reducing NRPS
Output: out_Non-reducing.faa out_NRPS.faa

Or to extract only domains of the CDD family PKS_KS from highly-reducing PKSs:

$ synthaser extract session.json out_ \
    --families PKS_KS \
    --classes Highly-reducing
Output: out_PKS_KS.faa

usage: synthaser extract [-h] [-m {domain,synthase}] [--types TYPES [TYPES ...]]
                         [--classes CLASSES [CLASSES ...]] [--families FAMILIES [FAMILIES ...]]
                         session prefix

Extract domain/synthase sequences from search results.

positional arguments:
  session               Synthaser session file
  prefix                Output file prefix

optional arguments:
  -h, --help            show this help message and exit
  -m {domain,synthase}, --mode {domain,synthase}
                        Extract domain sequences or whole synthases from a session file
  --types TYPES [TYPES ...]
                        Domain types
  --classes CLASSES [CLASSES ...]
                        Sequence classifications
  --families FAMILIES [FAMILIES ...]
                        CDD families
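
After an extraction it can be useful to check which output files were written and how many sequences each contains. The sketch below is one way to do that, under two assumptions: output files are named by joining the prefix, the type, class or family name, and .faa (inferred from the examples above), and the files are in FASTA format. Both helper functions are hypothetical and not part of synthaser.

```python
from pathlib import Path


def expected_outputs(prefix, names):
    """File names inferred from the examples above: <prefix><name>.faa."""
    return [f"{prefix}{name}.faa" for name in names]


def count_fasta_records(path):
    """Count FASTA records by counting header lines (those starting with '>')."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


# Report on the files expected from: synthaser extract session.json out_ --types KS A TE -m domain
for file_name in expected_outputs("out_", ["KS", "A", "TE"]):
    if Path(file_name).exists():
        print(file_name, count_fasta_records(file_name))
    else:
        print(file_name, "not found")
```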

genbank

The genbank module allows you to extract protein sequences from a given GenBank format file. For example:

synthaser genbank myfile.gbk

will extract all identified protein sequences and print them to the terminal.

As a convenience for fungal megasynthase analysis, we provide the --antismash flag, which allows you to extract PKS/NRPS sequences directly from a GenBank file generated by antiSMASH.

usage: synthaser genbank [-h] [--antismash] genbank

Extract protein sequences from GenBank files. To extract PKS or NRPS sequences from antiSMASH GenBank
files, use the --antismash option.

positional arguments:
  genbank      GenBank file

optional arguments:
  -h, --help   show this help message and exit
  --antismash  Extract PKS/NRPS sequences from an antiSMASH file
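
Because the usage message shows no output option, the sequences go to the terminal; to keep them, the printed output has to be captured. A minimal sketch of doing this from a script follows. The helper save_stdout and the output file names are hypothetical, and the commented calls assume synthaser is on your PATH and that myfile.gbk exists.

```python
import subprocess


def save_stdout(command, path):
    """Run a command and write whatever it prints on stdout to `path`."""
    with open(path, "w") as handle:
        subprocess.run(command, stdout=handle, check=True)


# All protein sequences in the file:
# save_stdout(["synthaser", "genbank", "myfile.gbk"], "proteins.txt")
# Only PKS/NRPS sequences from an antiSMASH GenBank file:
# save_stdout(["synthaser", "genbank", "--antismash", "myfile.gbk"], "megasynthases.txt")
```

With check=True, a non-zero exit status from the command raises an exception instead of silently leaving an incomplete file.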