
synthaser.search

This module serves as the starting point for synthaser, preparing input and dispatching it to either local or remote searches.

In any given search, input can either be a FASTA file or a collection of NCBI sequence identifiers. The prepare_input function is used to generate a SynthaseContainer object from either source which can then be used as a query. For example:

>>> sc1 = search.prepare_input(query_ids=["SEQ001.1", "SEQ002.1"])
>>> sc2 = search.prepare_input(query_file="my_sequences.fasta")

If query_ids is given, the sequences are first retrieved from NCBI Entrez using ncbi.efetch_sequences().
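The ID-or-file dispatch described above can be pictured with a stand-in that does not depend on synthaser. This is a minimal sketch, not the library's implementation: prepare_input_sketch, parse_fasta and fake_fetch are hypothetical names, plain dicts stand in for a SynthaseContainer, and the stub fetcher only imitates the role that ncbi.efetch_sequences() plays. It also assumes query_file is an open handle.

```python
import io


def parse_fasta(handle):
    """Parse FASTA text from an open handle into a {header: sequence} dict."""
    records, header = {}, None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Keep only the first whitespace-delimited token as the identifier
            header = line[1:].split()[0]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


def prepare_input_sketch(query_ids=None, query_file=None, fetch=None):
    """Hypothetical stand-in for the ID-or-file dispatch.

    `fetch` is an assumed callable mapping a list of IDs to a
    {header: sequence} dict; the real Entrez request is not reproduced.
    """
    if query_ids:
        if fetch is None:
            raise ValueError("A fetch callable is needed to resolve query_ids")
        return fetch(query_ids)
    if query_file is not None:
        return parse_fasta(query_file)
    raise ValueError("Expected query_ids or query_file")


def fake_fetch(ids):
    # Stub: pretend every ID resolves to the same short sequence
    return {identifier: "MKV" for identifier in ids}


from_ids = prepare_input_sketch(query_ids=["SEQ001.1", "SEQ002.1"], fetch=fake_fetch)
from_file = prepare_input_sketch(query_file=io.StringIO(">SEQ003.1 demo\nMKV\nLLA\n"))
```

Both branches return the same kind of object, so the rest of a pipeline does not need to know which source was used.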

A full synthaser search can be performed using the search function. This prepares the input (IDs or FASTA) as above, then launches a local or remote search using the rpsblast or ncbi module, respectively. Results are then parsed using the results module and classified using the classify module. Lastly, the SynthaseContainer object created inside this function is returned.

>>> sc = search.search(query_file="my_sequences.fasta")
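The order of steps in a full search (search, parse, classify, return) can be outlined as follows. This is only a sketch under stated assumptions: run_pipeline and every callable passed to it are hypothetical placeholders, the backends return made-up hits instead of calling rpsblast or CD-Search, and the two-entry label table is a toy, not synthaser's classification rules.

```python
def run_pipeline(query, mode, searchers, parse, classify):
    """Hypothetical outline of the search flow: search, parse, classify, return."""
    if mode not in searchers:
        raise ValueError(f"Unknown mode {mode!r}, expected one of {sorted(searchers)}")
    raw = searchers[mode](query)     # local or remote backend, chosen by mode
    domains = parse(raw)             # group raw hits into per-sequence domain lists
    return classify(query, domains)  # attach a label to each query sequence


# Placeholder backends that return made-up (sequence name, domain) hits
searchers = {
    "local": lambda query: [(name, "KS") for name in query],
    "remote": lambda query: [(name, "A") for name in query],
}


def parse(raw):
    """Group (name, domain) pairs by sequence name."""
    domains = {}
    for name, domain in raw:
        domains.setdefault(name, []).append(domain)
    return domains


def classify(query, domains):
    """Toy rule: label each sequence by its first domain."""
    labels = {"KS": "PKS", "A": "NRPS"}
    return {name: labels.get(domains[name][0], "unknown") for name in query}


local_result = run_pipeline(["seq1", "seq2"], "local", searchers, parse, classify)
remote_result = run_pipeline(["seq1"], "remote", searchers, parse, classify)
```

Passing the backends in as a mapping keeps the mode check and the dispatch in one place, which is one way to make an unsupported mode fail early.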

To use custom domain and classification rules, pass the path to each file to the search function:

>>> sc = search.search(
...     query_file="my_sequences.fasta",
...     domain_file="my_domains.json",
...     classify_file="my_rules.json",
... )

Previous searches are stored in the SEARCH_HISTORY variable, and can be summarised using the history function:

>>> search.history()
1.      Run ID: QM3-qcdsearch-B4BAD4B59BC5B80-3E7CFCD3F93E21D0
    Parameters:
                    db: cdd
                 smode: auto
                useid1: true
          compbasedadj: 1
                filter: true
                evalue: 3.0
                maxhit: 500
                 dmode: full
                 tdata: hits

This module contains routines for performing local/remote searches.

synthaser.search.history()

Print a summary of previously saved CD-Search runs.

Raises:ValueError – If SEARCH_HISTORY is empty (i.e. no searches have been run)
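The behaviour described here, a numbered summary of saved runs with an error when nothing has been saved, might look roughly like the sketch below. The structure of each history entry (a dict with "cdsid" and "params" keys), the function name summarise_history, and the exact text layout are assumptions made for illustration; the sketch returns a string rather than printing so that it is easy to inspect.

```python
def summarise_history(history):
    """Return a numbered text summary of saved runs, or raise if there are none."""
    if not history:
        raise ValueError("No searches have been run")
    lines = []
    for index, run in enumerate(history, 1):
        lines.append(f"{index}. Run ID: {run['cdsid']}")
        lines.append("   Parameters:")
        for key, value in run["params"].items():
            # Right-align parameter names so the colons line up
            lines.append(f"{key:>20}: {value}")
    return "\n".join(lines)


# Hypothetical history with a single made-up run
demo_history = [{"cdsid": "demo-run", "params": {"db": "cdd", "evalue": 3.0}}]
summary = summarise_history(demo_history)
```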

synthaser.search.prepare_input(query_ids=None, query_file=None)

Generate a SynthaseContainer from either query IDs or a query file.

Returns: Synthase objects for query sequences
Return type: SynthaseContainer
Raises: ValueError – Neither query_ids nor query_file provided

synthaser.search.search(mode='remote', query_ids=None, query_file=None, rule_file=None, classify_file=None, results_file=None, cdsid=None, delay=20, max_retries=-1, database=None, cpu=2, **kwargs)

Run a synthaser search.

CD-Search parameters can be given as kwargs which are passed on to _remote.

Parameters:
  • mode (str) – synthaser search mode (‘local’ or ‘remote’)
  • query_ids (str, file) – NCBI sequence identifiers to analyse
  • query_file (file) – Open FASTA file handle
  • rule_file (file) – Custom rule JSON file to use when parsing results
  • classify_file (file) – Custom classification rule JSON file
  • results_file (file) – Results file from a previous CDSearch/RPSBLAST search
  • cdsid (str) – CDSearch ID from a previous search
  • delay (int) – Time delay in seconds between polling attempts to NCBI for results (default: 20)
  • max_retries (int) – Maximum number of polling attempts before exiting (default: -1)
  • database (str) – rpsblast database to use in local searches
  • cpu (int) – Number of threads to use in rpsblast
Returns:

Synthase objects representing query sequences

Return type:

SynthaseContainer

Raises:

ValueError – Too many sequences provided (NCBI limit = 4000)
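The delay and max_retries parameters suggest a polling loop of the following shape. This is a hedged sketch rather than synthaser's code: the function name poll, the injectable sleep argument, and the use of TimeoutError are choices made here, and treating a negative max_retries as "no limit" is an assumption, since the documentation only states that the default is -1.

```python
import time


def poll(check, delay=20, max_retries=-1, sleep=time.sleep):
    """Call check() until it returns something other than None.

    Assumption: a negative max_retries means there is no retry limit.
    """
    attempts = 0
    while max_retries < 0 or attempts < max_retries:
        result = check()
        if result is not None:
            return result
        attempts += 1
        sleep(delay)  # wait before asking again
    raise TimeoutError(f"No result after {attempts} attempts")


# Simulated service that is ready on the third request; sleep is replaced
# by a recorder so the example finishes instantly
responses = iter([None, None, "done"])
waits = []
outcome = poll(lambda: next(responses), delay=20, max_retries=5, sleep=waits.append)
```

Injecting the sleep function keeps the loop testable without real waiting; in normal use the default time.sleep applies.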


© Copyright 2020, Cameron Gilchrist Revision d97a9887.
