synthaser.search
This module serves as the starting point for synthaser, preparing input and dispatching it to either local or remote searches.
In any given search, input can either be a FASTA file or a collection of NCBI sequence identifiers. The prepare_input function is used to generate a SynthaseContainer object from either source which can then be used as a query. For example:
>>> sc1 = search.prepare_input(query_ids=["SEQ001.1", "SEQ002.1"])
>>> sc2 = search.prepare_input(query_file="my_sequences.fasta")
If query_ids are used, the sequences are first retrieved from NCBI Entrez using ncbi.efetch_sequences().
A full synthaser search can be performed using the search function. This prepares the input (IDs or FASTA) as above, then launches a local or remote search using the rpsblast or ncbi module, respectively. Results are then parsed using the results module and classified using the classify module. Lastly, the SynthaseContainer object created inside this function is returned.
>>> sc = search.search(query_file="my_sequences.fasta")
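The choice between local and remote searching can be pictured with the minimal sketch below. It is not synthaser's implementation: pick_backend and NCBI_MAX_SEQUENCES are hypothetical names, and applying the 4000-sequence limit only to remote searches is an assumption based on the limit being described as an NCBI limit.

```python
NCBI_MAX_SEQUENCES = 4000  # limit quoted in the search() documentation


def pick_backend(mode: str, n_sequences: int) -> str:
    """Hypothetical helper: name the module that would run the search."""
    if mode == "local":
        # Local searches are run with rpsblast against a local database.
        return "rpsblast"
    if mode == "remote":
        # Assumption: the sequence limit only matters for NCBI searches.
        if n_sequences > NCBI_MAX_SEQUENCES:
            raise ValueError(
                f"Too many sequences provided (NCBI limit = {NCBI_MAX_SEQUENCES})"
            )
        return "ncbi"
    raise ValueError(f"Unknown search mode: {mode!r}")
```

In this sketch an unrecognised mode also raises ValueError, which is a design choice made here rather than documented behaviour.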
To use custom domain and classification rules, provide the path to each file to the search function:
>>> sc = search.search(
... query_file="my_sequences.fasta",
... domain_file="my_domains.json",
... classify_file="my_rules.json",
... )
Previous searches are stored in the SEARCH_HISTORY variable, and can be summarised using the history function:
>>> search.history()
1. Run ID: QM3-qcdsearch-B4BAD4B59BC5B80-3E7CFCD3F93E21D0
Parameters:
db: cdd
smode: auto
useid1: true
compbasedadj: 1
filter: true
evalue: 3.0
maxhit: 500
dmode: full
tdata: hits
This module contains routines for performing local/remote searches.
synthaser.search.history()
Print out summary of previously saved CD-Search runs.
Raises: ValueError – If SEARCH_HISTORY is empty (i.e. no searches have been run)
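As a rough illustration of this behaviour, the sketch below formats a list of saved runs in the style of the example output shown earlier. The record layout (a "cdsid" key and a "params" dictionary), the function name summarise_history and the two-space indentation are assumptions made for this example; the real structure of SEARCH_HISTORY is not described here.

```python
from typing import Any, Dict, List


def summarise_history(history: List[Dict[str, Any]]) -> str:
    """Hypothetical formatter for a list of saved CD-Search runs."""
    if not history:
        # Mirrors the documented ValueError for an empty history.
        raise ValueError("No searches have been run")
    lines = []
    for index, run in enumerate(history, start=1):
        lines.append(f"{index}. Run ID: {run['cdsid']}")
        lines.append("Parameters:")
        for key, value in run["params"].items():
            lines.append(f"  {key}: {value}")
    return "\n".join(lines)
```

The function returns the summary as a string rather than printing it, which keeps the sketch easy to check; wrapping the call in print() would give output like that of history().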
synthaser.search.prepare_input(query_ids=None, query_file=None)
Generate a SynthaseContainer from either query IDs or a query file.
Returns: Synthase objects for query sequences
Return type: SynthaseContainer
Raises: ValueError – Neither query_ids nor query_file provided
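The input check described above might look something like the following sketch. choose_query_source is a hypothetical helper, not part of synthaser, and preferring query_ids when both arguments are given is an assumption; the documentation only states what happens when neither is provided.

```python
from typing import Iterable, Optional


def choose_query_source(
    query_ids: Optional[Iterable[str]] = None,
    query_file: Optional[str] = None,
) -> str:
    """Hypothetical helper: report which input source would be used."""
    if query_ids:
        # Identifiers would be fetched from NCBI Entrez.
        return "ids"
    if query_file:
        # The file would be parsed as FASTA.
        return "file"
    raise ValueError("Expected either query_ids or query_file")
```

For example, choose_query_source(query_file="my_sequences.fasta") selects the FASTA branch, while calling it with no arguments raises ValueError.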
synthaser.search.search(mode='remote', query_ids=None, query_file=None, rule_file=None, classify_file=None, results_file=None, cdsid=None, delay=20, max_retries=-1, database=None, cpu=2, **kwargs)
Run a synthaser search.
CD-Search parameters can be given as kwargs, which are passed on to _remote.
Parameters:
- mode (str) – synthaser search mode (‘local’ or ‘remote’)
- query_ids (str, file) – NCBI sequence identifiers to analyse
- query_file (file) – Open FASTA file handle
- rule_file (file) – Custom rule JSON file to use when parsing results
- results_file (file) – Results file from a previous CDSearch/RPSBLAST search
- cdsid (str) – CDSearch ID from a previous search
- delay (int) – Time delay (s) between polling NCBI for results (def. 20)
- max_retries (int) – Maximum number of polling attempts before exiting (def. -1)
- database (str) – rpsblast database to use in local searches
- cpu (int) – Number of threads to use in rpsblast
Returns: Synthase objects representing query sequences
Return type: SynthaseContainer
Raises: ValueError – Too many sequences provided (NCBI limit = 4000)
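The delay and max_retries parameters suggest a polling loop along the lines of the sketch below. This is an illustration only: poll_until_ready, its check callback and the TimeoutError are hypothetical, and treating a negative max_retries as "retry indefinitely" is an assumption drawn from the default of -1.

```python
import time
from typing import Callable, Optional


def poll_until_ready(
    check: Callable[[], Optional[str]],
    delay: float = 20,
    max_retries: int = -1,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call check() until it returns a result or the retries run out."""
    attempts = 0
    # Assumption: a negative max_retries means there is no retry limit.
    while max_retries < 0 or attempts < max_retries:
        result = check()
        if result is not None:
            return result
        attempts += 1
        sleep(delay)  # wait before polling again
    raise TimeoutError(f"No results after {attempts} polling attempts")
```

The sleep function is passed in as an argument so the loop can be exercised without real waiting; by default it is time.sleep.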