synthaser.rpsblast

This module provides functionality for setting up and performing local synthaser searches using RPS-BLAST and rpsbproc. RPS-BLAST (Reverse PSI-BLAST) searches query sequences against databases of domain family profiles, and rpsbproc is used to post-process the raw results into something resembling results from an online CD-Search run. If synthaser cannot find either program on the system $PATH, it will raise an exception. For details on installing RPS-BLAST and rpsbproc, please refer to the user guide.

A basic search can be performed using the search function:

>>> rpsblast.search("sequences.fasta", "Cdd_LE", cpu=4)

This will automatically search the sequences in sequences.fasta against the Cdd_LE database using RPS-BLAST and process the raw results using rpsbproc, resulting in a Response object which can be readily parsed, just like the response from a remote CD-Search.

A profile database can be downloaded using the download_database function, e.g.:

>>> path = rpsblast.download_database("my_folder", flavour="Cdd")

This will connect to NCBI’s FTP server and download the “Cdd” database (the complete database). The downloaded file will be a .tar archive, which can be extracted using untar:

>>> untarred_path = rpsblast.untar(path)

Alternatively, just use getdb to do both steps at once:

>>> rpsblast.getdb("Cdd", "myfolder")

synthaser.rpsblast.get_program_path(program)

Get full path to a program on system PATH.
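
As an illustration only, a lookup of this kind can be sketched with the standard library's shutil.which. The helper name find_program and the choice of FileNotFoundError are assumptions made for this sketch; they are not taken from synthaser's implementation.

```python
import shutil


def find_program(name):
    """Return the full path to `name` on the system PATH.

    Hypothetical helper: raises FileNotFoundError if the program is missing.
    """
    path = shutil.which(name)  # None when the program is not on PATH
    if path is None:
        raise FileNotFoundError(f"Could not find {name} on system PATH")
    return path
```

Raising as soon as the lookup fails mirrors the behaviour described in the module overview, where a missing rpsblast or rpsbproc results in an exception rather than a silent failure later on.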

synthaser.rpsblast.rpsblast(query, database, cpu=2)

Run rpsblast on a query file against a database.
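
For orientation, the kind of command line this function wraps might look like the sketch below. The helper build_rpsblast_command is hypothetical, and the exact options synthaser passes are not documented here. The options shown (-query, -db, -num_threads, -outfmt) are standard BLAST+ options; -outfmt 11 (BLAST archive format) is assumed because it is the format rpsbproc is normally given as input.

```python
def build_rpsblast_command(query, database, cpu=2):
    """Build an rpsblast argument list (hypothetical helper, not synthaser's API)."""
    return [
        "rpsblast",
        "-query", str(query),        # FASTA file of query sequences
        "-db", str(database),        # profile database, e.g. Cdd_LE
        "-num_threads", str(cpu),    # number of CPU threads to use
        "-outfmt", "11",             # BLAST archive output (assumed)
    ]


command = build_rpsblast_command("sequences.fasta", "Cdd_LE", cpu=4)
print(" ".join(command))
```

A list built this way could then be passed to subprocess.run, which avoids shell quoting problems with file names containing spaces.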

synthaser.rpsblast.rpsbproc(results)

Convert raw rpsblast results into CD-Search results using rpsbproc.

Note that since rpsbproc relies on data files that are generally installed in the same directory as the executable (and synthaser makes no provision for them being stored elsewhere), we must make sure we have the full path to the original executable. If it is called via e.g. a symlink, rpsbproc will not find the data files it requires and will throw an error.

The CompletedProcess returned by this function contains a standard CD-Search results file, able to be parsed directly by the results module.
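
The symlink issue described in the note above can be avoided by resolving the executable to its real location before calling it. The following is a minimal sketch using only the standard library; resolve_executable and its path parameter are hypothetical names introduced for illustration, not part of synthaser.

```python
import os
import shutil


def resolve_executable(name, path=None):
    """Return the symlink-free path to `name`, or None if it cannot be found.

    Hypothetical helper. `path` optionally overrides the PATH search string.
    """
    found = shutil.which(name, path=path)
    if found is None:
        return None
    # Follow any symlinks so that data files stored next to the
    # real executable can be located relative to it.
    return os.path.realpath(found)
```

Calling the resolved path rather than the symlink means the program is launched from the directory where it was originally installed.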

synthaser.rpsblast.search(query, database, cpu=2)

Convenience function for running rpsblast and rpsbproc.