Toolchest-wrapped Command-line Software
Note: if you haven't already, make sure you have an API key and Toolchest is installed!
The most popular bioinformatics software is run through the command line. Toolchest wraps this software in Python and runs it on the cloud.
A quick start
To get started, we'll use STAR, but you can use any of the packages supported by Toolchest . On the command-line, running STAR looks like:
STAR --outFileNamePrefix ./output_path --genomeDir ./database_CRCh38 --readFilesIn ./inputs/
With Toolchest, it's:
import toolchest-client as tc
tc.set_key("YOUR_KEY")
tc.STAR(
read_one="s3://toolchest-demo-data/SRR2557119_small.fastq",
output_path="./output_path/",
database_name="GRCh38",
)
and it runs in the cloud! Breaking down the arguments:
read_one
is for input files. They can be on your computer, or somewhere else like S3.output_path
is where your output files are written. This can also be your computer, or somewhere else like S3.database_name
is the name of the Toolchest-hosted database.
Adding more options
import toolchest-client as tc
tc.set_key("YOUR_KEY")
tc.STAR(
read_one="s3://toolchest-demo-data/SRR2557119_small.fastq",
output_path="./output/",
database_name="GRCh38",
database_version="1",
tool_args="--outSAMtype BAM Unsorted"
)
We added two new arguments:
- database_version
is the version number of the Toolchest-hosted database.
- tool_args
are the arguments that you would normally set on the command-line to customize execution.
Next, let's learn more about what kinds of files you can use with Toolchest.