Kraken 2
Kraken 2 is a fast and efficient tool for taxonomic sequence classification. For more information, see the tool's GitHub repo and wiki.
Function Call
Python:

    toolchest.kraken2(
        inputs,
        read_one=None,
        read_two=None,
        output_path=None,
        tool_args="",
        database_name="standard",
        database_version="1",
        custom_database_path=None,
        is_async=False,
    )

R:

    toolchest$kraken2(
        inputs,
        read_one = NULL,
        read_two = NULL,
        output_path = NULL,
        tool_args = "",
        database_name = "standard",
        database_version = "1",
        custom_database_path = NULL,
        is_async = FALSE
    )
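As a minimal sketch of how the Python call might be put together: the import name `toolchest_client`, the helper names, and the file paths below are assumptions for illustration and do not come from this page. The actual call is wrapped in a function with a lazy import, so nothing is sent to Toolchest until `run_kraken2` is invoked.

```python
def build_kraken2_kwargs(inputs, output_path, tool_args=""):
    """Collect keyword arguments for toolchest.kraken2, spelling out the documented defaults."""
    return {
        "inputs": inputs,
        "output_path": output_path,
        "tool_args": tool_args,
        "database_name": "standard",
        "database_version": "1",
        "is_async": False,
    }


def run_kraken2(inputs, output_path, tool_args=""):
    """Hypothetical wrapper: submit a Kraken 2 run and return the Toolchest output object."""
    # Assumption: the Toolchest Python client is installed and importable under this name.
    import toolchest_client as toolchest

    return toolchest.kraken2(**build_kraken2_kwargs(inputs, output_path, tool_args))


# Hypothetical paths, for illustration only.
kwargs = build_kraken2_kwargs(
    "./reads.fastq",
    "./kraken2_results/",
    tool_args="--confidence 0.1",
)
```

Keeping the argument assembly separate from the call makes it easy to inspect or log exactly what will be submitted before a job starts.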
Function Arguments
Argument | Use in place of: | Description |
---|---|---|
inputs | input file location | Path or list of paths (client-side) to be passed in as input(s). If using |
read_one | | (optional) Path to read 1 of paired-end read input files. This can be a local filepath or an AWS S3 URI. |
read_two | | (optional) Path to read 2 of paired-end read input files. This can be a local filepath or an AWS S3 URI. |
output_path | output file location | (optional) Path to a directory where the output files will be downloaded. This can be a local filepath or an AWS S3 URI. |
tool_args | all other arguments | (optional) Additional arguments to be passed to Kraken 2. This should be a string of arguments, written as they would be on the command line. See Supported Additional Arguments for more details. |
database_name* | | (optional) Name of database to use for Kraken 2 alignment. Defaults to "standard". |
database_version* | | (optional) Version of database to use for Kraken 2 alignment. Defaults to "1". |
custom_database_path | | (optional) AWS S3 URI for a subfolder containing your custom database. |
is_async | | Whether to run a job asynchronously. |
*See the Databases section for more details.
Output Files
A Kraken 2 run will output 3 files into the directory specified by output_path:
kraken2_output.txt: Results written to stdout.
kraken2_report.txt: Report generated by the --report flag.
kraken2_summary.txt: Summary of sequences processed and classified, from Kraken 2's output to stderr.
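Once the files are downloaded, the report can be read programmatically. The sketch below assumes kraken2_report.txt uses Kraken 2's standard six-column, tab-delimited report layout (percentage, clade read count, direct read count, rank code, NCBI taxonomy ID, and a name indented two spaces per level). The `ReportRow` type, the function name, and the sample rows are hypothetical and made up for illustration.

```python
import io
from typing import NamedTuple


class ReportRow(NamedTuple):
    percent: float     # percentage of reads in the clade rooted at this taxon
    clade_reads: int   # reads assigned to the clade rooted at this taxon
    taxon_reads: int   # reads assigned directly to this taxon
    rank_code: str     # e.g. U, R, D, P, C, O, F, G, S
    taxid: int         # NCBI taxonomy ID
    name: str          # scientific name with indentation removed
    depth: int         # indentation level (two spaces per level)


def parse_kraken2_report(handle):
    """Parse a standard-format Kraken 2 report from an open text handle."""
    rows = []
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 6:
            continue  # skip blank or unexpected lines
        raw_name = fields[5]
        name = raw_name.lstrip(" ")
        depth = (len(raw_name) - len(name)) // 2
        rows.append(
            ReportRow(
                float(fields[0]),
                int(fields[1]),
                int(fields[2]),
                fields[3],
                int(fields[4]),
                name,
                depth,
            )
        )
    return rows


# Made-up sample rows, for illustration only.
sample = (
    " 10.00\t10\t10\tU\t0\tunclassified\n"
    " 90.00\t90\t0\tR\t1\troot\n"
    " 90.00\t90\t5\tD\t2\t    Bacteria\n"
)
rows = parse_kraken2_report(io.StringIO(sample))
```

For a real run, pass an open file handle for kraken2_report.txt from the output_path directory instead of the in-memory sample.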
Return Value
This function call returns a Toolchest output object, which contains the run ID and locations of downloaded output files. See Output Objects for more details.
Notes
Amazon AWS S3 inputs
Publicly available files stored on AWS's S3 service can be passed in as inputs, using the file's S3 URI.
Amazon AWS S3 outputs
Toolchest can output directly to a custom S3 bucket, provided that Toolchest has permissions to write to the bucket. Once set up, the S3 URI of the bucket can be passed in as output_path.
Async runs
Set the is_async parameter to true if you would like to run a Kraken 2 job asynchronously. See Async Runs for more information.
Paired-end inputs
Paired-end read inputs can be provided either through inputs or through read_one and read_two.
If using inputs, use a list of two filepaths: inputs=['/path/to/read_1', '/path/to/read_2']
If using read_one and read_two, these take precedence as the input files over anything given in inputs.
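The precedence described above can be sketched as a small helper. This is a hypothetical illustration of the documented behavior, not Toolchest's own implementation; the function name and the error message are assumptions.

```python
def resolve_paired_inputs(inputs=None, read_one=None, read_two=None):
    """Return the (read 1, read 2) pair that the documented rules would select."""
    # read_one and read_two, when both are given, win over anything in inputs.
    if read_one is not None and read_two is not None:
        return read_one, read_two
    # Otherwise fall back to a two-item list (or tuple) passed as inputs.
    if isinstance(inputs, (list, tuple)) and len(inputs) == 2:
        return inputs[0], inputs[1]
    raise ValueError(
        "paired-end input needs read_one and read_two, or a two-item inputs list"
    )


# Hypothetical paths, for illustration only.
from_list = resolve_paired_inputs(inputs=["/path/to/read_1", "/path/to/read_2"])
from_reads = resolve_paired_inputs(
    inputs=["/ignored/a", "/ignored/b"],
    read_one="/path/to/read_1",
    read_two="/path/to/read_2",
)
```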
Custom database arguments
If using custom_database_path, the given database will supersede any database selected via database_name and database_version.
Tool Versions
Toolchest currently supports version 2.1.1 of Kraken 2. Every request to run Kraken 2 with Toolchest will default to this version.
Databases
Toolchest currently supports the following databases for Kraken 2:
database_name | database_version | Description |
---|---|---|
 | | RefSeq archaea, bacteria, viral, plasmid, human¹, UniVec_Core¹ |
 | | RefSeq fungi, generated on 11/20/2021 |
¹ This database index was generated by the Langmead Lab at Johns Hopkins and can be found on the lab's database index page.
Custom Databases
Toolchest supports custom databases hosted in AWS's S3 service, provided that they are accessible from Toolchest. Once permissions are set up, use the function parameter custom_database_path to specify the S3 URI of the folder containing the database index files.
Supported Additional Arguments
--bzip2-compressed
--confidence
--gzip-compressed
--minimum-base-quality
--minimum-hit-groups
--paired
--quick
--use-names
Additional arguments can be specified under the tool_args argument.
Note: --paired will automatically be added if using paired-end reads (specifying both read_one and read_two).
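Before submitting a job, a tool_args string can be checked against the list above on the client side. The helper below is a hypothetical sketch: the function name is an assumption, it only inspects long options (tokens starting with `--`), and how Toolchest itself handles unsupported arguments is not described on this page.

```python
import shlex

# The supported additional arguments listed above.
SUPPORTED_FLAGS = {
    "--bzip2-compressed",
    "--confidence",
    "--gzip-compressed",
    "--minimum-base-quality",
    "--minimum-hit-groups",
    "--paired",
    "--quick",
    "--use-names",
}


def unsupported_flags(tool_args):
    """Return the long options in tool_args that are not in the supported list."""
    flagged = []
    for token in shlex.split(tool_args):
        if not token.startswith("--"):
            continue  # a value such as "0.1", or a short option, is not checked
        flag = token.split("=", 1)[0]  # tolerate the --flag=value spelling
        if flag not in SUPPORTED_FLAGS:
            flagged.append(flag)
    return flagged


ok = unsupported_flags("--confidence 0.1 --quick")
bad = unsupported_flags("--threads 4 --use-names")
```

Here `ok` is an empty list, while `bad` contains `--threads`, which is a Kraken 2 option that does not appear in the supported list.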