Kraken 2 is a fast and efficient tool for taxonomic sequence classification. For more information, see the tool's GitHub repo and wiki.
Function Call
tc.kraken2(
    inputs,
    read_one=None,
    read_two=None,
    output_path=None,
    tool_args="",
    database_name="standard",
    database_version="1",
    remote_database_path=None,
    is_async=False,
)
Function Arguments
Argument | Use in place of: | Description |
---|---|---|
inputs | input file location | Path to one or more files to use as input. If using read_one and read_two, this can be omitted. The files can be local or remote, see Using Files. |
read_one | --paired, input file location | (optional) Path to R1 of paired-end read input files. The file can be local or remote, see Using Files. |
read_two | --paired, input file location | (optional) Path to R2 of paired-end read input files. The file can be local or remote, see Using Files. |
output_path | output file location | (optional) Path to the directory where the output files will be downloaded. If omitted, the download is skipped. The path can be local or remote, see Using Files. |
tool_args | all other arguments | (optional) Additional arguments to be passed to Kraken 2, written as a single string in the same form as on the command line. See Supported Additional Arguments for more details. |
database_name | --db | (optional) Name of the database to use for Kraken 2 classification. Defaults to "standard". |
database_version | --db | (optional) Version of the database to use for Kraken 2 classification. Defaults to "1". |
remote_database_path | --db | (optional) AWS S3 URI to a directory with your custom database. |
is_async | | Whether to run the job asynchronously. Defaults to False. See Async Runs for more. |
See the Databases section for more details.
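As a rough illustration of how the arguments above fit together, the sketch below collects keyword arguments for a single-end run. The helper names `kraken2_call_kwargs` and `run_kraken2`, the file paths, and the `toolchest_client` import name are assumptions made for this example. The actual call is left commented out because it needs the client installed and configured.

```python
def kraken2_call_kwargs(inputs, output_path, confidence=None):
    """Collect keyword arguments for tc.kraken2 using the documented defaults."""
    kwargs = {
        "inputs": inputs,
        "output_path": output_path,
        "database_name": "standard",
        "database_version": "1",
        "is_async": False,
    }
    if confidence is not None:
        # --confidence is one of the documented supported additional arguments.
        kwargs["tool_args"] = f"--confidence {confidence}"
    return kwargs


def run_kraken2(**kwargs):
    # Imported lazily so the helper above works without the client installed.
    # The import name is an assumption for this sketch.
    import toolchest_client as tc
    return tc.kraken2(**kwargs)


# Hypothetical local paths, used only for illustration.
example_kwargs = kraken2_call_kwargs("./sample.fastq", "./kraken2_results/", confidence=0.1)
print(example_kwargs)
# run_kraken2(**example_kwargs)  # uncomment once the client is installed and configured
```

Keeping the argument assembly separate from the call makes it easy to inspect what will be sent before starting a run.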
Output Files
A Kraken 2 run will output 3 files into output_path:
kraken2_output.txt: Results written to stdout.
kraken2_report.txt: Report generated by the --report flag.
kraken2_summary.txt: Summary of sequences processed and classified, from Kraken 2's output to stderr.
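Once downloaded, the report file can be read with a few lines of Python. The sketch below assumes kraken2_report.txt uses Kraken 2's standard six-column, tab-separated report layout (percentage, reads in clade, reads assigned directly, rank code, NCBI taxonomy ID, indented scientific name). The `ReportRow` type, the `parse_report_line` helper, and the sample lines are made up for illustration.

```python
from typing import NamedTuple


class ReportRow(NamedTuple):
    percent: float     # percentage of reads in the clade rooted at this taxon
    clade_reads: int   # reads in the clade rooted at this taxon
    taxon_reads: int   # reads assigned directly to this taxon
    rank_code: str     # e.g. U, R, D, P, C, O, F, G, S
    taxid: int         # NCBI taxonomy ID
    name: str          # scientific name with indentation removed
    depth: int         # indentation level (two spaces per level)


def parse_report_line(line):
    """Parse one line of a standard six-column Kraken 2 report."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 6:
        raise ValueError(f"expected 6 tab-separated fields, got {len(fields)}")
    percent, clade_reads, taxon_reads, rank_code, taxid, name = fields
    stripped = name.lstrip(" ")
    depth = (len(name) - len(stripped)) // 2
    return ReportRow(
        float(percent), int(clade_reads), int(taxon_reads),
        rank_code.strip(), int(taxid), stripped, depth,
    )


# Made-up sample lines in the assumed format.
SAMPLE_REPORT = (
    " 10.00\t10\t10\tU\t0\tunclassified\n"
    " 90.00\t90\t0\tR\t1\troot\n"
    " 60.00\t60\t60\tS\t562\t        Escherichia coli\n"
)

rows = [parse_report_line(line) for line in SAMPLE_REPORT.splitlines()]
for row in rows:
    print(row.rank_code, row.taxid, row.name, row.percent)
```

To read a real file, the same helper can be applied to each line of kraken2_report.txt inside output_path.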
Notes
Paired-end inputs
Paired-end read inputs can be set with either inputs or through read_one and read_two.
If using inputs, use a list of two filepaths: inputs=['/path/to/read_1', '/path/to/read_2']
If using read_one and read_two, these take priority over inputs.
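The precedence described above can be modeled in a few lines. This is only an illustrative sketch of the documented rule, not Toolchest's implementation. The `resolve_read_inputs` name is hypothetical, and raising an error when only one of read_one and read_two is given is an assumption, since that case is not described here.

```python
def resolve_read_inputs(inputs=None, read_one=None, read_two=None):
    """Return the input paths that would be used under the documented precedence."""
    if (read_one is None) != (read_two is None):
        # Assumption: the behavior with only one of the two is not documented.
        raise ValueError("read_one and read_two must be given together")
    if read_one is not None:
        # read_one and read_two take priority over inputs.
        return [read_one, read_two]
    if inputs is None:
        raise ValueError("provide inputs, or both read_one and read_two")
    # A single path is treated as a one-element list.
    return [inputs] if isinstance(inputs, str) else list(inputs)


# Hypothetical paths, for illustration only.
print(resolve_read_inputs(inputs=["/path/to/read_1", "/path/to/read_2"]))
print(resolve_read_inputs(inputs="ignored.fastq", read_one="r1.fastq", read_two="r2.fastq"))
```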
Custom database arguments
If using remote_database_path, the given database will take priority over any database selected via database_name and database_version.
Tool Versions
Toolchest currently supports version 2.1.1 of Kraken 2.
Databases
Toolchest currently supports the following databases for Kraken 2:
database_name | database_version | Description |
---|---|---|
standard | 1 | RefSeq archaea, bacteria, viral, plasmid, human¹, UniVec_Core¹ |
refseq_fungi | 20211120 | RefSeq fungi, generated on 11/20/2021 |

¹ This database index was generated by the Langmead Lab at Johns Hopkins and can be found on the lab's database index page.
Custom Databases
Toolchest supports custom databases hosted in S3, so long as they are accessible from Toolchest. Use the argument remote_database_path to set the S3 URI of the folder with the database index files.
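A minimal sketch of choosing between a built-in and a custom database follows, using the remote_database_path name from the function signature. The `database_kwargs` helper and the bucket name are hypothetical, and the `s3://` prefix check is a local sanity check added for this example, not a documented Toolchest validation.

```python
def database_kwargs(remote_database_path=None, database_name="standard", database_version="1"):
    """Build the database-related keyword arguments for tc.kraken2."""
    if remote_database_path is not None:
        if not remote_database_path.startswith("s3://"):
            raise ValueError("remote_database_path should be an AWS S3 URI (s3://...)")
        # A custom database takes priority, so the name and version are left out.
        return {"remote_database_path": remote_database_path}
    return {"database_name": database_name, "database_version": database_version}


# Built-in fungi database from the table of supported databases.
print(database_kwargs(database_name="refseq_fungi", database_version="20211120"))
# Hypothetical bucket and prefix, for illustration only.
print(database_kwargs(remote_database_path="s3://my-bucket/kraken2-db/"))
```

The returned dictionary can be unpacked into the call alongside the input and output arguments.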
Supported Additional Arguments
--bzip2-compressed
--confidence
--gzip-compressed
--minimum-base-quality
--minimum-hit-groups
--paired
--quick
--use-names
Additional arguments can be specified under the tool_args argument.
Note: --paired will automatically be added if using paired-end reads (specifying both read_one and read_two).
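Before submitting a run, a tool_args string can be checked locally against the list above. The sketch below is a hypothetical pre-flight check, and `unsupported_flags` is not part of Toolchest. How Toolchest itself handles unlisted arguments is not covered here.

```python
import shlex

# Flags copied from the Supported Additional Arguments list.
SUPPORTED_FLAGS = {
    "--bzip2-compressed",
    "--confidence",
    "--gzip-compressed",
    "--minimum-base-quality",
    "--minimum-hit-groups",
    "--paired",
    "--quick",
    "--use-names",
}


def unsupported_flags(tool_args):
    """Return the flags in a tool_args string that are not in the supported list."""
    unsupported = []
    for token in shlex.split(tool_args):
        if not token.startswith("--"):
            continue  # a value such as "0.1", not a flag
        flag = token.split("=", 1)[0]  # also accept the --flag=value form
        if flag not in SUPPORTED_FLAGS:
            unsupported.append(flag)
    return unsupported


print(unsupported_flags("--confidence 0.1 --quick"))
print(unsupported_flags("--threads 4 --use-names"))
```

An empty list means every flag in the string appears in the supported list.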