Kraken 2 is a fast and efficient tool for taxonomic sequence classification. For more information, see the tool's GitHub repo and wiki.

Function Call

tc.kraken2(
    inputs,
    read_one=None,
    read_two=None,
    output_path=None,
    tool_args="",
    database_name="standard",
    database_version="1",
    remote_database_path=None,
    is_async=False,
)

Function Arguments

Argument | Use in place of | Description
inputs | input file location | Path to one or more files to use as input. Can be omitted if using read_one and read_two. The files can be local or remote; see Using Files.
read_one | --paired, input file location | (optional) Path to R1 of paired-end read input files. The file can be local or remote; see Using Files.
read_two | --paired, input file location | (optional) Path to R2 of paired-end read input files. The file can be local or remote; see Using Files.
output_path | output file location | (optional) Path to the directory where the output files will be downloaded. If omitted, the download is skipped. The path can be local or remote; see Using Files.
tool_args | all other arguments | (optional) Additional arguments to pass to Kraken 2, as a single string formatted like command-line arguments. See Supported Additional Arguments for more details.
database_name | --db | (optional) Name of the database to use for Kraken 2 classification. Defaults to "standard".
database_version | --db | (optional) Version of the database to use for Kraken 2 classification. Defaults to "1".
remote_database_path | --db | (optional) AWS S3 URI of a directory containing your custom database.
is_async | (none) | (optional) Whether to run the job asynchronously. Defaults to False. See Async Runs for more.

See the Databases section for more details.
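
As a rough sketch of how the arguments above fit together, the following collects them into keyword arguments and forwards them to tc.kraken2. The helper names kraken2_kwargs and run_kraken2 and the file paths are hypothetical, and the import name toolchest_client is an assumption. The actual submission is left commented out because it needs a configured Toolchest account.

```python
def kraken2_kwargs(inputs, output_path=None, tool_args="",
                   database_name="standard", database_version="1"):
    """Collect keyword arguments for tc.kraken2 using the documented defaults."""
    return {
        "inputs": inputs,
        "output_path": output_path,
        "tool_args": tool_args,
        "database_name": database_name,
        "database_version": database_version,
    }


def run_kraken2(**kwargs):
    """Forward the collected arguments to Toolchest."""
    # Imported lazily so the helper above works without the client installed.
    import toolchest_client as tc  # assumed import name for the Toolchest client
    return tc.kraken2(**kwargs)


# Hypothetical single-end input and output directory.
kwargs = kraken2_kwargs("./sample.fastq", output_path="./kraken2_results/")
print(kwargs)
# run_kraken2(**kwargs)  # uncomment to submit the job
```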

Output Files

A Kraken 2 run will output 3 files into output_path:

  • kraken2_output.txt: Classification results that Kraken 2 writes to stdout.
  • kraken2_report.txt: Report generated by the --report flag.
  • kraken2_summary.txt: Summary of sequences processed and classified, from Kraken 2's output to stderr.
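
Once the files are downloaded, the report can be read with a few lines of code. This is a minimal sketch, assuming kraken2_report.txt uses Kraken 2's default six-column, tab-separated report layout (percentage, reads in clade, reads assigned directly, rank code, NCBI taxonomy ID, indented name). The function names and the sample lines are made up for illustration.

```python
from pathlib import Path


def parse_kraken2_report(text):
    """Parse default-format Kraken 2 report text into a list of dicts."""
    rows = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        rows.append({
            "percent": float(fields[0]),
            "clade_reads": int(fields[1]),
            "direct_reads": int(fields[2]),
            "rank": fields[3].strip(),
            "taxid": int(fields[4]),
            "name": fields[5].strip(),  # drop the indentation that encodes tree depth
        })
    return rows


def load_report(output_path):
    """Read kraken2_report.txt from the directory passed as output_path."""
    return parse_kraken2_report((Path(output_path) / "kraken2_report.txt").read_text())


# Illustrative sample lines, not real results.
sample = (
    " 10.00\t10\t10\tU\t0\tunclassified\n"
    " 90.00\t90\t0\tR\t1\troot\n"
    " 60.00\t60\t5\tS\t562\t          Escherichia coli\n"
)
rows = parse_kraken2_report(sample)
print(rows[2]["name"], rows[2]["clade_reads"])
```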

Notes

Paired-end inputs

Paired-end reads can be passed either through inputs or through read_one and read_two.

If using inputs, use a list of two filepaths: inputs=['/path/to/read_1', '/path/to/read_2']

If using read_one and read_two, these take priority over inputs.
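
The precedence described above can be illustrated with a small function. This is only a sketch of the documented behavior, not Toolchest's internal logic. The name resolve_paired_inputs is hypothetical, and raising an error when neither form is complete is an assumption.

```python
def resolve_paired_inputs(inputs=None, read_one=None, read_two=None):
    """Return (R1, R2), preferring read_one/read_two over inputs."""
    if read_one is not None and read_two is not None:
        return read_one, read_two
    if isinstance(inputs, (list, tuple)) and len(inputs) == 2:
        return inputs[0], inputs[1]
    raise ValueError("expected read_one and read_two, or a list of two input paths")


# Both forms given: read_one and read_two win.
print(resolve_paired_inputs(
    inputs=["/path/to/read_1", "/path/to/read_2"],
    read_one="/other/r1.fastq",
    read_two="/other/r2.fastq",
))
```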

Custom database arguments

If using remote_database_path, the given database takes priority over any database selected via database_name and database_version.
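
A short sketch of that priority rule, using the remote_database_path argument from the function signature. The function name describe_database and the S3 URI are hypothetical, and the code only mirrors the documented behavior rather than Toolchest's own implementation.

```python
def describe_database(database_name="standard", database_version="1",
                      remote_database_path=None):
    """Report which database a call would use under the documented priority."""
    if remote_database_path is not None:
        # A custom database overrides the name/version selection.
        return f"custom database at {remote_database_path}"
    return f"{database_name} (version {database_version})"


print(describe_database())
print(describe_database(remote_database_path="s3://example-bucket/kraken2-db/"))
```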

Tool Versions

Toolchest currently supports version 2.1.1 of Kraken 2.

Databases

Toolchest currently supports the following databases for Kraken 2:

database_name | database_version | Description
standard | 1 | RefSeq archaea, bacteria, viral, plasmid, human [1], UniVec_Core [1]
refseq_fungi | 20211120 | RefSeq fungi, generated on 11/20/2021

[1] This database index was generated by the Langmead Lab at Johns Hopkins and can be found on the lab's database index page.
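
To catch a mistyped database before submitting a job, the table above can be mirrored in a small lookup. This is a sketch only: SUPPORTED_DATABASES and check_database are hypothetical names, and treating both values as strings is an assumption based on the default database_version="1".

```python
# Name/version pairs copied from the table above.
SUPPORTED_DATABASES = {
    ("standard", "1"),
    ("refseq_fungi", "20211120"),
}


def check_database(database_name, database_version):
    """Return the pair as keyword arguments, or raise if it is not listed."""
    pair = (database_name, str(database_version))
    if pair not in SUPPORTED_DATABASES:
        raise ValueError(f"unsupported database: {pair[0]} version {pair[1]}")
    return {"database_name": pair[0], "database_version": pair[1]}


print(check_database("refseq_fungi", "20211120"))
```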

Custom Databases

Toolchest supports custom databases hosted in S3, as long as Toolchest can access them. Use the remote_database_path argument to set the S3 URI of the folder that contains the database index files.
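
Since the custom database is given as an S3 URI, it can help to check the URI's shape locally before submitting. The sketch below uses only the Python standard library. The function name split_s3_uri and the bucket name are hypothetical, and the check does not confirm that Toolchest can actually reach the bucket.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split an S3 URI into (bucket, key prefix), raising if it is malformed."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


bucket, prefix = split_s3_uri("s3://example-bucket/kraken2/custom-db/")
print(bucket, prefix)
```

The validated URI would then be passed as remote_database_path in the call to tc.kraken2.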

Supported Additional Arguments

  • --bzip2-compressed
  • --confidence
  • --gzip-compressed
  • --minimum-base-quality
  • --minimum-hit-groups
  • --paired
  • --quick
  • --use-names

Pass additional arguments as a single string in tool_args.

Note: --paired will automatically be added if using paired-end reads (specifying both read_one and read_two).
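
Because tool_args is a single command-line style string, one way to assemble it is from a mapping of flags to values, checked against the supported list above. This is a sketch with hypothetical names (SUPPORTED_FLAGS, build_tool_args); the convention that True marks a value-less switch is an assumption of this example, not part of Toolchest.

```python
import shlex

# Flags copied from the supported list above.
SUPPORTED_FLAGS = {
    "--bzip2-compressed", "--confidence", "--gzip-compressed",
    "--minimum-base-quality", "--minimum-hit-groups",
    "--paired", "--quick", "--use-names",
}


def build_tool_args(options):
    """Build a tool_args string from {flag: value}; True means a bare switch."""
    parts = []
    for flag, value in options.items():
        if flag not in SUPPORTED_FLAGS:
            raise ValueError(f"unsupported Kraken 2 argument: {flag}")
        if value is False:
            continue  # switch turned off, omit it
        parts.append(flag)
        if value is not True:
            parts.append(str(value))
    return shlex.join(parts)


tool_args = build_tool_args({"--confidence": 0.1, "--use-names": True})
print(tool_args)  # --confidence 0.1 --use-names
```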