Kraken 2 is a fast and efficient tool for taxonomic sequence classification. For more information, see the tool's GitHub repo and wiki.

Function Call

Python:

toolchest.kraken2(
    inputs,
    read_one=None,
    read_two=None,
    output_path=None,
    tool_args="",
    database_name="standard",
    database_version="1",
    custom_database_path=None,
    is_async=False,
)

R:

toolchest$kraken2(
    inputs,
    read_one = NULL,
    read_two = NULL,
    output_path = NULL,
    tool_args = "",
    database_name = "standard",
    database_version = "1",
    custom_database_path = NULL,
    is_async = FALSE
)


Function Arguments

  • inputs (use in place of: input file location): Path or list of paths (client-side) to be passed in as input(s). If using read_one or read_two, this can be omitted.
  • read_one (use in place of: --paired, input file location): (optional) Path to read 1 of paired-end read input files. This can be a local filepath or an AWS S3 URI.
  • read_two (use in place of: --paired, input file location): (optional) Path to read 2 of paired-end read input files. This can be a local filepath or an AWS S3 URI.
  • output_path (use in place of: output file location): (optional) Path to a directory where the output files will be downloaded. This can be a local filepath or an AWS S3 URI. If omitted, the download is skipped; the outputs can be downloaded manually.
  • tool_args (use in place of: all other arguments): (optional) Additional arguments to be passed to Kraken 2, written as a single string as on the command line. See Supported Additional Arguments for more details.
  • database_name (use in place of: --db*): (optional) Name of the database to use for Kraken 2 classification. Defaults to "standard".
  • database_version (use in place of: --db*): (optional) Version of the database to use for Kraken 2 classification. Defaults to "1".
  • custom_database_path (use in place of: --db*): (optional) AWS S3 URI of a folder containing your custom database.
  • is_async: (optional) Whether to run the job asynchronously. Defaults to False (FALSE in R).

*See the Databases section for more details.

Output Files

A Kraken 2 run writes three files to the directory specified by output_path:

  • kraken2_output.txt: the per-sequence classification results that Kraken 2 writes to stdout.
  • kraken2_report.txt: the report generated by the --report flag.
  • kraken2_summary.txt: a summary of the sequences processed and classified, taken from Kraken 2's output to stderr.
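If you want to work with the report programmatically, the sketch below parses kraken2_report.txt. It assumes the file uses Kraken 2's standard six-column, tab-delimited report format (percentage of reads in the clade, reads in the clade, reads assigned directly to the taxon, rank code, NCBI taxonomy ID, indented scientific name). ReportRow, parse_kraken2_report, and top_taxa are hypothetical helpers, not part of Toolchest.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReportRow:
    percent: float      # % of reads in the clade rooted at this taxon
    clade_reads: int    # reads covered by the clade
    direct_reads: int   # reads assigned directly to this taxon
    rank: str           # rank code, e.g. "U", "R", "D", "G", "S"
    taxid: int          # NCBI taxonomy ID
    name: str           # scientific name with indentation removed
    depth: int          # indentation level (Kraken 2 indents 2 spaces per level)


def parse_kraken2_report(text: str) -> List[ReportRow]:
    """Parse a standard six-column Kraken 2 report (assumed format)."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 6:
            raise ValueError(f"expected 6 tab-separated fields, got {len(fields)}: {line!r}")
        pct, clade, direct, rank, taxid, raw_name = fields
        name = raw_name.lstrip(" ")
        rows.append(ReportRow(
            percent=float(pct),       # float() tolerates the leading padding
            clade_reads=int(clade),
            direct_reads=int(direct),
            rank=rank.strip(),
            taxid=int(taxid),
            name=name,
            depth=(len(raw_name) - len(name)) // 2,
        ))
    return rows


def top_taxa(rows: List[ReportRow], rank: str = "S", n: int = 5) -> List[ReportRow]:
    """Return the n taxa at `rank` with the most reads in their clade."""
    at_rank = [r for r in rows if r.rank == rank]
    return sorted(at_rank, key=lambda r: r.clade_reads, reverse=True)[:n]
```

For example, read kraken2_report.txt from your output_path directory, pass its contents to parse_kraken2_report, then call top_taxa(rows, rank="G") to list the most abundant genera.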

Return Value

This function call returns a Toolchest output object, which contains the run ID and locations of downloaded output files. See Output Objects for more details.

Notes

Amazon AWS S3 inputs

Publicly available files stored on AWS's S3 service can be passed in as inputs, using the file's S3 URI.
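One way to catch a malformed S3 URI before starting a run is to split it into its bucket and key. This is a minimal sketch using Python's standard urllib.parse; split_s3_uri is a hypothetical helper, and it only checks the URI's shape, not whether the object exists or is publicly readable.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split an S3 URI such as s3://bucket/key into (bucket, key).

    Raises ValueError if the string is not an s3:// URI with both a
    bucket and an object key.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError(f"not an S3 URI: {uri!r}")
    bucket = parsed.netloc
    key = parsed.path.lstrip("/")
    if not bucket or not key:
        raise ValueError(f"S3 URI needs a bucket and an object key: {uri!r}")
    return bucket, key
```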

Amazon AWS S3 outputs

Toolchest can output directly to a custom S3 bucket, provided that Toolchest has permissions to write to the bucket. Once set up, the S3 URI of the bucket can be passed in as output_path.

Async runs

Set is_async to True (TRUE in R) to run a Kraken 2 job asynchronously. See Async Runs for more information.

Paired-end inputs

Paired-end read inputs can be provided either through inputs or through read_one and read_two.

If using inputs, pass a list of two filepaths: inputs=['/path/to/read_1', '/path/to/read_2']

If read_one and read_two are given, they are used as the input files, taking precedence over anything given in inputs.
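The input precedence described above can be sketched as a small function. resolve_kraken2_inputs is hypothetical and only mirrors what this page documents; in particular, it assumes that supplying just one of read_one or read_two is an error, which may not match Toolchest's actual behavior.

```python
from typing import List, Optional, Sequence, Union


def resolve_kraken2_inputs(
    inputs: Optional[Union[str, Sequence[str]]] = None,
    read_one: Optional[str] = None,
    read_two: Optional[str] = None,
) -> List[str]:
    """Return the files that would be used as input (hypothetical sketch)."""
    if read_one is not None or read_two is not None:
        if read_one is None or read_two is None:
            # Assumption: this sketch requires both reads of a pair.
            raise ValueError("provide both read_one and read_two")
        # read_one/read_two take precedence over anything in `inputs`.
        return [read_one, read_two]
    if inputs is None:
        raise ValueError("provide inputs, or read_one and read_two")
    if isinstance(inputs, str):
        return [inputs]  # a single path
    return list(inputs)  # e.g. a list of two paired-end files
```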

Custom database arguments

If using custom_database_path, the given database will supersede any database selected via database_name and database_version.

Tool Versions

Toolchest currently supports version 2.1.1 of Kraken 2. Every request to run Kraken 2 with Toolchest will default to this version.

Databases

Toolchest currently supports the following databases for Kraken 2:

  • database_name "standard", database_version "1": RefSeq archaea, bacteria, viral, plasmid, human¹, UniVec_Core¹
  • database_name "refseq_fungi", database_version "20211120": RefSeq fungi, generated on 11/20/2021

¹ This database index was generated by the Langmead Lab at Johns Hopkins and can be found on the lab's database index page.
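The database rules on this page (a custom database supersedes database_name and database_version; otherwise the name/version pair must come from the table above) can be sketched as follows. select_database and SUPPORTED_DATABASES are hypothetical, and the returned string is only an illustrative label, not a value Toolchest uses.

```python
from typing import Dict, Optional, Set

# Built-in databases from the table above: name -> supported versions.
SUPPORTED_DATABASES: Dict[str, Set[str]] = {
    "standard": {"1"},
    "refseq_fungi": {"20211120"},
}


def select_database(
    database_name: str = "standard",
    database_version: str = "1",
    custom_database_path: Optional[str] = None,
) -> str:
    """Return a label for the database that would be used (hypothetical sketch)."""
    if custom_database_path is not None:
        # A custom database supersedes database_name and database_version.
        return custom_database_path
    versions = SUPPORTED_DATABASES.get(database_name)
    if versions is None:
        raise ValueError(f"unknown database_name: {database_name!r}")
    if database_version not in versions:
        raise ValueError(
            f"{database_name!r} has no version {database_version!r}; "
            f"supported: {sorted(versions)}"
        )
    return f"{database_name}/{database_version}"
```

Per the table, refseq_fungi has no version "1", so when choosing it you presumably need to pass database_version="20211120" rather than rely on the default.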

Custom Databases

Toolchest supports custom databases hosted in AWS's S3 service, provided that they are accessible from Toolchest. Once permissioning is set up, use the function parameter custom_database_path to specify the S3 URI of the folder containing the database index files.
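Before pointing custom_database_path at a folder, it can help to check that the folder looks like a complete Kraken 2 database. The Kraken 2 manual states that a database directory contains at least hash.k2d, opts.k2d, and taxo.k2d; this sketch assumes Toolchest expects that same layout. missing_kraken2_db_files is a hypothetical helper.

```python
from pathlib import Path
from typing import Iterable, List

# Files the Kraken 2 manual lists as required in a database directory.
REQUIRED_K2D_FILES = ("hash.k2d", "opts.k2d", "taxo.k2d")


def missing_kraken2_db_files(file_names: Iterable[str]) -> List[str]:
    """Return the required index files absent from `file_names`.

    `file_names` may be bare names (e.g. from os.listdir) or object keys
    under an S3 prefix; only the final path component is compared.
    """
    present = {Path(name).name for name in file_names}
    return [f for f in REQUIRED_K2D_FILES if f not in present]
```

You could pass it the entries of a local folder before uploading, or the object keys listed under the S3 prefix afterward.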

Supported Additional Arguments

  • --bzip2-compressed
  • --confidence
  • --gzip-compressed
  • --minimum-base-quality
  • --minimum-hit-groups
  • --paired
  • --quick
  • --use-names

Pass these arguments through tool_args as a single string, as you would on the command line.

Note: --paired will automatically be added if using paired-end reads (specifying both read_one and read_two).
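To catch typos before submitting a job, you can check a tool_args string against the supported list above. This sketch uses the standard shlex module to split the string the way a shell would; unsupported_flags is hypothetical, checks only long options beginning with --, and treats every other token as the value of a preceding flag.

```python
import shlex
from typing import List

# Supported additional arguments, as listed above.
SUPPORTED_FLAGS = {
    "--bzip2-compressed",
    "--confidence",
    "--gzip-compressed",
    "--minimum-base-quality",
    "--minimum-hit-groups",
    "--paired",
    "--quick",
    "--use-names",
}


def unsupported_flags(tool_args: str) -> List[str]:
    """Return long options in `tool_args` that are not in SUPPORTED_FLAGS."""
    bad = []
    for token in shlex.split(tool_args):
        if token.startswith("--"):
            flag = token.split("=", 1)[0]  # also accept --flag=value form
            if flag not in SUPPORTED_FLAGS:
                bad.append(flag)
    return bad
```

For example, unsupported_flags("--confidence 0.1 --use-names") returns an empty list. Since --paired is added automatically when both read_one and read_two are given, you don't need to include it in tool_args in that case.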

