Centrifuge is a rapid and memory-efficient classifier of DNA sequences from microbial samples. Centrifuge requires a relatively small genome index (e.g., 4.3 GB for ~4,100 bacterial genomes) and can process a typical DNA sequencing run within an hour. For more information, see the tool's website and GitHub repo.
Function Call
tc.centrifuge(
    output_path=None,
    tool_args="",
    database_name="centrifuge_refseq_bacteria_archaea_viral_human",
    database_version="1",
    read_one=None,
    read_two=None,
    unpaired=None,
    is_async=False,
)
Function Arguments
Argument | Use in place of: | Description |
---|---|---|
read_one | -1 | (optional) Path(s) to R1 of paired-end read input files. The files can be local or remote, see Using Files. |
read_two | -2 | (optional) Path(s) to R2 of paired-end read input files. The files can be local or remote, see Using Files. |
unpaired | -U | (optional) Path(s) to unpaired input files. The files can be local or remote, see Using Files. |
output_path | output arguments (-S, --report) | (optional) Path to the directory where the output files will be downloaded. If omitted, the download is skipped. The path can be local or remote, see Using Files. |
tool_args | all other arguments | (optional) Additional arguments to pass to Centrifuge, written as a single string in command-line form. See Supported Additional Arguments for more details. |
database_name | -x * | (optional) Name of the database to use for Centrifuge classification. Defaults to "centrifuge_refseq_bacteria_archaea_viral_human" (RefSeq bacteria / archaea / viral / human). |
database_version | -x * | (optional) Version of the database to use for Centrifuge classification. Defaults to "1". |
is_async | | (optional) Whether to run the job asynchronously. Defaults to False. See Async Runs for more. |
*See the Databases section for more details.
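The "Use in place of" column can be read as a mapping from keyword arguments to Centrifuge command-line flags. The sketch below illustrates that mapping only; `equivalent_centrifuge_flags` is a hypothetical helper that is not part of Toolchest, and it does not reflect how Toolchest builds the command internally. It assumes Centrifuge's convention of comma-separated file lists for -1, -2, and -U.

```python
import shlex


def equivalent_centrifuge_flags(read_one=None, read_two=None, unpaired=None, tool_args=""):
    """Return the Centrifuge flags that the given keyword arguments stand in for.

    Hypothetical helper for illustration; not part of the Toolchest client.
    """

    def as_list(value):
        # Accept None, a single path, or a list of paths.
        if value is None:
            return []
        return [value] if isinstance(value, str) else list(value)

    flags = []
    for flag, value in (("-1", read_one), ("-2", read_two), ("-U", unpaired)):
        paths = as_list(value)
        if paths:
            # Assumption: multiple input files are joined with commas.
            flags += [flag, ",".join(paths)]
    # tool_args is split the way a shell would split a command line.
    flags += shlex.split(tool_args)
    return flags


flags = equivalent_centrifuge_flags(
    read_one=["one_R1.fastq", "two_R1.fastq"],
    read_two=["one_R2.fastq", "two_R2.fastq"],
    tool_args="-k 10",
)
print(" ".join(flags))
```

Database selection (-x) and the output flags (-S, --report) are left out of the sketch because they are set through database_name, database_version, and output_path.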
Output Files
A Centrifuge run will output these files into output_path:
centrifuge_output.txt: Centrifuge output (captured from stdout), from the -S argument.
centrifuge_report.tsv: Centrifuge report file, from the --report argument.
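Once the files are downloaded, the report can be inspected with the standard library. The sketch below is a minimal example that assumes the report is tab-separated with a header row containing `name` and `numReads` columns, as in Centrifuge's documented report format; check the header of your own file before relying on these names. `top_taxa` is a hypothetical helper and the sample rows are made up for illustration.

```python
import csv
import io


def top_taxa(report_file, n=5):
    """Return the n (name, numReads) pairs with the most assigned reads.

    Assumes a tab-separated report with 'name' and 'numReads' header columns.
    """
    reader = csv.DictReader(report_file, delimiter="\t")
    rows = sorted(reader, key=lambda row: int(row["numReads"]), reverse=True)
    return [(row["name"], int(row["numReads"])) for row in rows[:n]]


# Made-up sample standing in for centrifuge_report.tsv.
sample = io.StringIO(
    "name\ttaxID\ttaxRank\tgenomeSize\tnumReads\tnumUniqueReads\tabundance\n"
    "Homo sapiens\t9606\tspecies\t3100000000\t30\t30\t0.4\n"
    "Escherichia coli\t562\tspecies\t5000000\t120\t100\t0.6\n"
)
print(top_taxa(sample, n=1))
```

For a real run, open the downloaded file instead of the in-memory sample, for example `open("centrifuge_report.tsv", newline="")` inside the directory given as output_path.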
Notes
Paired-end reads
For each paired-end input, make sure the corresponding read is in the same position in the input list. For example, two pairs of paired-end files – one_R1.fastq, one_R2.fastq, two_R1.fastq, two_R2.fastq – should be passed to Toolchest as:
tc.centrifuge(
    read_one=["one_R1.fastq", "two_R1.fastq"],
    read_two=["one_R2.fastq", "two_R2.fastq"],
    ...
)
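One way to keep the positions aligned is to store each pair together and split the pairs into the two lists just before the call. The sketch below shows this approach; `split_pairs` is a hypothetical helper, not part of Toolchest, and the output directory in the commented call is a placeholder.

```python
def split_pairs(pairs):
    """Turn [(R1, R2), ...] into position-matched read_one / read_two lists."""
    read_one, read_two = [], []
    for r1, r2 in pairs:
        if r1 == r2:
            # Catch an easy copy-paste mistake before submitting a job.
            raise ValueError(f"R1 and R2 are the same file: {r1}")
        read_one.append(r1)
        read_two.append(r2)
    return {"read_one": read_one, "read_two": read_two}


pairs = [
    ("one_R1.fastq", "one_R2.fastq"),
    ("two_R1.fastq", "two_R2.fastq"),
]
kwargs = split_pairs(pairs)
print(kwargs)

# With the Toolchest client installed and configured, the lists could be passed on:
# tc.centrifuge(output_path="./centrifuge_out", **kwargs)
```

Because each pair is written as one tuple, the R1 and R2 files cannot drift out of order when more samples are added.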
Tool Versions
Toolchest currently supports version 1.0.4 of Centrifuge.
Databases
Toolchest currently supports the following databases for Centrifuge:
database_name | database_version | Description |
---|---|---|
centrifuge_refseq_bacteria_archaea_viral_human | 1 | RefSeq, bacteria / archaea / viral / human, JHU source¹ |
¹ These database indexes were generated by the Langmead Lab at Johns Hopkins and can be found on the lab's database index page.
Supported Additional Arguments
Most additional arguments not related to input, output, or multithreading are supported:
-q
--qseq
-f
-r
-s, --skip
-u, --upto
-5, --trim5
-3, --trim3
--phred33
--phred64
--int-quals
--ignore-quals
--nofw
--norc
--min-hitlen
-k
--host-taxids
--exclude-taxids
--out-fmt
--tab-fmt-cols
-t, --time
--qc-filter
--seed
--non-deterministic
Set additional arguments with tool_args. For example: tool_args="-f -k 10"
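Since only the flags listed above are supported, it can help to check a tool_args string locally before submitting a job. The sketch below is one possible pre-flight check; `unsupported_flags` is a hypothetical helper, not part of Toolchest, and the flag set is copied from the list above. It treats every token starting with "-" as a flag, so a flag value that itself begins with "-" would be reported incorrectly.

```python
import shlex

# Flags copied from the supported list in this section.
SUPPORTED_FLAGS = {
    "-q", "--qseq", "-f", "-r",
    "-s", "--skip", "-u", "--upto",
    "-5", "--trim5", "-3", "--trim3",
    "--phred33", "--phred64", "--int-quals", "--ignore-quals",
    "--nofw", "--norc", "--min-hitlen", "-k",
    "--host-taxids", "--exclude-taxids", "--out-fmt", "--tab-fmt-cols",
    "-t", "--time", "--qc-filter", "--seed", "--non-deterministic",
}


def unsupported_flags(tool_args):
    """Return the flags in tool_args that are not in the supported list."""
    unsupported = []
    for token in shlex.split(tool_args):
        if not token.startswith("-"):
            continue  # a value such as "10", not a flag
        flag = token.split("=", 1)[0]  # tolerate a --flag=value spelling
        if flag not in SUPPORTED_FLAGS:
            unsupported.append(flag)
    return unsupported


print(unsupported_flags("-f -k 10"))
print(unsupported_flags("-p 8 -f"))
```

The second call reports -p, Centrifuge's thread-count flag, which falls under the multithreading arguments excluded above.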