Centrifuge is a rapid and memory-efficient classifier of DNA sequences from microbial samples. Centrifuge requires a relatively small genome index (e.g., 4.3 GB for ~4,100 bacterial genomes) and can process a typical DNA sequencing run within an hour. For more information, see the tool's website and GitHub repo.

Function Call

tc.centrifuge(
    output_path=None,
    tool_args="",
    database_name="centrifuge_refseq_bacteria_archaea_viral_human",
    database_version="1",
    read_one=None,
    read_two=None,
    unpaired=None,
    is_async=False,
)

Function Arguments

Argument Use in place of: Description
read_one -1 (optional) Path(s) to R1 of paired-end read input files. The files can be local or remote; see Using Files.
read_two -2 (optional) Path(s) to R2 of paired-end read input files. The files can be local or remote; see Using Files.
unpaired -U (optional) Path(s) to unpaired input files. The files can be local or remote; see Using Files.
output_path output arguments (-S, --report) (optional) Path to the directory where the output files will be downloaded. If omitted, the download is skipped. The path can be local or remote; see Using Files.
tool_args all other arguments (optional) Additional arguments to be passed to Centrifuge. This should be a string of arguments like the command line. See Supported Additional Arguments for more details.
database_name -x* (optional) Name of database to use for Centrifuge classification. Defaults to "centrifuge_refseq_bacteria_archaea_viral_human" (RefSeq bacteria / archaea / viral / human).
database_version -x* (optional) Version of database to use for Centrifuge classification. Defaults to "1".
is_async (optional) Whether to run a job asynchronously. Defaults to False. See Async Runs for more.

*See the Databases section for more details.
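
As a minimal sketch of how the arguments above fit together, the helper below collects them into a dictionary that can be unpacked into tc.centrifuge(). The helper name centrifuge_kwargs, the file name sample.fastq, and the output directory are hypothetical, and the import name toolchest_client in the comment is an assumption; only the argument names and defaults come from this page.

```python
def centrifuge_kwargs(output_path, read_one=None, read_two=None,
                      unpaired=None, tool_args=""):
    """Collect keyword arguments for tc.centrifuge() (hypothetical helper)."""
    kwargs = {
        "output_path": output_path,
        "tool_args": tool_args,
        # Defaults documented in the arguments table above.
        "database_name": "centrifuge_refseq_bacteria_archaea_viral_human",
        "database_version": "1",
        "is_async": False,
    }
    # Only pass the input arguments that were actually provided.
    for name, value in (("read_one", read_one),
                        ("read_two", read_two),
                        ("unpaired", unpaired)):
        if value is not None:
            kwargs[name] = value
    return kwargs


kwargs = centrifuge_kwargs("./centrifuge_out",
                           unpaired="sample.fastq",
                           tool_args="-k 10")

# With the client installed and configured, the call would then be:
# import toolchest_client as tc   # import name is an assumption
# tc.centrifuge(**kwargs)
```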

Output Files

A Centrifuge run will output these files into output_path:

  • centrifuge_output.txt: Centrifuge output (captured from stdout), from the -S argument.
  • centrifuge_report.tsv: Centrifuge report file, from the --report argument.
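
Once downloaded, centrifuge_report.tsv can be read with Python's csv module. The sketch below assumes the report is tab-separated with a header row containing name, taxID, taxRank, genomeSize, numReads, numUniqueReads, and abundance (the column layout described in Centrifuge's own manual; check it against your file). The two data rows are made-up values used only to keep the example self-contained, and top_taxa is a hypothetical helper.

```python
import csv
import io

# Illustrative, invented rows; the column names are an assumption based on
# Centrifuge's summary report format.
SAMPLE_REPORT = (
    "name\ttaxID\ttaxRank\tgenomeSize\tnumReads\tnumUniqueReads\tabundance\n"
    "Escherichia coli\t562\tspecies\t5000000\t120\t100\t0.75\n"
    "Homo sapiens\t9606\tspecies\t3000000000\t40\t40\t0.25\n"
)


def top_taxa(report_file, n=5):
    """Return the n (name, numReads) pairs with the most assigned reads."""
    reader = csv.DictReader(report_file, delimiter="\t")
    rows = sorted(reader, key=lambda row: int(row["numReads"]), reverse=True)
    return [(row["name"], int(row["numReads"])) for row in rows[:n]]


# In practice, pass open("<output_path>/centrifuge_report.tsv", newline="").
top = top_taxa(io.StringIO(SAMPLE_REPORT), n=1)
```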

Notes

Paired-end reads

For each paired-end input, make sure the corresponding read is in the same position in the input list. For example, two pairs of paired-end files – one_R1.fastq, one_R2.fastq, two_R1.fastq, two_R2.fastq – should be passed to Toolchest as:

tc.centrifuge(
  read_one=["one_R1.fastq", "two_R1.fastq"],
  read_two=["one_R2.fastq", "two_R2.fastq"],
  ...
)
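
When there are many samples, building the two lists by hand is error-prone. The following sketch groups files into position-aligned lists, assuming the naming pattern <sample>_R1.<ext> and <sample>_R2.<ext> used in the example above. The helper pair_reads is hypothetical and not part of Toolchest.

```python
import re
from collections import defaultdict


def pair_reads(paths):
    """Return (read_one, read_two) lists with mates in matching positions."""
    pattern = re.compile(r"^(?P<sample>.+)_R(?P<mate>[12])\.(?P<ext>.+)$")
    mates = defaultdict(dict)
    for path in paths:
        match = pattern.match(path)
        if match is None:
            raise ValueError(f"cannot tell R1 from R2 for {path!r}")
        mates[match["sample"]][match["mate"]] = path

    read_one, read_two = [], []
    for sample in sorted(mates):
        pair = mates[sample]
        # Every sample must have exactly one R1 and one R2 file.
        if set(pair) != {"1", "2"}:
            raise ValueError(f"incomplete pair for sample {sample!r}")
        read_one.append(pair["1"])
        read_two.append(pair["2"])
    return read_one, read_two


read_one, read_two = pair_reads(
    ["two_R2.fastq", "one_R1.fastq", "two_R1.fastq", "one_R2.fastq"]
)
```

The resulting read_one and read_two lists can be passed directly to the arguments of the same name.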

Tool Versions

Toolchest currently supports version 1.0.4 of Centrifuge.

Databases

Toolchest currently supports the following databases for Centrifuge:

database_name database_version Description
centrifuge_refseq_bacteria_archaea_viral_human 1 RefSeq, bacteria / archaea / viral / human, JHU source¹

¹ These database indexes were generated by the Langmead Lab at Johns Hopkins and can be found on the lab's database index page.

Supported Additional Arguments

Most additional arguments not related to input, output, or multithreading are supported:

  • -q
  • --qseq
  • -f
  • -r
  • -s, --skip
  • -u, --upto
  • -5, --trim5
  • -3, --trim3
  • --phred33
  • --phred64
  • --int-quals
  • --ignore-quals
  • --nofw
  • --norc
  • --min-hitlen
  • -k
  • --host-taxids
  • --exclude-taxids
  • --out-fmt
  • --tab-fmt-cols
  • -t, --time
  • --qc-filter
  • --seed
  • --non-deterministic

Set additional arguments with tool_args. For example: tool_args="-f -k 10"
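
Before submitting a run, a tool_args string can be checked locally against the list above. This is a rough sketch: unsupported_flags is a hypothetical helper, not a Toolchest function, and it treats every token beginning with "-" as a flag, so a negative numeric value would be reported as unsupported.

```python
import shlex

# Flags copied from the supported list above.
SUPPORTED_FLAGS = {
    "-q", "--qseq", "-f", "-r", "-s", "--skip", "-u", "--upto",
    "-5", "--trim5", "-3", "--trim3", "--phred33", "--phred64",
    "--int-quals", "--ignore-quals", "--nofw", "--norc", "--min-hitlen",
    "-k", "--host-taxids", "--exclude-taxids", "--out-fmt",
    "--tab-fmt-cols", "-t", "--time", "--qc-filter", "--seed",
    "--non-deterministic",
}


def unsupported_flags(tool_args):
    """Return the flags in tool_args that are not in SUPPORTED_FLAGS."""
    tokens = shlex.split(tool_args)
    return [
        token for token in tokens
        # Strip any "=value" suffix so "--seed=7" is checked as "--seed".
        if token.startswith("-") and token.split("=")[0] not in SUPPORTED_FLAGS
    ]


ok = unsupported_flags("-f -k 10")
bad = unsupported_flags("-f -p 8 --seed=7")
```

Here ok is empty, while bad contains "-p", which is not in the supported list.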