Kraken2

Overview

Kraken2 is a fast taxonomic classifier for metagenomic sequencing reads. It assigns a taxonomic label to each read by looking up exact k-mer matches in a pre-built database of reference genomes. The database is a compact hash table that maps k-mers (stored via their minimizers) to the lowest common ancestor (LCA) of all genomes containing them, which lets Kraken2 classify millions of reads per minute. It is a common first step in metagenomics workflows for estimating the taxonomic composition of a sample.

Installation

mamba install -c bioconda kraken2

Basic Usage

Download a pre-built standard database and classify sequencing reads.

# Download pre-built database
wget https://genome-idx.s3.amazonaws.com/kraken/k2_standard_20240112.tar.gz
mkdir -p kraken2_db && tar -xzf k2_standard_20240112.tar.gz -C kraken2_db/

# Classify reads
kraken2 --db kraken2_db/ \
  --output classifications.txt \
  --report report.txt \
  --minimum-hit-groups 3 \
  --threads 8 \
  reads.fastq.gz

For paired-end reads, add the --paired flag and provide both FASTQ files:

kraken2 --db kraken2_db/ --paired \
  --output classifications.txt \
  --report report.txt \
  --minimum-hit-groups 3 \
  --threads 8 \
  reads_R1.fastq.gz reads_R2.fastq.gz
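
Either command above fails if the database directory is incomplete, for example after an interrupted download or extraction. A Kraken2 database directory is expected to contain hash.k2d, opts.k2d, and taxo.k2d. The helper below is a minimal sketch that checks for these files before classification; the function name check_kraken2_db is made up for illustration and is not part of Kraken2.

```shell
# Hypothetical helper (not part of Kraken2): verify that a database
# directory contains the three core files Kraken2 reads at startup.
check_kraken2_db() {
  local db_dir="$1"
  local missing=0
  local f
  for f in hash.k2d opts.k2d taxo.k2d; do
    # -s is true only if the file exists and is non-empty
    if [ ! -s "${db_dir}/${f}" ]; then
      echo "missing or empty: ${db_dir}/${f}" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

It could be called as `check_kraken2_db kraken2_db/ && echo "database looks complete"` before starting a long classification run.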

Key Parameters

Flag / option	Description
`--db`	Path to the Kraken2 database directory.
`--output`	Per-read classification output file (read ID, taxon ID, k-mer mapping details).
`--report`	Summary report with read counts and percentages at each taxonomic level.
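`--minimum-hit-groups`	Minimum number of hit groups (overlapping k-mers sharing the same minimizer) needed to make a classification call (default: 2). Higher values improve precision at the cost of sensitivity.
`--confidence`	Confidence score threshold (0–1; default 0). A read's label is moved up the taxonomy until the fraction of its k-mers supporting the labelled clade meets the threshold; reads for which no taxon meets it are marked unclassified.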
`--paired`	Input reads are paired-end.
`--threads`	Number of threads for classification.
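`--unclassified-out`	Write unclassified reads to the specified file. With `--paired`, include a `#` in the filename; Kraken2 replaces it with `_1` and `_2` for the two mates.
`--classified-out`	Write classified reads to the specified file. With `--paired`, include a `#` in the filename as for `--unclassified-out`.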

Expected Output

classifications.txt – tab-delimited file with one line per read, containing the classification status (C for classified, U for unclassified), read ID, assigned taxon ID (0 if unclassified), read length, and the k-mer-to-taxon (LCA) mapping along the read.
report.txt – Kraken-style report with one line per taxon and six tab-delimited columns: percentage of reads in the clade rooted at the taxon, number of reads in that clade, number of reads assigned directly to the taxon, rank code, taxon ID, and indented taxon name. This report can be used as input for Bracken abundance estimation and Krona visualisation.
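
Both files are plain tab-delimited text, so a first look at the results needs nothing beyond awk and sort. The sketch below assumes the column layouts described above; the function names summarize_classifications and top_species are hypothetical helpers, not Kraken2 commands.

```shell
# Hypothetical helper: count classified (C) and unclassified (U) reads
# using the first column of the per-read output file.
summarize_classifications() {
  awk -F'\t' '
    $1 == "C" { classified++ }
    $1 == "U" { unclassified++ }
    END {
      total = classified + unclassified
      pct = (total > 0) ? 100 * classified / total : 0
      printf "classified\t%d\nunclassified\t%d\npercent_classified\t%.2f\n",
             classified, unclassified, pct
    }
  ' "$1"
}

# Hypothetical helper: print species-level rows (rank code "S") of the report
# as percentage, clade read count, taxon ID, and name, highest percentage first.
top_species() {
  awk -F'\t' '
    $4 == "S" {
      sub(/^ +/, "", $1)   # percentage column is space-padded
      sub(/^ +/, "", $6)   # names are indented to show tree depth
      printf "%s\t%s\t%s\t%s\n", $1, $2, $5, $6
    }
  ' "$1" | sort -t "$(printf '\t')" -k1,1nr
}
```

For example, `summarize_classifications classifications.txt` prints the classified and unclassified counts, and `top_species report.txt | head -n 10` lists the ten most abundant species by read percentage. These raw read percentages are not corrected abundance estimates; that is what Bracken is for.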