Kraken2
Overview
Kraken2 is an ultrafast taxonomic classification tool for metagenomic sequencing reads. It assigns a taxonomic label to each read by matching exact k-mer sequences against a pre-built database of known genomes. Kraken2 uses a compact hash table that maps k-mers to the lowest common ancestor (LCA) of all genomes containing that k-mer, achieving classification speeds of millions of reads per minute. It is the standard first step in many metagenomics workflows for determining the taxonomic composition of a sample.
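To make the LCA idea concrete, here is a toy sketch, not Kraken2's actual implementation. It assumes a tiny hand-written child-to-parent map using real NCBI taxon IDs with the intermediate ranks collapsed, and a hypothetical helper named `lca`. A k-mer shared by *Escherichia coli* (562) and *Shigella flexneri* (623) would be stored under their lowest common ancestor, Enterobacteriaceae (543).

```shell
# Toy lowest-common-ancestor lookup over an abbreviated taxonomy.
# The parent map is a hand-written illustration, not read from a Kraken2 database.
lca() {
  awk -v a="$1" -v b="$2" '
    BEGIN {
      parent[562] = 561; parent[561] = 543   # E. coli -> Escherichia -> Enterobacteriaceae
      parent[623] = 620; parent[620] = 543   # S. flexneri -> Shigella -> Enterobacteriaceae
      parent[543] = 1                        # intermediate ranks collapsed to the root

      # Mark a and all of its ancestors.
      t = a
      while (t != "") { seen[t] = 1; t = (t in parent) ? parent[t] : "" }

      # Walk up from b until a marked node is reached.
      t = b
      while (t != "" && !(t in seen)) t = (t in parent) ? parent[t] : ""

      # Fall back to the root (1) if the two taxa share nothing in this map.
      print (t == "" ? 1 : t)
    }'
}

lca 562 623   # prints 543
```

A k-mer found only in *E. coli* genomes keeps the specific label 562, which is why reads from unique regions can be classified down to species while reads from conserved regions stop at a higher rank.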
Installation
mamba install -c conda-forge -c bioconda kraken2
Basic Usage
Download a pre-built standard database and classify sequencing reads.
# Download pre-built database
wget https://genome-idx.s3.amazonaws.com/kraken/k2_standard_20240112.tar.gz
mkdir -p kraken2_db && tar -xzf k2_standard_20240112.tar.gz -C kraken2_db/
# Classify reads
kraken2 --db kraken2_db/ \
--output classifications.txt \
--report report.txt \
--minimum-hit-groups 3 \
--threads 8 \
reads.fastq.gz
For paired-end reads, add the --paired flag and provide both FASTQ files:
kraken2 --db kraken2_db/ --paired \
--output classifications.txt \
--report report.txt \
--minimum-hit-groups 3 \
--threads 8 \
reads_R1.fastq.gz reads_R2.fastq.gz
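With many samples, the paired-end command is usually wrapped in a loop. The sketch below is a dry run: it only prints the command for each sample so it can be checked before anything is executed. The `data/` directory, the `<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz` naming scheme, and the helper name `kraken2_cmd` are assumptions made for illustration.

```shell
# Print (do not run) the kraken2 command for one sample, given its R1 file.
# Assumes mates are named <sample>_R1.fastq.gz and <sample>_R2.fastq.gz.
kraken2_cmd() {
  r1=$1
  db=${2:-kraken2_db/}                          # database directory, defaults to the one above
  sample=$(basename "$r1" _R1.fastq.gz)         # strip directory and suffix
  r2="$(dirname "$r1")/${sample}_R2.fastq.gz"   # matching mate file
  printf 'kraken2 --db %s --paired --output %s --report %s --minimum-hit-groups 3 --threads 8 %s %s\n' \
    "$db" "${sample}.classifications.txt" "${sample}.report.txt" "$r1" "$r2"
}

for r1 in data/*_R1.fastq.gz; do
  [ -e "$r1" ] || continue   # skip the literal pattern when nothing matches
  kraken2_cmd "$r1"
done
```

Once the printed commands look right, piping the loop's output to `sh` runs them one after another.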
Key Parameters
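| Flag / option | Description |
|---|---|
| --db | Path to the Kraken2 database directory. |
| --output | Per-read classification output file (read ID, taxon ID, k-mer mapping details). |
| --report | Summary report with read counts and percentages at each taxonomic level. |
| --minimum-hit-groups | Minimum number of hit groups needed to make a classification call (default 2). Higher values improve precision at the cost of sensitivity. |
| --confidence | Confidence score threshold (0–1, default 0). A read's label is moved up the taxonomy until the fraction of its k-mers supporting that label meets the threshold; reads for which no taxon reaches it are marked unclassified. |
| --paired | Input reads are paired-end. |
| --threads | Number of threads for classification. |
| --unclassified-out | Write unclassified reads to the specified file. With --paired, the file name must contain a # character, which is replaced by _1 and _2. |
| --classified-out | Write classified reads to the specified file. With --paired, the file name must contain a # character, which is replaced by _1 and _2. |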
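To get a feel for what a confidence threshold would do before re-running Kraken2, the per-read output can be inspected directly. The sketch below is an approximation, not Kraken2's own scoring: it computes the fraction of non-ambiguous k-mers whose LCA is exactly the assigned taxon. Kraken2 also counts k-mers assigned to descendants of that taxon, so the value here is a lower bound on the real score. The input line and the helper name `exact_hit_fraction` are illustrative.

```shell
# Read per-read Kraken2 output on stdin and print: read ID, fraction of
# non-ambiguous k-mers whose LCA equals the assigned taxon ID (column 3).
exact_hit_fraction() {
  awk -F'\t' '{
    n = split($5, pairs, " ")
    hits = 0; total = 0
    for (i = 1; i <= n; i++) {
      split(pairs[i], kv, ":")
      # Skip ambiguous k-mers (A:n) and the "|:|" separator between mates.
      if (kv[1] == "A" || kv[1] == "|") continue
      total += kv[2]
      if (kv[1] == $3) hits += kv[2]
    }
    if (total > 0) printf "%s\t%.3f\n", $2, hits / total
  }'
}

# Illustrative line: status, read ID, taxon ID, read length, k-mer mapping.
printf 'C\tread_1\t562\t86\t562:13 561:4 A:31 0:1 562:3\n' | exact_hit_fraction
```

For this line, 16 of the 21 non-ambiguous k-mers map to taxon 562, so the read would keep its label at thresholds up to about 0.76.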
Expected Output
- classifications.txt: tab-delimited file with one line per read, containing the classification status (C/U), read ID, assigned taxon ID, read length, and k-mer-to-taxon mapping.
- report.txt: Kraken-style report with columns for percentage of reads, number of reads rooted at a taxon, number of reads directly assigned, rank code, taxon ID, and taxon name. This report is used as input for Bracken abundance estimation and Krona visualisation.
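A quick way to look at the report is to pull out the species-level rows and sort them by read count. The sketch below assumes the six-column, tab-separated layout described above. The rows are made-up numbers written to a temporary file purely for illustration, and `top_species` is a hypothetical helper name.

```shell
# Build a small example report; the counts are invented for illustration.
report=$(mktemp)
printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
  ' 10.00' 100 100 U 0    'unclassified' \
  ' 90.00' 900 5   R 1    'root' \
  ' 60.00' 600 20  G 561  '      Escherichia' \
  ' 55.00' 550 550 S 562  '        Escherichia coli' \
  ' 25.00' 250 250 S 1280 '        Staphylococcus aureus' > "$report"

# Print reads rooted at the taxon, taxon ID, and name for every species row
# (rank code "S"), most abundant first. Names are indented in the report,
# so leading spaces are stripped.
top_species() {
  awk -F'\t' '$4 == "S" {
    name = $6
    sub(/^ +/, "", name)
    print $2 "\t" $5 "\t" name
  }' "$1" | sort -k1,1nr
}

top_species "$report"
```

Replacing `"$report"` with `report.txt` applies the same filter to a real run. Changing `"S"` to `"G"` lists genera instead.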