Kraken2
=======

Overview
--------

Kraken2 is a fast taxonomic classifier for metagenomic sequencing reads. It
assigns a taxonomic label to each read by looking up the read's k-mers, by
exact match, in a pre-built database of reference genomes. The database is a
compact hash table that maps minimizers of those k-mers to the lowest common
ancestor (LCA) of all genomes containing them, which lets Kraken2 classify
millions of reads per minute. It is a common first step in metagenomics
workflows for estimating the taxonomic composition of a sample.
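
As a rough illustration of the lookup described above, the following sketch
slides a window of length ``k`` over a read, looks each k-mer up in a toy
table, and prints the hits as ``taxid:count`` runs, the notation Kraken2 uses
in its per-read output. The function name ``kmer_hits``, the 4-mer table, and
the read are made up for this example. Real Kraken2 databases store minimizers
rather than full k-mers and resolve the final label over the taxonomy tree,
which this sketch does not attempt to reproduce.

.. code-block:: bash

   # Toy k-mer lookup (illustrative only; not the real Kraken2 data structure).
   # stdin: "kmer taxid" lines; $1: read sequence; $2: k
   kmer_hits() {
     awk -v seq="$1" -v k="$2" '
       { db[$1] = $2 }                 # load the toy k-mer to taxon table
       END {
         out = ""; prev = ""; run = 0
         n = length(seq) - k + 1
         for (i = 1; i <= n; i++) {
           kmer = substr(seq, i, k)
           tax = (kmer in db) ? (db[kmer] "") : "0"   # 0 means no hit
           if (tax == prev) { run++; continue }
           if (run > 0) out = out (out == "" ? "" : " ") prev ":" run
           prev = tax; run = 1
         }
         if (run > 0) out = out (out == "" ? "" : " ") prev ":" run
         print out
       }'
   }

   hits=$(printf '%s\n' "ACGT 562" "CGTA 562" "GTAC 561" "ACGG 562" |
     kmer_hits ACGTACGGA 4)
   echo "$hits"   # prints: 562:2 561:1 0:1 562:1 0:1

Here ``0`` marks k-mers that are absent from the toy table.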

Installation
------------

.. code-block:: bash

   mamba install -c bioconda kraken2

Basic Usage
-----------

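Download a pre-built standard database, unpack it, and classify single-end
reads. The standard database is large (tens of gigabytes) and Kraken2 loads it
into memory by default, so check available disk space and RAM first.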

.. code-block:: bash

   # Download pre-built database
   wget https://genome-idx.s3.amazonaws.com/kraken/k2_standard_20240112.tar.gz
   mkdir -p kraken2_db && tar -xzf k2_standard_20240112.tar.gz -C kraken2_db/

   # Classify reads
   kraken2 --db kraken2_db/ \
     --output classifications.txt \
     --report report.txt \
     --minimum-hit-groups 3 \
     --threads 8 \
     reads.fastq.gz

For paired-end reads, add the ``--paired`` flag and provide both FASTQ files:

.. code-block:: bash

   kraken2 --db kraken2_db/ --paired \
     --output classifications.txt \
     --report report.txt \
     --minimum-hit-groups 3 \
     --threads 8 \
     reads_R1.fastq.gz reads_R2.fastq.gz
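
After a run like the ones above, a quick sanity check is to count how many
reads were classified. The sketch below assumes the tab-delimited per-read
format in which the first column is ``C`` (classified) or ``U``
(unclassified). The function name ``classified_summary`` and the four sample
lines are invented for illustration.

.. code-block:: bash

   # Summarise the first column (C/U) of a Kraken2 per-read output file.
   classified_summary() {
     awk -F'\t' '
       $1 == "C" { c++ }
       $1 == "U" { u++ }
       END {
         n = c + u
         pct = (n ? 100 * c / n : 0)
         printf "classified=%d unclassified=%d pct_classified=%.1f\n", c, u, pct
       }' "$1"
   }

   # Made-up sample lines standing in for classifications.txt
   sample=$(mktemp)
   printf 'C\tread1\t562\t150\t562:100 0:16\n'  > "$sample"
   printf 'C\tread2\t561\t150\t561:90 0:26\n'  >> "$sample"
   printf 'U\tread3\t0\t150\t0:116\n'          >> "$sample"
   printf 'C\tread4\t562\t150\t562:116\n'      >> "$sample"

   summary=$(classified_summary "$sample")
   echo "$summary"   # prints: classified=3 unclassified=1 pct_classified=75.0
   rm -f "$sample"

With real data, pass the path given to ``--output`` instead of the sample
file.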

Key Parameters
--------------

.. list-table::
   :header-rows: 1
   :widths: 25 75

   * - Flag / option
     - Description
   * - ``--db``
     - Path to the Kraken2 database directory.
   * - ``--output``
     - Per-read classification output file (read ID, taxon ID, k-mer mapping
       details).
   * - ``--report``
     - Summary report with read counts and percentages at each taxonomic level.
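   * - ``--minimum-hit-groups``
     - Minimum number of hit groups (groups of overlapping k-mers that share
       a minimizer found in the database) required to classify a read.
       Higher values improve precision at the cost of sensitivity.
   * - ``--confidence``
     - Confidence score threshold, from 0 to 1. A label's score is the
       fraction of the read's k-mers that map within the clade rooted at
       that label. Kraken2 moves the label up the taxonomy until the score
       meets the threshold, and leaves the read unclassified if no taxon
       does.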
   * - ``--paired``
     - Input reads are paired-end.
   * - ``--threads``
     - Number of threads for classification.
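   * - ``--unclassified-out``
     - Write unclassified reads to the specified file. With ``--paired``,
       the filename must contain a ``#``, which Kraken2 replaces with ``_1``
       and ``_2`` for the two mates.
   * - ``--classified-out``
     - Write classified reads to the specified file. The same ``#``
       convention applies with ``--paired``.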
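
To make the confidence score concrete, the sketch below recomputes it from a
k-mer mapping string of the kind found in the per-read output. It assumes the
score is the number of k-mers mapped inside the clade of the candidate taxon
divided by the number of k-mers without ambiguous bases. The function name
``confidence`` is hypothetical, and the clade membership (which taxon IDs sit
under the candidate) is passed in by hand here, whereas Kraken2 derives it
from the database taxonomy.

.. code-block:: bash

   # $1: k-mer mapping string; $2: taxon IDs in the candidate clade (space-separated)
   confidence() {
     awk -v hits="$1" -v clade="$2" 'BEGIN {
       n = split(clade, c, " ")
       for (i = 1; i <= n; i++) in_clade[c[i]] = 1
       m = split(hits, h, " ")
       for (i = 1; i <= m; i++) {
         split(h[i], kv, ":")
         # skip k-mers with ambiguous bases (A) and the paired-end separator
         if (kv[1] == "A" || kv[1] == "|") continue
         total += kv[2]
         if (kv[1] in in_clade) inside += kv[2]
       }
       printf "%.4f\n", (total > 0 ? inside / total : 0)
     }'
   }

   mapping="562:13 561:4 A:31 0:1 562:3"
   score_species=$(confidence "$mapping" "562")      # clade of taxon 562 alone
   score_genus=$(confidence "$mapping" "561 562")    # taxon 561 plus its child 562
   echo "$score_species $score_genus"                # prints: 0.7619 0.9524

Under these assumptions, ``--confidence 0.8`` would reject a label at taxon
562 for this read (16 of 21 k-mers) but accept one at its parent 561 (20 of
21 k-mers).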

Expected Output
---------------

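* ``classifications.txt`` -- tab-delimited file with one line per read (or
  read pair), containing the classification status (``C`` for classified,
  ``U`` for unclassified), read ID, assigned taxon ID (``0`` if
  unclassified), read length in bp, and a space-delimited list of
  ``taxid:count`` pairs showing which taxon each run of k-mers mapped to.
* ``report.txt`` -- tab-delimited Kraken-style report with columns for the
  percentage of reads in the clade rooted at a taxon, the number of reads in
  that clade, the number of reads assigned directly to the taxon, rank code,
  NCBI taxon ID, and scientific name. This report is used as input for
  Bracken abundance estimation and Krona visualisation.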
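
The report columns listed above can be filtered with standard tools. The
sketch below lists species-level rows (rank code ``S``) whose clade holds at
least a given percentage of reads. The function name ``species_above``, the
threshold, and the sample report lines are made up for illustration, and the
column order is assumed to match the description above.

.. code-block:: bash

   # $1: Kraken-style report; $2: minimum percentage of reads in the clade
   species_above() {
     awk -F'\t' -v min="$2" '
       $4 == "S" && ($1 + 0) >= (min + 0) {
         name = $6
         sub(/^ +/, "", name)     # strip the indentation that encodes tree depth
         printf "%s\t%s\t%.2f\n", $5, name, $1
       }' "$1"
   }

   # Made-up report lines standing in for report.txt
   report=$(mktemp)
   printf ' 10.00\t100\t100\tU\t0\tunclassified\n'                  > "$report"
   printf ' 90.00\t900\t95\tR\t1\troot\n'                          >> "$report"
   printf ' 80.00\t800\t200\tG\t561\t      Escherichia\n'          >> "$report"
   printf ' 60.00\t600\t600\tS\t562\t        Escherichia coli\n'   >> "$report"
   printf '  0.50\t5\t5\tS\t1280\t        Staphylococcus aureus\n' >> "$report"

   species=$(species_above "$report" 1)
   echo "$species"   # prints taxon ID, name, percentage: 562, Escherichia coli, 60.00
   rm -f "$report"

Only rows with rank code exactly ``S`` are matched, so sub-species ranks are
ignored by this filter.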

See Also
--------

* :doc:`bracken` -- re-estimates species-level abundance from Kraken2 reports
  using Bayesian methods
* :doc:`krona` -- generates interactive HTML taxonomic visualisations from
  Kraken2 reports