Prokka
======

Overview
--------

Prokka is a widely used command-line tool for rapid annotation of prokaryotic
genomes. It coordinates several external feature-prediction tools -- including
Prodigal for coding sequences, Aragorn for tRNAs, Barrnap (or, optionally,
RNAmmer) for rRNAs, and, optionally, SignalP for signal peptides -- and
produces standardised output files suitable for database submission and
downstream analysis. Prokka is fast, typically annotating a bacterial genome
in about ten minutes on a desktop computer, and needs no database downloads
beyond its bundled reference data.

Installation
------------

.. code-block:: bash

   mamba install -c conda-forge -c bioconda prokka
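
After installation, a quick check confirms that the executable is on
``PATH``. The sketch below wraps that check in a hypothetical helper,
``check_prokka`` (the name is illustrative and not part of Prokka);
``--version`` is a standard Prokka option.

.. code-block:: bash

   # Hypothetical helper: confirm that prokka is installed and report its version.
   check_prokka() {
     if ! command -v prokka >/dev/null 2>&1; then
       echo "prokka not found on PATH; is the environment activated?" >&2
       return 1
     fi
     prokka --version
   }

   # Example call:
   # check_prokka

Running ``prokka --listdb`` afterwards lists the databases that Prokka has
configured.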

Basic Usage
-----------

Annotate a consensus assembly, specifying the output directory, file prefix,
and organism taxonomy. Prokka stops if the output directory already exists
unless ``--force`` is given.

.. code-block:: bash

   prokka consensus.fasta \
     --outdir prokka_output/ \
     --prefix sample \
     --genus Escherichia --species "coli" \
     --cpus 8
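
When several assemblies need annotating, the same invocation can be wrapped
in a loop. The following is a minimal sketch, assuming each assembly is a
``.fasta`` file in a single directory; the function name ``annotate_dir`` and
the directory layout are illustrative assumptions, while the Prokka flags are
the ones used in the basic invocation.

.. code-block:: bash

   # Hypothetical helper: annotate every *.fasta file in a directory.
   # Usage: annotate_dir <assembly_dir> <output_root> [cpus]
   annotate_dir() {
     indir=$1
     outroot=$2
     cpus=${3:-8}
     mkdir -p "$outroot"
     for fasta in "$indir"/*.fasta; do
       [ -e "$fasta" ] || continue          # skip if the glob matched nothing
       sample=$(basename "$fasta" .fasta)   # e.g. isolate1.fasta -> isolate1
       # One output directory per sample; Prokka refuses to overwrite an
       # existing directory unless --force is given.
       prokka "$fasta" \
         --outdir "$outroot/$sample" \
         --prefix "$sample" \
         --cpus "$cpus" || return 1
     done
   }

   # Example call (directory names are placeholders):
   # annotate_dir assemblies/ prokka_output 8

Writing each sample to its own directory keeps the per-sample output files
from colliding and makes a failed run easy to identify and repeat.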

Key Parameters
--------------

.. list-table::
   :header-rows: 1
   :widths: 25 75

   * - Flag / option
     - Description
   * - (positional)
     - Input genome assembly in FASTA format.
   * - ``--outdir``
     - Output directory for all annotation files.
   * - ``--prefix``
     - Prefix used for output file names (e.g. ``sample``).
   * - ``--genus`` / ``--species``
     - Organism taxonomy recorded in the output metadata. Genus-specific
       databases are used for annotation only when ``--usegenus`` is also
       given.
   * - ``--cpus``
     - Number of CPU threads to use.
   * - ``--kingdom``
     - Annotation kingdom: ``Bacteria`` (default), ``Archaea``,
       ``Mitochondria``, or ``Viruses``.
   * - ``--locustag``
     - Locus tag prefix for gene identifiers.
   * - ``--compliant``
     - Force GenBank/ENA/DDBJ compliance in output files; shorthand for
       ``--addgenes --mincontiglen 200 --centre XXX``.
   * - ``--rfam``
     - Enable searching for ncRNAs using the Rfam database (slower but
       more comprehensive).
   * - ``--proteins``
     - Path to a trusted protein FASTA or GenBank file that is searched
       first, before Prokka's bundled databases.
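
Several of these options are typically combined when preparing an annotation
for submission. The sketch below is one possible combination, not a
recommended default: the wrapper name ``annotate_for_submission``, the output
directory, and the example file names and locus tag are all placeholders to
be replaced with real values.

.. code-block:: bash

   # Hypothetical wrapper combining options from the table above.
   # Usage: annotate_for_submission <assembly.fasta> <trusted_proteins> <locus_tag>
   annotate_for_submission() {
     prokka "$1" \
       --outdir prokka_submission \
       --prefix sample \
       --kingdom Bacteria \
       --locustag "$3" \
       --proteins "$2" \
       --compliant \
       --rfam \
       --cpus 8
   }

   # Example call (file names and locus tag are placeholders):
   # annotate_for_submission consensus.fasta trusted.faa ECX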

Expected Output
---------------

Prokka writes output files to the specified directory, all sharing the
prefix given with ``--prefix``:

* ``sample.gff`` -- annotations in GFF3 format with the genome sequence
  appended.
* ``sample.gbk`` -- annotations in GenBank format.
* ``sample.faa`` -- predicted protein sequences in FASTA format.
* ``sample.ffn`` -- nucleotide sequences of predicted features in FASTA
  format.
* ``sample.fna`` -- input contig sequences in nucleotide FASTA format (contig
  IDs may be renamed).
* ``sample.tsv`` -- a tab-separated summary of all annotated features.
* ``sample.txt`` -- a plain-text statistics summary with counts of each
  feature type.
* ``sample.log`` -- the Prokka log file with runtime details.
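
The ``.tsv`` file is convenient for quick tallies. As a minimal sketch,
assuming the file starts with a header row and holds the feature type in its
second column (``ftype``), the hypothetical helper ``count_ftypes`` below
counts features per type with ``awk``:

.. code-block:: bash

   # Hypothetical helper: count annotated features by type from a Prokka .tsv.
   # Assumes a header line and the feature type in column 2.
   count_ftypes() {
     awk -F '\t' 'NR > 1 { n[$2]++ } END { for (t in n) print t "\t" n[t] }' "$1" | sort
   }

   # Example call:
   # count_ftypes prokka_output/sample.tsv

The resulting counts can be cross-checked against the totals reported in
``sample.txt``.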

See Also
--------

* :doc:`bakta` -- newer prokaryotic annotator with a more comprehensive
  and regularly updated database
* :doc:`/tools/assembly/medaka` -- polishing step commonly run before
  annotation
* :doc:`/tools/assembly-qc/busco` -- assess gene-level completeness of the
  annotated assembly