Prokka

Overview

Prokka is a widely used command-line tool for rapid annotation of prokaryotic genomes. It coordinates several external feature-prediction tools – including Prodigal for coding sequences, Aragorn for tRNAs, Barrnap (or optionally RNAmmer) for rRNAs, and SignalP for signal peptides where it is installed – and produces standardised output files suitable for database submission and downstream analysis. Prokka is fast, typically annotating a bacterial genome in under ten minutes, and needs no database download beyond the reference data bundled with it.

Installation

mamba install -c conda-forge -c bioconda prokka

Basic Usage

Annotate a consensus assembly, specifying the output directory, file prefix, and organism taxonomy.

prokka consensus.fasta \
  --outdir prokka_output/ \
  --prefix sample \
  --genus Escherichia --species "coli" \
  --cpus 8
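
To annotate several assemblies with the same settings, the command above can be wrapped in a loop. The following is a minimal sketch, not part of Prokka itself: the `annotate_all` function, the `PROKKA` variable, and the directory names are illustrative assumptions, and the organism is hard-coded to match the example above. Each sample gets its own output directory because Prokka stops if the directory given to `--outdir` already exists, unless `--force` is added.

```shell
#!/usr/bin/env bash
# Hypothetical helper: annotate every *.fasta file in a directory with Prokka.
# PROKKA can be overridden (e.g. PROKKA=echo) to preview the commands.
PROKKA="${PROKKA:-prokka}"

annotate_all() {
  # $1 = directory holding the assemblies, $2 = root directory for results
  local indir="$1" outroot="$2" fasta sample
  for fasta in "$indir"/*.fasta; do
    [ -e "$fasta" ] || continue           # skip if the glob matched nothing
    sample="$(basename "$fasta" .fasta)"  # sample name doubles as file prefix
    "$PROKKA" "$fasta" \
      --outdir "$outroot/$sample" \
      --prefix "$sample" \
      --genus Escherichia --species coli \
      --cpus 8
  done
}

# Example (directory names are placeholders):
# annotate_all assemblies prokka_output
```

Setting `PROKKA=echo` before calling the function prints each command instead of running it, which is a cheap way to check paths and prefixes first.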

Key Parameters

Flag / option	Description
(positional)	Input genome assembly in FASTA format.
`--outdir`	Output directory for all annotation files.
`--prefix`	Prefix used for output file names (e.g. `sample`).
`--genus` / `--species`	Genus and species names recorded in the output metadata; the genus influences annotation only when `--usegenus` is also given.
`--cpus`	Number of CPU threads to use.
`--kingdom`	Annotation kingdom: `Bacteria` (default), `Archaea`, `Mitochondria`, or `Viruses`.
`--locustag`	Locus tag prefix for gene identifiers.
`--compliant`	Force GenBank/ENA/DDBJ compliance in output files.
`--rfam`	Enable searching for ncRNAs using the Rfam database (slower but more comprehensive).
`--proteins`	Path to a trusted protein FASTA or GenBank file for first-pass annotation against a custom database.
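
Several of these options are commonly combined when preparing an annotation for submission. The sketch below shows one possible combination; the `annotate_compliant` function, the `PROKKA` variable, the output directory naming, and all file names are hypothetical, so substitute values that suit your own project.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper combining several of the options listed above.
# PROKKA can be overridden (e.g. PROKKA=echo) to preview the command.
PROKKA="${PROKKA:-prokka}"

annotate_compliant() {
  # $1 = assembly FASTA, $2 = locus tag prefix, $3 = trusted proteins file
  "$PROKKA" "$1" \
    --outdir "prokka_$2" \
    --prefix "$2" \
    --kingdom Bacteria \
    --locustag "$2" \
    --compliant \
    --proteins "$3" \
    --cpus 8
}

# Example (file names and locus tag are placeholders):
# annotate_compliant consensus.fasta ECOLI trusted_proteins.faa
```

Reusing the locus tag as the file prefix is only a convenience here; the two options are independent.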

Expected Output

Prokka writes its output files to the specified directory, each named with the prefix given by `--prefix`. The main ones are:

sample.gff – annotations in GFF3 format with the genome sequence appended.
sample.gbk – annotations in GenBank format.
sample.faa – predicted protein sequences in FASTA format.
sample.ffn – nucleotide sequences of predicted features in FASTA format.
sample.fna – nucleotide FASTA of the input contigs (sequence IDs may be renamed).
sample.tsv – a tab-separated summary of all annotated features.
sample.txt – a plain-text statistics summary with counts of each feature type.
sample.log – the Prokka log file with runtime details.
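
The tab-separated summary lends itself to quick checks with standard tools. The sketch below counts annotated features by type. It assumes the TSV starts with a header row that includes a column named `ftype`, so check the header of your own file first; the `count_ftypes` function name is illustrative.

```shell
#!/usr/bin/env bash
# Count annotated features by type from a Prokka TSV summary.
# Assumes a header row with a column named "ftype";
# verify with: head -n 1 sample.tsv
count_ftypes() {
  awk -F '\t' '
    # Header row: locate the ftype column by name, then move on.
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "ftype") col = i; next }
    # Data rows: tally the feature type (skipped if no ftype column was found).
    col     { n[$col]++ }
    END     { for (t in n) printf "%s\t%d\n", t, n[t] }
  ' "$1" | sort
}

# Example (path is a placeholder):
# count_ftypes prokka_output/sample.tsv
```

Looking the column up by name rather than by position keeps the function working if the column order differs between versions.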