Prokka

Overview

Prokka is a widely used command-line tool for rapid annotation of prokaryotic genomes. It coordinates several external feature-prediction tools – including Prodigal for coding sequences, Aragorn for tRNAs, Barrnap (or, optionally, RNAmmer) for rRNAs, and SignalP for signal peptides – and produces standardised output files suitable for database submission and downstream analysis. Prokka is fast, typically annotating a bacterial genome in under ten minutes, and requires no database download beyond its bundled reference data.

Installation

mamba install -c conda-forge -c bioconda prokka
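
After installation, it can be worth confirming that the prokka executable is on the PATH and that its bundled databases are visible. The following is a minimal sketch: the helper name check_prokka is hypothetical, while --version and --listdb are standard Prokka options.

```shell
#!/usr/bin/env bash
# check_prokka is a hypothetical helper name, not part of Prokka itself.
check_prokka() {
  # Stop early with a clear message if the executable cannot be found.
  if ! command -v prokka >/dev/null 2>&1; then
    echo "prokka not found on PATH" >&2
    return 1
  fi
  prokka --version   # report the installed version
  prokka --listdb    # list the annotation databases Prokka can see
}
```

Run check_prokka inside the environment where Prokka was installed; a non-zero exit status suggests the environment is not activated.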

Basic Usage

Annotate a consensus assembly, specifying the output directory, file prefix, genus and species, and number of CPU threads.

prokka consensus.fasta \
  --outdir prokka_output/ \
  --prefix sample \
  --genus Escherichia --species "coli" \
  --cpus 8
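
The same command can be wrapped in a loop to annotate several assemblies. This is a sketch under stated assumptions: the helper names prefix_from_fasta and annotate_all are hypothetical, every input is assumed to be the same species, and each sample is written to its own subdirectory of prokka_output/.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for annotating several assemblies in one go.

# Derive a sample prefix from a FASTA path: assemblies/iso1.fasta -> iso1
prefix_from_fasta() {
  local name
  name="$(basename "$1")"
  echo "${name%.*}"   # strip the final file extension
}

# Run Prokka once per FASTA file given on the command line.
# Assumption: all inputs are Escherichia coli, as in the example above.
annotate_all() {
  local fasta prefix
  for fasta in "$@"; do
    prefix="$(prefix_from_fasta "$fasta")"
    prokka "$fasta" \
      --outdir "prokka_output/${prefix}" \
      --prefix "$prefix" \
      --genus Escherichia --species coli \
      --cpus 8
  done
}
```

A call such as annotate_all assemblies/*.fasta would then process every matching file in turn. Prokka refuses to write into an output directory that already exists unless --force is given, so re-runs need either a fresh directory or that flag.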

Key Parameters

Flag / option        Description
(positional)         Input genome assembly in FASTA format.
--outdir             Output directory for all annotation files.
--prefix             Prefix used for output file names (e.g. sample).
--genus / --species  Genus and species names recorded in the output metadata; genus-specific databases are used only if --usegenus is also set.
--cpus               Number of CPU threads to use (0 = all available).
--kingdom            Annotation kingdom: Bacteria (default), Archaea, Mitochondria, or Viruses.
--locustag           Locus tag prefix for gene identifiers.
--compliant          Force GenBank/ENA/DDBJ compliance in output files.
--rfam               Enable searching for ncRNAs using the Rfam database (slower but more comprehensive).
--proteins           Path to a trusted protein FASTA or GenBank file, searched first as a custom database before the bundled ones.
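
Several of these options are often combined in a single run. As an illustration only, a submission-oriented annotation with a custom protein set might be wrapped as follows; the function name annotate_for_submission, the output directory prokka_compliant/, and the argument values shown in the usage note are placeholders rather than recommended settings.

```shell
#!/usr/bin/env bash
# annotate_for_submission is a hypothetical wrapper around a single Prokka run.
# Arguments: <assembly.fasta> <trusted_proteins> <locus_tag_prefix>
annotate_for_submission() {
  local assembly="$1" proteins="$2" locustag="$3"
  prokka "$assembly" \
    --outdir prokka_compliant/ \
    --prefix sample \
    --kingdom Bacteria \
    --locustag "$locustag" \
    --compliant \
    --proteins "$proteins" \
    --rfam \
    --cpus 8
}
```

For example, annotate_for_submission consensus.fasta trusted_proteins.faa ECO1 would annotate consensus.fasta, search the (assumed) file trusted_proteins.faa first, and prefix locus tags with ECO1.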

Expected Output

Prokka writes its output files to the directory given with --outdir, all sharing the prefix given with --prefix. The most commonly used are:

  • sample.gff – annotations in GFF3 format with the genome sequence appended.

  • sample.gbk – annotations in GenBank format.

  • sample.faa – predicted protein sequences in FASTA format.

  • sample.ffn – nucleotide sequences of predicted features in FASTA format.

  • sample.fna – input contig sequences in nucleotide FASTA format (contig IDs may be renamed).

  • sample.tsv – a tab-separated summary of all annotated features.

  • sample.txt – a plain-text statistics summary with counts of each feature type.

  • sample.log – the Prokka log file with runtime details.
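
The tab-separated summary lends itself to quick checks with standard command-line tools. The sketch below assumes the .tsv file starts with a header row and uses the column order locus_tag, ftype, length_bp, gene, EC_number, COG, product; verify this against your own output, as the layout may differ between Prokka versions. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers. Assumed columns (tab-separated, with a header row):
# locus_tag, ftype, length_bp, gene, EC_number, COG, product

# Count features per type (CDS, tRNA, ...), printing "type<TAB>count" lines.
count_feature_types() {
  awk -F'\t' 'NR > 1 { n[$2]++ } END { for (t in n) print t "\t" n[t] }' "$1" \
    | LC_ALL=C sort
}

# Count CDS features whose product is exactly "hypothetical protein".
count_hypothetical() {
  awk -F'\t' 'NR > 1 && $2 == "CDS" && $7 == "hypothetical protein" { c++ }
              END { print c + 0 }' "$1"
}
```

For example, count_feature_types prokka_output/sample.tsv prints one line per feature type, which can be compared against the counts reported in sample.txt.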

See Also

  • Bakta – newer prokaryotic annotator with a more comprehensive and regularly updated database

  • Medaka – consensus polishing tool commonly run on an assembly before annotation

  • BUSCO – tool for assessing gene-level completeness of the annotated assembly