Prokka
Overview
Prokka is a widely used command-line tool for rapid annotation of prokaryotic genomes. It coordinates several external feature-prediction tools (including Prodigal for coding sequences, Aragorn for tRNAs, Barrnap or optionally RNAmmer for rRNAs, and optionally SignalP for signal peptides) and produces standardised output files suitable for database submission and downstream analysis. Prokka is fast, typically annotating a bacterial genome in under ten minutes, and requires no external database download beyond its bundled reference data.
Installation
mamba install -c bioconda prokka
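After installing, it can help to confirm that the `prokka` executable and its bundled databases are visible in the current shell. The snippet below is a minimal sketch: `check_tool` is a hypothetical helper name introduced here, while `--version` and `--listdb` are existing Prokka options.

```shell
# Hypothetical helper: succeed if the named command is on PATH.
check_tool() {
    command -v "$1" >/dev/null 2>&1
}

if check_tool prokka; then
    prokka --version   # print the installed Prokka version
    prokka --listdb    # list the annotation databases Prokka can see
else
    echo "prokka not found on PATH; activate the environment it was installed into" >&2
fi
```

If the command is not found, the most likely cause is that the conda/mamba environment containing Prokka is not active.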
Basic Usage
Annotate a consensus assembly, specifying the output directory, file prefix, and organism taxonomy.
prokka consensus.fasta \
--outdir prokka_output/ \
--prefix sample \
--genus Escherichia --species "coli" \
--cpus 8
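When several assemblies need annotating, the same command can be generated once per file. The following is a sketch only: `prokka_cmd` is a hypothetical helper, the `assemblies/` directory is an assumed layout, and file names are assumed to contain no spaces. It prints the commands rather than running them, so they can be reviewed first.

```shell
# Hypothetical helper: print the Prokka command for one assembly,
# deriving the output prefix from the FASTA file name.
prokka_cmd() {
    fasta=$1
    name=$(basename "$fasta")
    prefix=${name%.*}                 # strip the final extension, e.g. .fasta
    echo "prokka $fasta --outdir prokka_output/$prefix --prefix $prefix --cpus 8"
}

# Dry run: print one command per assembly in the assumed assemblies/ directory.
for fasta in assemblies/*.fasta; do
    [ -e "$fasta" ] || continue       # skip if the glob matched nothing
    prokka_cmd "$fasta"
done
```

Each sample gets its own output directory because Prokka refuses to write into an existing directory unless `--force` is given. Once the printed commands look right, they can be piped to `sh` to run them.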
Key Parameters
| Flag / option | Description |
|---|---|
| (positional) | Input genome assembly in FASTA format. |
| `--outdir` | Output directory for all annotation files. |
| `--prefix` | Prefix used for output file names (e.g. `sample` gives `sample.gff`). |
| `--genus`, `--species` | Organism taxonomy used for annotation and output metadata. |
| `--cpus` | Number of CPU threads to use. |
| `--kingdom` | Annotation kingdom: `Archaea`, `Bacteria` (default), `Mitochondria`, or `Viruses`. |
| `--locustag` | Locus tag prefix for gene identifiers. |
| `--compliant` | Force GenBank/ENA/DDBJ compliance in output files. |
| `--rfam` | Enable searching for ncRNAs using the Rfam database (slower but more comprehensive). |
| `--proteins` | Path to a trusted protein FASTA or GenBank file for first-pass annotation against a custom database. |
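As an illustration of how the optional flags combine, the sketch below assembles a submission-oriented command and prints it for review. `prokka_submission_cmd` is a hypothetical helper, and the locus tag `ECOLI` is a placeholder value rather than a registered prefix.

```shell
# Hypothetical helper: print a Prokka command that combines several optional
# flags. Arguments: assembly FASTA, output prefix, locus tag prefix.
prokka_submission_cmd() {
    fasta=$1 prefix=$2 locustag=$3
    # Required input and output naming.
    set -- "$fasta" --outdir "prokka_output/$prefix" --prefix "$prefix"
    # Kingdom and locus tag prefix for gene identifiers.
    set -- "$@" --kingdom Bacteria --locustag "$locustag"
    # Database-compliant output plus the slower Rfam ncRNA search.
    set -- "$@" --compliant --rfam
    echo prokka "$@"
}

prokka_submission_cmd consensus.fasta sample ECOLI
```

A trusted protein set could be added in the same way by appending `--proteins` followed by the path to the FASTA or GenBank file.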
Expected Output
Prokka writes output files to the specified directory, all sharing the
prefix given with --prefix:
- sample.gff – annotations in GFF3 format with the genome sequence appended.
- sample.gbk – annotations in GenBank format.
- sample.faa – predicted protein sequences in FASTA format.
- sample.ffn – nucleotide sequences of predicted features in FASTA format.
- sample.fna – input genome sequence in FASTA format (contig IDs may be renamed).
- sample.tsv – a tab-separated summary of all annotated features.
- sample.txt – a plain-text statistics summary with counts of each feature type.
- sample.log – the Prokka log file with runtime details.
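A quick sanity check on a finished run is to tally features by type from the tab-separated summary. This is a minimal sketch, assuming the `.tsv` file starts with a header line and holds the feature type (for example `CDS` or `tRNA`) in its second column; `count_features` is a hypothetical helper name.

```shell
# Hypothetical helper: count features per type in a Prokka .tsv summary.
# Assumes line 1 is a header and column 2 holds the feature type.
count_features() {
    awk -F '\t' 'NR > 1 { n[$2]++ } END { for (t in n) print t "\t" n[t] }' "$1" | sort
}
```

For the run shown in Basic Usage, this would be called as `count_features prokka_output/sample.tsv`, and the totals should agree with the counts reported in `sample.txt`.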