Medaka
Overview
Medaka is a neural-network-based polishing tool from Oxford Nanopore
Technologies that improves the consensus accuracy of draft assemblies produced
from Nanopore reads. It aligns the original reads back to the draft assembly
and applies a recurrent neural network to predict a more accurate consensus
sequence. Medaka provides pre-trained models matched to specific basecalling
configurations (chemistry, pore type, and basecaller version), and its
`medaka_polisher` pipeline wraps alignment, inference, and consensus
generation into a single command.
Installation
```shell
mamba install -c bioconda medaka
```
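After installing, it can be worth confirming that the executable is actually on the `PATH` before starting a long run. The following is a minimal sketch, assuming a POSIX-like shell; `require_tool` is a hypothetical helper introduced here for illustration, and `medaka_polisher` is the command name as used in this document.

```shell
# require_tool: hypothetical helper, not part of Medaka.
# Returns 0 if the named executable is on PATH, 1 otherwise.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1 (is the environment activated?)" >&2
        return 1
    fi
}

# Example: require_tool medaka_polisher || exit 1
```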
Basic Usage
Polish a Flye assembly using the reads that produced it, selecting a model that matches the basecalling configuration.
```shell
medaka_polisher -i filtered_reads.fastq.gz \
    -d flye_output/assembly.fasta \
    -o medaka_output/ \
    -m r1041_e82_400bps_sup_v5.0.0 \
    --bacteria \
    --threads 8
```
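A polishing run that fails late because of a mistyped path is costly, so it can help to check the inputs first. The sketch below wraps the command shown above; `run_polish` and the `MEDAKA_BIN` variable are hypothetical names introduced for illustration, and the flags are exactly those used in this section.

```shell
# run_polish: hypothetical wrapper, not part of Medaka.
# Usage: run_polish READS DRAFT OUTDIR MODEL [THREADS]
# MEDAKA_BIN is an assumed override so the call can be dry-run with `echo`.
run_polish() {
    reads=$1; draft=$2; outdir=$3; model=$4; threads=${5:-8}

    # Fail early if either input is missing or empty.
    for f in "$reads" "$draft"; do
        if [ ! -s "$f" ]; then
            echo "error: missing or empty input: $f" >&2
            return 1
        fi
    done

    # --bacteria is carried over from the example above.
    "${MEDAKA_BIN:-medaka_polisher}" \
        -i "$reads" -d "$draft" -o "$outdir" \
        -m "$model" --bacteria --threads "$threads"
}
```

Setting `MEDAKA_BIN=echo` prints the argument list instead of launching the run, which gives a cheap dry run of the path checks.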
Key Parameters
| Flag / option | Description |
|---|---|
| `-i` | Input reads in FASTQ or FASTA format (the same reads used for assembly). |
| `-d` | Draft assembly to be polished (FASTA format). |
| `-o` | Output directory for the polished consensus. |
| `-m` | Medaka model name matching the basecalling configuration (e.g. `r1041_e82_400bps_sup_v5.0.0`). |
| `--bacteria` | Apply settings optimised for bacterial genomes. |
| `--threads` | Number of CPU threads to use for alignment and inference. |
| | Inference batch size (higher values use more GPU memory but run faster). |
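The thread count is often better tied to the machine than hard-coded. The following is a minimal sketch, assuming `getconf _NPROCESSORS_ONLN` is available (it is on most Linux and macOS systems) and falling back to a single thread otherwise; `pick_threads` is a hypothetical helper, not part of Medaka.

```shell
# pick_threads: hypothetical helper, not part of Medaka.
# Prints min(online cores, cap); cap defaults to 8 as in the example command.
pick_threads() {
    cap=${1:-8}
    cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

    # Treat anything that is not a positive integer as a single core.
    case $cores in
        ''|*[!0-9]*) cores=1 ;;
    esac
    if [ "$cores" -lt 1 ]; then cores=1; fi

    # Never exceed the requested cap.
    if [ "$cores" -gt "$cap" ]; then cores=$cap; fi
    echo "$cores"
}
```

The result can then be passed straight to the thread option, for example `--threads "$(pick_threads 8)"`.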
Expected Output
Medaka writes its results to the specified output directory:
- `consensus.fasta` – the polished consensus assembly in FASTA format. This is the primary output for downstream analysis.
- Intermediate BAM alignment files used during the polishing process.
- Log files with details of model inference and consensus calling.
The polished consensus.fasta typically shows measurable improvements in
identity when compared to a reference, particularly at homopolymer sites where
Nanopore reads are most error-prone.
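A quick sanity check on the polished assembly is to compare its contig count and total length with those of the draft. The sketch below uses only `awk`; `fasta_stats` is a hypothetical helper introduced for illustration, and the paths in the usage note are the ones from the example command.

```shell
# fasta_stats: hypothetical helper, not part of Medaka.
# Prints "<n_contigs> <total_bases>" for a FASTA file.
fasta_stats() {
    awk '
        /^>/ { n++; next }                  # header line: count one contig
        { gsub(/[ \t\r]/, "")               # drop stray whitespace and CRs
          bases += length($0) }             # sequence line: add its length
        END { printf "%d %d\n", n, bases }
    ' "$1"
}
```

Running `fasta_stats flye_output/assembly.fasta` and `fasta_stats medaka_output/consensus.fasta` side by side gives a rough before-and-after view. Polishing is expected to shift the total length only slightly, so a large change in either number is probably worth investigating.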