Dorado

Overview

Dorado is Oxford Nanopore Technologies’ (ONT) current basecaller for converting raw electrical signal data into nucleotide sequences. It replaces the legacy Guppy basecaller and provides improved accuracy through updated neural network architectures. Dorado natively reads POD5 files (ONT’s current raw data format) and writes unaligned BAM by default, or aligned BAM when a reference is supplied; records carry per-base quality scores and, when requested with --emit-moves, move tables. It supports multiple accuracy tiers (fast, high-accuracy, super-accuracy) and can perform modified base detection (e.g., 5mC, 6mA methylation) during basecalling.

Installation

Download links for the latest pre-compiled Dorado binaries are listed in the README of the nanoporetech/dorado repository on GitHub; the archives themselves are served from ONT’s CDN. Builds are available for Linux (x86_64, arm64), macOS (Apple silicon), and Windows.

# Example for Linux x86_64
VERSION="<version>"   # replace with a release number listed in the Dorado README
wget "https://cdn.oxfordnanoportal.com/software/analysis/dorado-${VERSION}-linux-x64.tar.gz"
tar -xzf "dorado-${VERSION}-linux-x64.tar.gz"
export PATH="$PWD/dorado-${VERSION}-linux-x64/bin:$PATH"

Dorado runs fastest on a CUDA-capable NVIDIA GPU; Apple silicon Macs are accelerated through Metal. CPU-only execution is supported but is much slower.
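
A minimal sketch of choosing a value for --device before a run. It assumes that nvidia-smi is the way GPU availability is checked on the machine; pick_device is a hypothetical helper, not part of Dorado, and it does not cover Apple silicon (Metal).

```shell
# pick_device is a hypothetical helper, not a Dorado command.
# It prints "cuda:all" when nvidia-smi can list at least one GPU, else "cpu".
pick_device() {
  if command -v nvidia-smi >/dev/null 2>&1 \
     && nvidia-smi -L 2>/dev/null | grep -q '^GPU '; then
    echo "cuda:all"
  else
    echo "cpu"
  fi
}

DEVICE=$(pick_device)
echo "Dorado device: $DEVICE"
```

The result can then be passed to the basecaller as --device "$DEVICE".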

Basic Usage

Download a basecalling model and run basecalling on a directory of POD5 files.

# Download basecalling model
dorado download --model dna_r10.4.1_e8.2_400bps_sup@v5.0.0

# Basecall POD5 files
dorado basecaller dna_r10.4.1_e8.2_400bps_sup@v5.0.0 \
  pod5_dir/ \
  --device cuda:0 \
  > calls.bam

# Convert to FASTQ if needed
samtools fastq calls.bam | gzip > reads.fastq.gz
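
As a quick sanity check on the converted file, the following sketch counts reads and total bases. It assumes four-line FASTQ records (which is what samtools fastq writes); fastq_stats is a hypothetical helper name.

```shell
# fastq_stats is a hypothetical helper: it reads 4-line FASTQ records on stdin
# and prints "<read count> <total bases>".
fastq_stats() {
  awk 'NR % 4 == 2 { n++; bp += length($0) } END { print n + 0, bp + 0 }'
}

FASTQ=reads.fastq.gz
if [ -f "$FASTQ" ]; then
  gzip -dc "$FASTQ" | fastq_stats
fi
```

The read count should match the number of records in calls.bam (samtools view -c calls.bam).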

For modified base calling (e.g., 5mC CpG methylation):

dorado basecaller dna_r10.4.1_e8.2_400bps_sup@v5.0.0 \
  pod5_dir/ \
  --device cuda:0 \
  --modified-bases 5mCG_5hmCG \
  > calls_modbase.bam
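
Modified base calls are stored in the MM and ML tags defined by the SAM specification. The sketch below counts how many records carry an MM tag, as a rough check that modified base calling was active; it does not say whether any site was actually called as modified. count_modbase_reads is a hypothetical helper name.

```shell
# count_modbase_reads is a hypothetical helper: it reads SAM text on stdin
# and prints the number of records that carry an MM:Z: tag.
count_modbase_reads() {
  awk -F'\t' '
    /^@/ { next }   # skip header lines if present
    {
      for (i = 12; i <= NF; i++)
        if ($i ~ /^MM:Z:/) { n++; break }
    }
    END { print n + 0 }'
}

BAM=calls_modbase.bam
if command -v samtools >/dev/null 2>&1 && [ -f "$BAM" ]; then
  samtools view "$BAM" | count_modbase_reads
fi
```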

Key Parameters

Flag / option       Description
--model             Used with dorado download: the basecalling model to fetch (e.g., dna_r10.4.1_e8.2_400bps_sup@v5.0.0). Must match the flow cell and kit chemistry. For dorado basecaller, the model is given as the first positional argument rather than as a flag.
--device            Compute device (cuda:0, cuda:all, or cpu).
--modified-bases    Detect modified bases during basecalling (e.g., 5mCG_5hmCG for 5-methylcytosine and 5-hydroxymethylcytosine in CpG context).
--emit-moves        Include move table information in the output BAM (useful for signal-level analysis).
--min-qscore        Minimum mean read Q-score; reads scoring below it are discarded.
--recursive         Recursively search input directories for POD5 files.
--batchsize         Number of chunks per batch (affects GPU memory usage); 0 lets Dorado choose a batch size automatically.
--reference         Reference FASTA for direct alignment during basecalling (outputs aligned BAM).
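
To illustrate how several of these options combine, here is a sketch that only assembles and prints a command line; it runs nothing. dorado_cmd is a hypothetical helper, and the Q-score threshold of 10 is an illustrative value, not a recommendation.

```shell
# dorado_cmd is a hypothetical helper that only prints a command line.
# $1 = model, $2 = POD5 directory, $3 = minimum mean Q-score
dorado_cmd() {
  printf 'dorado basecaller %s %s --recursive --device cuda:0 --min-qscore %s --emit-moves\n' \
    "$1" "$2" "$3"
}

dorado_cmd "dna_r10.4.1_e8.2_400bps_sup@v5.0.0" pod5_dir/ 10
```

When running the printed command for real, redirect its standard output to a BAM file as in the Basic Usage examples.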

Expected Output

  • calls.bam – unaligned BAM file containing basecalled reads with quality scores, read group information, and optional modified base tags. Each record corresponds to one nanopore read.

  • reads.fastq.gz – FASTQ file converted from the BAM output (if the conversion step is run).

When --reference is provided, Dorado aligns reads during basecalling and the output is an aligned BAM file. Output streamed to stdout is not coordinate-sorted, so run samtools sort on it before samtools index.
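
Because samtools index needs coordinate-sorted input, it can help to check the BAM header before indexing. The sketch below inspects the @HD line for SO:coordinate and sorts only if needed; the file name calls_aligned.bam and the helper is_coordinate_sorted are assumptions made for illustration.

```shell
# is_coordinate_sorted is a hypothetical helper: it reads SAM header text on
# stdin and succeeds only if the @HD line declares SO:coordinate.
is_coordinate_sorted() {
  awk -F'\t' '
    $1 == "@HD" { for (i = 2; i <= NF; i++) if ($i == "SO:coordinate") ok = 1 }
    END { exit !ok }'
}

BAM=calls_aligned.bam   # assumed file name for the --reference output
if command -v samtools >/dev/null 2>&1 && [ -f "$BAM" ]; then
  if samtools view -H "$BAM" | is_coordinate_sorted; then
    samtools index "$BAM"
  else
    # Sort into a new file, then index the sorted copy.
    samtools sort -o "${BAM%.bam}.sorted.bam" "$BAM"
    samtools index "${BAM%.bam}.sorted.bam"
  fi
fi
```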

See Also

  • Guppy – legacy ONT basecaller (superseded by Dorado)

  • NanoPlot – quality assessment of nanopore basecalling output

  • Chopper – length and quality filtering of nanopore reads

  • minimap2 – long-read aligner for mapping basecalled reads to a reference