Dorado
Overview
Dorado is Oxford Nanopore Technologies’ (ONT) current basecaller, which converts raw electrical signal data into nucleotide sequences. It replaces the legacy Guppy basecaller and improves accuracy through updated neural network architectures. Dorado natively reads POD5 files (ONT’s current raw data format) and writes aligned or unaligned BAM files with per-read quality scores and, optionally, move tables. It supports three accuracy tiers (fast, high-accuracy, super-accuracy) and can detect modified bases (e.g., 5mC, 6mA methylation) during basecalling.
Installation
Download the latest Dorado release from the Oxford Nanopore Technologies GitHub. Pre-compiled binaries are available for Linux (x86_64, aarch64) and macOS.
# Example for Linux x86_64
wget https://github.com/nanoporetech/dorado/releases/latest/download/dorado-<version>-linux-x64.tar.gz
tar -xzf dorado-<version>-linux-x64.tar.gz
export PATH=$PWD/dorado-<version>-linux-x64/bin:$PATH
Dorado performs best on a CUDA-capable GPU. CPU-only execution is supported but considerably slower.
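One way to choose a value for --device is to probe for an NVIDIA driver before running Dorado. The sketch below assumes that a working nvidia-smi on the PATH implies a usable CUDA GPU. That is a heuristic, not a check Dorado itself performs, and pick_device is a hypothetical helper name.

```shell
# pick_device: hypothetical helper that prints a value for --device.
# Assumption: if nvidia-smi exists and exits cleanly, a CUDA GPU is usable.
pick_device() {
    if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
        echo "cuda:0"
    else
        echo "cpu"
    fi
}

DEVICE=$(pick_device)
echo "Using device: $DEVICE"
```

The result can then be passed along as --device "$DEVICE" in the basecalling commands.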
Basic Usage
Download a basecalling model and run basecalling on a directory of POD5 files.
# Download basecalling model
dorado download --model dna_r10.4.1_e8.2_400bps_sup@v5.0.0
# Basecall POD5 files
dorado basecaller dna_r10.4.1_e8.2_400bps_sup@v5.0.0 \
    pod5_dir/ \
    --device cuda:0 \
    > calls.bam
# Convert to FASTQ if needed
samtools fastq calls.bam | gzip > reads.fastq.gz
For modified base calling (e.g., 5mC CpG methylation):
dorado basecaller dna_r10.4.1_e8.2_400bps_sup@v5.0.0 \
    pod5_dir/ \
    --device cuda:0 \
    --modified-bases 5mCG_5hmCG \
    > calls_modbase.bam
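A quick sanity check after modified base calling is to count how many records carry a modification tag. The sketch below assumes the calls are stored in the standard SAM MM/ML tags and reads SAM text from standard input; count_mm_tagged is a hypothetical helper name.

```shell
# count_mm_tagged: hypothetical helper. Reads SAM text on stdin and prints
# "<tagged> <total>", where tagged is the number of records with an MM:Z: tag.
count_mm_tagged() {
    awk -F '\t' '
        /^@/ { next }                    # skip header lines
        {
            total++
            for (i = 12; i <= NF; i++)   # optional tags start at column 12
                if ($i ~ /^MM:Z:/) { tagged++; break }
        }
        END { printf "%d %d\n", tagged, total }
    '
}

# Usage (needs samtools and an existing BAM):
#   samtools view calls_modbase.bam | count_mm_tagged
```

If the first number is zero, it is worth checking that the --modified-bases option was actually applied to the run.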
Key Parameters
| Flag / option | Description |
|---|---|
| model (--model for dorado download) | Basecalling model to download or use (e.g., dna_r10.4.1_e8.2_400bps_sup@v5.0.0). |
| --device | Compute device (e.g., cuda:0 or cpu). |
| --modified-bases | Detect modified bases during basecalling (e.g., 5mCG_5hmCG). |
| --emit-moves | Include move table information in the output BAM (useful for signal-level analysis). |
| --min-qscore | Minimum quality score threshold; reads below this are filtered out. |
| --recursive | Recursively search input directories for POD5 files. |
| --batchsize | Number of signal chunks per batch (affects GPU memory usage). |
| --reference | Reference FASTA for direct alignment during basecalling (outputs aligned BAM). |
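Several of these options are often combined in one invocation. The following sketch shows one such combination behind a dry-run wrapper, so the command line can be inspected before any GPU time is spent. The run and basecall_aligned helpers, the ref.fasta path, and the Q10 cutoff are illustrative assumptions, not recommendations.

```shell
# All values below are placeholders; ref.fasta and the Q10 cutoff are
# illustrative assumptions.
MODEL="dna_r10.4.1_e8.2_400bps_sup@v5.0.0"
INPUT_DIR="pod5_dir/"
REFERENCE="ref.fasta"
MIN_QSCORE=10

# run: hypothetical wrapper. Prints the command unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# basecall_aligned: writes BAM (or, in a dry run, the command) to stdout.
basecall_aligned() {
    run dorado basecaller "$MODEL" "$INPUT_DIR" \
        --device cuda:0 \
        --recursive \
        --min-qscore "$MIN_QSCORE" \
        --emit-moves \
        --reference "$REFERENCE"
}

basecall_aligned   # dry run by default: prints the command line
```

Setting DRY_RUN=0 and redirecting standard output, for example DRY_RUN=0 basecall_aligned > calls.aligned.bam, would execute the command instead of printing it.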
Expected Output
calls.bam – unaligned BAM file containing basecalled reads with quality scores, read group information, and optional modified base tags. Each record corresponds to one nanopore read.
reads.fastq.gz – FASTQ file converted from the BAM output (if the conversion step is run).
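When --reference is provided, the output is an aligned BAM file. Records written to
standard output follow basecalling order rather than coordinate order, so sort the file with samtools sort before running samtools index.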
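If samtools index rejects the file as unsorted, or you would rather not rely on the record order, sorting first is a safe approach. The sketch below wraps both steps in a hypothetical sort_and_index helper; the file names are placeholders.

```shell
# sort_and_index: hypothetical helper. Coordinate-sorts a BAM, then indexes it.
sort_and_index() {
    in_bam=$1
    out_bam=$2
    samtools sort -o "$out_bam" "$in_bam" && samtools index "$out_bam"
}

# Run only if the input exists and samtools is installed.
if [ -f calls.aligned.bam ] && command -v samtools >/dev/null 2>&1; then
    sort_and_index calls.aligned.bam calls.aligned.sorted.bam
fi
```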