Guppy

Overview

Guppy was Oxford Nanopore Technologies’ (ONT) production basecaller for converting raw nanopore signal data into nucleotide sequences. It supported multiple basecalling models (fast, high-accuracy, super-accuracy) and could perform barcoding, adapter trimming, and alignment as part of the basecalling pipeline. Guppy has been superseded by Dorado, which offers improved accuracy and performance. Guppy remains relevant for reprocessing older datasets and for environments where Dorado has not yet been adopted.

Note

Guppy is deprecated and has been replaced by Dorado. New projects should use Dorado for basecalling. See Dorado for the recommended workflow.

Installation

Guppy is distributed through the ONT Community portal and is not available via Bioconda. Download the appropriate package from the ONT Community site (requires a login).

After downloading, extract the archive and add its bin directory to your PATH:

tar -xzf ont-guppy_<version>_linux64.tar.gz
export PATH="$PWD/ont-guppy/bin:$PATH"

GPU-accelerated basecalling requires a CUDA-capable GPU and the corresponding CUDA toolkit.
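A quick sanity check after installation is to confirm that the basecaller resolves on PATH and that NVIDIA driver tools are visible. The following is a minimal sketch: check_tool is a hypothetical helper name, and using nvidia-smi as the indicator of a usable GPU is an assumption about how the driver is installed on your system.

```shell
# check_tool NAME: report whether NAME resolves to an executable on PATH.
# (check_tool is a hypothetical helper for illustration, not part of Guppy.)
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: not found"
        return 1
    fi
}

check_tool guppy_basecaller || echo "Check that ont-guppy/bin is on your PATH."
check_tool nvidia-smi || echo "No NVIDIA driver tools detected; GPU basecalling may not be available."
```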

Basic Usage

Basecall a directory of POD5 or FAST5 files with a super-accuracy model on the first GPU:

guppy_basecaller -i pod5_dir/ -s output/ \
  --config dna_r10.4.1_e8.2_400bps_sup.cfg \
  --device cuda:0 \
  --recursive

The output directory will contain FASTQ files (one per batch) and a sequencing summary file.

Key Parameters

Flag / option               Description
-i                          Input directory containing raw signal files (POD5 or FAST5).
-s                          Output directory for basecalled FASTQ files and summary.
--config                    Basecalling configuration file (e.g., dna_r10.4.1_e8.2_400bps_sup.cfg). Must match the flow cell and kit chemistry.
--device                    GPU device to use (cuda:0, cuda:0,1 for multiple GPUs, or auto).
--recursive                 Recursively search the input directory for signal files.
--num_callers               Number of parallel basecalling processes (CPU mode).
--gpu_runners_per_device    Number of neural network runners per GPU (tuning parameter for throughput).
--chunks_per_runner         Number of signal chunks per runner (affects GPU memory usage).
--compress_fastq            Compress output FASTQ files with gzip.
--barcode_kits              Barcode kit name(s) for demultiplexing during basecalling.
--min_qscore                Minimum quality score threshold for the pass/fail split.
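The flags above can be combined in a single call. As a hedged sketch, the wrapper below assembles a GPU run with demultiplexing, gzip-compressed output, and an explicit quality cutoff, and prints the command instead of running it unless RUN=1 is set. The function name guppy_cmd, the RUN variable, the kit name SQK-NBD114-24, and the threshold of 10 are illustrative assumptions; substitute the kit and config that match your own run.

```shell
# guppy_cmd INPUT_DIR OUTPUT_DIR: build a guppy_basecaller command line from
# the flags described above. Hypothetical wrapper for illustration only.
guppy_cmd() {
    in_dir=$1
    out_dir=$2
    # Replace the positional parameters with the full argument list so that
    # paths containing spaces stay intact when the command is executed.
    set -- guppy_basecaller \
        -i "$in_dir" -s "$out_dir" \
        --config dna_r10.4.1_e8.2_400bps_sup.cfg \
        --device cuda:0 \
        --recursive \
        --barcode_kits SQK-NBD114-24 \
        --min_qscore 10 \
        --compress_fastq
    if [ "${RUN:-0}" = "1" ]; then
        "$@"            # run the basecaller
    else
        echo "$*"       # dry run: print the command only
    fi
}

guppy_cmd pod5_dir/ output/
```

Printing the command first makes it easy to review the config and kit names before committing GPU time; rerun with RUN=1 in the environment to execute it.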

Expected Output

Guppy creates a structured output directory:

  • output/pass/ – FASTQ files for reads that meet the minimum quality score threshold.

  • output/fail/ – FASTQ files for reads below the quality threshold.

  • output/sequencing_summary.txt – tab-delimited file with per-read statistics including read ID, run ID, channel, start time, duration, sequence length, and mean quality score.

  • output/guppy_basecaller_log*.log – basecalling log files.
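For a quick look at a run without a separate QC tool, the summary file can be aggregated with awk. The sketch below looks columns up by header name. It assumes the columns passes_filtering, sequence_length_template, and mean_qscore_template are present and that passes_filtering holds TRUE or FALSE; check the header line of your own file, since column names can vary between versions. summarize_run is a hypothetical helper name.

```shell
# summarize_run FILE: print read count, passing reads, mean read length and
# the arithmetic mean of per-read Q-scores from a tab-delimited summary.
# Hypothetical helper; column names are assumptions (see the text above).
summarize_run() {
    awk -F '\t' '
        NR == 1 {
            # Map column names to positions so column order does not matter.
            for (i = 1; i <= NF; i++) col[$i] = i
            if (!("passes_filtering" in col) ||
                !("sequence_length_template" in col) ||
                !("mean_qscore_template" in col)) {
                print "expected columns not found in header" > "/dev/stderr"
                bad = 1
                exit 1
            }
            next
        }
        {
            n++
            if ($col["passes_filtering"] == "TRUE") pass++
            len += $col["sequence_length_template"]
            q   += $col["mean_qscore_template"]
        }
        END {
            if (bad) exit 1
            if (n == 0) { print "no reads"; exit }
            printf "reads=%d pass=%d mean_length=%.1f mean_qscore=%.2f\n",
                   n, pass, len / n, q / n
        }
    ' "$1"
}
```

Call it as: summarize_run output/sequencing_summary.txt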

If barcoding is enabled, reads are further separated into subdirectories named by barcode (e.g., barcode01/, barcode02/).
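To see how reads were distributed across barcodes, the per-barcode FASTQ files can be counted directly. The sketch below assumes four lines per FASTQ record and that the barcode folders sit directly under the directory passed in (for example output/pass); count_reads_per_dir is a hypothetical helper name.

```shell
# count_reads_per_dir DIR: for each subdirectory of DIR, print its name and
# the number of FASTQ records it contains (plain or gzip-compressed files).
# Hypothetical helper for illustration.
count_reads_per_dir() {
    for sub in "$1"/*/; do
        [ -d "$sub" ] || continue
        lines=0
        for f in "$sub"*.fastq "$sub"*.fastq.gz; do
            [ -f "$f" ] || continue     # skip glob patterns that matched nothing
            case "$f" in
                *.gz) n=$(gzip -dc "$f" | wc -l) ;;
                *)    n=$(wc -l < "$f") ;;
            esac
            lines=$((lines + n))
        done
        # Assumes four lines per FASTQ record.
        printf '%s\t%d\n' "$(basename "$sub")" $((lines / 4))
    done
}
```

Call it as: count_reads_per_dir output/pass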

See Also

  • Dorado – the recommended replacement for Guppy with improved basecalling accuracy

  • NanoPlot – quality assessment of basecalling output

  • pycoQC – QC report from the sequencing summary file produced by Guppy