Guppy
=====

Overview
--------

Guppy was Oxford Nanopore Technologies' (ONT) production basecaller for converting raw nanopore signal data into nucleotide sequences. It supported multiple basecalling models (fast, high-accuracy, super-accuracy) and could perform barcoding, adapter trimming, and alignment as part of the basecalling pipeline.

Guppy has been superseded by Dorado, which offers improved accuracy and performance. Guppy remains relevant for reprocessing older datasets and for environments where Dorado has not yet been adopted.

.. note::

   Guppy is deprecated and has been replaced by Dorado. New projects should use Dorado for basecalling. See :doc:`dorado` for the recommended workflow.

Installation
------------

Guppy is distributed through the ONT Community portal and is not available via Bioconda. Download the appropriate package from the `ONT Community site `_ (requires a login). After downloading, extract the archive and add its ``bin`` directory to your ``PATH``:

.. code-block:: bash

   tar -xzf ont-guppy__linux64.tar.gz
   export PATH="$PWD/ont-guppy/bin:$PATH"

GPU-accelerated basecalling requires a CUDA-capable GPU and the corresponding CUDA toolkit.

Basic Usage
-----------

Basecall a directory of POD5 or FAST5 files using a super-accuracy model on the first GPU:

.. code-block:: bash

   guppy_basecaller -i pod5_dir/ -s output/ \
       --config dna_r10.4.1_e8.2_400bps_sup.cfg \
       --device cuda:0 \
       --recursive

The output directory will contain FASTQ files (one per batch) and a sequencing summary file.

Key Parameters
--------------

.. list-table::
   :header-rows: 1
   :widths: 25 75

   * - Flag / option
     - Description
   * - ``-i``
     - Input directory containing raw signal files (POD5 or FAST5).
   * - ``-s``
     - Output directory for basecalled FASTQ files and the sequencing summary.
   * - ``--config``
     - Basecalling configuration file (e.g., ``dna_r10.4.1_e8.2_400bps_sup.cfg``). Must match the flow cell and kit chemistry.
   * - ``--device``
     - GPU device to use (``cuda:0``, ``cuda:0,1`` for multiple GPUs, or ``auto``).
   * - ``--recursive``
     - Recursively search the input directory for signal files.
   * - ``--num_callers``
     - Number of parallel basecalling processes (CPU mode).
   * - ``--gpu_runners_per_device``
     - Number of neural network runners per GPU (tuning parameter for throughput).
   * - ``--chunks_per_runner``
     - Number of signal chunks per runner (affects GPU memory usage).
   * - ``--compress_fastq``
     - Compress output FASTQ files with gzip.
   * - ``--barcode_kits``
     - Barcode kit name(s) for demultiplexing during basecalling.
   * - ``--min_qscore``
     - Minimum quality score threshold for the pass/fail split.

Expected Output
---------------

Guppy creates a structured output directory:

* ``output/pass/`` -- FASTQ files for reads that meet the minimum quality score threshold.
* ``output/fail/`` -- FASTQ files for reads below the quality threshold.
* ``output/sequencing_summary.txt`` -- tab-delimited file with per-read statistics, including read ID, run ID, channel, start time, duration, sequence length, and mean quality score.
* ``output/guppy_basecaller_log*.log`` -- basecalling log files.

If barcoding is enabled, reads are further separated into subdirectories named by barcode (e.g., ``barcode01/``, ``barcode02/``).

See Also
--------

* :doc:`dorado` -- the recommended replacement for Guppy, with improved basecalling accuracy
* :doc:`/tools/quality-control/nanoplot` -- quality assessment of basecalling output
* :doc:`/tools/quality-control/pycoqc` -- QC report from the sequencing summary file produced by Guppy
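
Summarizing Run Statistics
--------------------------

For a quick sanity check before running a full QC report, the ``sequencing_summary.txt`` file can be summarized with standard Unix tools. The shell function below is a minimal sketch, not an ONT utility: ``summarize_guppy_summary`` is a hypothetical name, and the function assumes the header contains ``passes_filtering``, ``sequence_length_template`` and ``mean_qscore_template`` columns (with ``True``/``False`` in ``passes_filtering``). Column names may differ between Guppy versions, so check the header of your own file. Columns are located by header name rather than position, and the reported quality is the arithmetic mean of per-read Phred scores, which is only a rough indicator because Phred scores are logarithmic.

.. code-block:: bash

   # Print basic run statistics from a Guppy sequencing_summary.txt.
   # ASSUMPTION: the header includes passes_filtering, sequence_length_template
   # and mean_qscore_template; column names may vary between Guppy versions.
   summarize_guppy_summary() {
       awk -F'\t' '
       NR == 1 {
           # Map header names to column numbers instead of hard-coding positions.
           for (i = 1; i <= NF; i++) col[$i] = i
           if (!("passes_filtering" in col) || !("sequence_length_template" in col) ||
               !("mean_qscore_template" in col)) {
               print "summary header lacks an expected column" > "/dev/stderr"
               bad = 1
               exit 1
           }
           next
       }
       {
           total++
           bases += $(col["sequence_length_template"])
           qsum  += $(col["mean_qscore_template"])
           # ASSUMPTION: passing reads are flagged with the literal string "True".
           if ($(col["passes_filtering"]) == "True") passed++
       }
       END {
           if (bad) exit 1
           printf "reads\t%d\n", total
           printf "passed\t%d\n", passed
           printf "mean_length\t%.1f\n", (total ? bases / total : 0)
           # Plain average of per-read Phred values (rough indicator only).
           printf "mean_qscore\t%.2f\n", (total ? qsum / total : 0)
       }' "$1"
   }

For example, ``summarize_guppy_summary output/sequencing_summary.txt`` prints one tab-separated metric/value pair per line. For plots and a more careful quality breakdown, pycoQC reads the same file.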