NanoPlot

Overview

NanoPlot is a plotting and statistics tool designed for long-read sequencing data from Oxford Nanopore and PacBio platforms. It generates publication-ready figures showing read length distributions, quality score distributions, read-length vs quality scatter plots, and yield-over-time plots. NanoPlot accepts multiple input formats including FASTQ, BAM, and the sequencing_summary.txt file produced by the Oxford Nanopore basecaller.

Installation

mamba install -c bioconda nanoplot
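
After installing, it can help to confirm that the executable is visible in the current shell. The following is a minimal sketch in POSIX sh; the helper name tool_on_path is made up for this example and is not part of NanoPlot.

```shell
# Hypothetical helper: succeed if the named command is available on PATH.
tool_on_path() {
    command -v "$1" >/dev/null 2>&1
}

if tool_on_path NanoPlot; then
    # Print the installed NanoPlot version.
    NanoPlot --version
else
    echo "NanoPlot not found on PATH; activate the environment it was installed into." >&2
fi
```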

Basic Usage

NanoPlot supports several input formats. Choose the one that best matches your data.

# From BAM file (recommended for nanopore)
NanoPlot --bam reads.bam -o nanoplot_output/ --plots dot kde

# From FASTQ
NanoPlot --fastq reads.fastq.gz -o nanoplot_output/ --N50 --loglength

# From sequencing_summary.txt
NanoPlot --summary sequencing_summary.txt -o nanoplot_output/
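
When the input type varies between runs, the choice of flag can be scripted. The sketch below maps a file name to one of the three input flags shown above. The function name nanoplot_input_flag and the extension patterns (including .fq and .fq.gz) are assumptions made for illustration; adjust them to your own naming scheme.

```shell
# Hypothetical helper: map a file name to a NanoPlot input flag.
# Prints the flag, or returns 1 for a name it does not recognise.
nanoplot_input_flag() {
    case "$1" in
        *.bam)                            echo "--bam" ;;
        *.fastq|*.fq|*.fastq.gz|*.fq.gz)  echo "--fastq" ;;
        *sequencing_summary*.txt)         echo "--summary" ;;
        *)                                return 1 ;;
    esac
}

input="reads.fastq.gz"   # example file name
if flag=$(nanoplot_input_flag "$input"); then
    # Dry run: show the command that would be executed.
    echo "NanoPlot $flag $input -o nanoplot_output/"
else
    echo "Unrecognised input type: $input" >&2
fi
```

The example only prints the command, so it can be checked before anything is run.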

Key Parameters

Flag / option	Description
`--bam`	Input BAM file(s) for analysis.
`--fastq`	Input FASTQ file(s) for analysis.
`--summary`	Input sequencing_summary.txt from the Oxford Nanopore basecaller.
`-o` / `--outdir`	Output directory for plots and statistics.
`--plots`	Bivariate plot types to generate (e.g., `dot`, `kde`, `hex`).
`--N50`	Show the N50 mark in the read length histogram.
`--loglength`	Additionally produce read length plots with log-transformed lengths.
`--maxlength`	Filter out reads longer than this value.
`--minlength`	Filter out reads shorter than this value.
`--minqual`	Filter out reads with average quality below this value.
`-t` / `--threads`	Maximum number of threads NanoPlot may use.
`--title`	Title added to all plots (quote it if it contains spaces).
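
The filter options can be combined in one run. As a rough sketch, the function below builds a command line from optional length and quality thresholds and skips any that are left empty. The name nanoplot_filter_cmd and its argument order are invented for this example; only the flags come from the table above.

```shell
# Hypothetical helper: assemble a NanoPlot command from optional filters.
# Arguments: FASTQ file, output dir, min length, max length, min quality.
# Pass "" for a filter you do not want to apply.
nanoplot_filter_cmd() {
    fastq=$1
    outdir=$2
    minlen=${3:-}
    maxlen=${4:-}
    minqual=${5:-}

    # Rebuild the positional parameters as the NanoPlot argument list.
    set -- --fastq "$fastq" -o "$outdir"
    if [ -n "$minlen" ]; then set -- "$@" --minlength "$minlen"; fi
    if [ -n "$maxlen" ]; then set -- "$@" --maxlength "$maxlen"; fi
    if [ -n "$minqual" ]; then set -- "$@" --minqual "$minqual"; fi

    # Dry run: print the command. Swap this line for `NanoPlot "$@"` to execute it.
    echo "NanoPlot $*"
}

# Keep reads of at least 1000 bases with average quality of at least 10.
nanoplot_filter_cmd reads.fastq.gz nanoplot_output/ 1000 "" 10
```

Printing the command first makes it easy to review the thresholds before starting a long run.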

Expected Output

NanoPlot creates several files in the output directory:

NanoPlot-report.html – a self-contained HTML report with all figures and summary statistics.
NanoStats.txt – a text file with summary statistics including mean read length, median read length, N50, mean quality, total bases, and number of reads.
A set of PNG and, optionally, SVG plot files:
- LengthvsQualityScatterPlot – scatter plot of read length vs average quality, one file per plot type selected with --plots (for example with a _dot or _kde suffix).
- Non_weightedHistogramReadlength – histogram of read lengths.
- WeightedHistogramReadlength – histogram of read lengths weighted by number of bases.
- Non_weightedLogTransformed_HistogramReadlength – log-scaled length histogram (when --loglength is used).
- Yield_By_Length – cumulative yield as a function of read length.
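
In a pipeline it can be useful to check that a run produced the two fixed-name files listed above, and to pull a single number out of NanoStats.txt. The sketch below is one possible way to do that in POSIX sh. Both function names are hypothetical, and the "Label: value" line layout and the "Read length N50" label are assumptions about NanoStats.txt that should be checked against a file from your own NanoPlot version.

```shell
# Hypothetical helper: list which fixed-name outputs are missing from a directory.
# Prints one missing file name per line; prints nothing when both exist.
missing_nanoplot_outputs() {
    for f in NanoPlot-report.html NanoStats.txt; do
        [ -f "$1/$f" ] || echo "$f"
    done
}

# Hypothetical helper: print the value from the first line that starts with "<label>:".
# ASSUMPTION: NanoStats.txt lines look like "Label:   value".
nanostats_value() {
    awk -F: -v key="$2" '
        $1 == key {
            gsub(/^[ \t]+|[ \t]+$/, "", $2)   # trim padding around the value
            print $2
            exit
        }' "$1"
}

outdir="nanoplot_output"
missing=$(missing_nanoplot_outputs "$outdir")
if [ -z "$missing" ]; then
    nanostats_value "$outdir/NanoStats.txt" "Read length N50"
else
    echo "Missing from $outdir: $missing" >&2
fi
```

The value is printed as it appears in the file, including any thousands separators.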