pycoQC

Overview

pycoQC is a quality control tool designed for Oxford Nanopore sequencing data. It parses the sequencing_summary.txt file generated by the basecaller and produces an interactive HTML report with plots covering run throughput, read length distributions, quality scores, channel activity over time, and read length versus quality correlations. Because it works from the summary file alone, pycoQC gives a run-level overview without processing the reads themselves.

Installation

mamba install -c bioconda pycoqc

Basic Usage

Generate HTML and JSON reports from a basecaller sequencing summary file:

pycoQC --summary_file sequencing_summary.txt \
  --html_outfile pycoqc_report.html \
  --json_outfile pycoqc_report.json
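
When pycoQC is one step in a larger Python pipeline, the same invocation can be wrapped in a small helper. The sketch below is only an illustration: the function names build_pycoqc_command and run_pycoqc are hypothetical, and it assumes the pycoQC executable is on PATH. It uses only the three flags shown above.

```python
import shutil
import subprocess


def build_pycoqc_command(summary_file, html_outfile, json_outfile=None):
    """Return the argument list for the basic pycoQC invocation."""
    cmd = [
        "pycoQC",
        "--summary_file", str(summary_file),
        "--html_outfile", str(html_outfile),
    ]
    # The JSON report is optional, so only add the flag when requested.
    if json_outfile is not None:
        cmd += ["--json_outfile", str(json_outfile)]
    return cmd


def run_pycoqc(summary_file, html_outfile, json_outfile=None):
    """Run pycoQC; raise if it is not installed or exits with an error."""
    if shutil.which("pycoQC") is None:
        raise FileNotFoundError("pycoQC executable not found on PATH")
    cmd = build_pycoqc_command(summary_file, html_outfile, json_outfile)
    subprocess.run(cmd, check=True)


# Example call (not executed here):
# run_pycoqc("sequencing_summary.txt", "pycoqc_report.html", "pycoqc_report.json")
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.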

Key Parameters

Flag / option          Description
--summary_file         Path to the sequencing_summary.txt file from the basecaller (required).
--html_outfile         Path for the output interactive HTML report.
--json_outfile         Path for the output JSON report with all computed metrics.
--barcode_file         Path to the barcoding_summary.txt file to include per-barcode statistics.
--min_pass_qual        Minimum quality score to classify a read as “pass” (default 7).
--min_pass_len         Minimum read length to classify a read as “pass” (default 0).
--filter_calibration   Exclude calibration strand reads from the report.
--sample               Randomly subsample reads to speed up report generation for very large runs.
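
To make the effect of --min_pass_qual and --min_pass_len concrete, the following sketch counts how many reads in a summary file would meet both thresholds. It is not pycoQC's own code. It assumes a tab-separated summary with the columns sequence_length_template and mean_qscore_template (names that can differ between basecaller versions), and it assumes the thresholds are inclusive. The function name count_pass_reads and the example data are made up for illustration.

```python
import csv
import io


def count_pass_reads(summary_handle, min_pass_qual=7.0, min_pass_len=0,
                     qual_col="mean_qscore_template",
                     len_col="sequence_length_template"):
    """Return (pass_reads, total_reads) for a tab-separated summary.

    Assumptions: the column names above, and inclusive (>=) thresholds.
    """
    n_pass = 0
    n_total = 0
    reader = csv.DictReader(summary_handle, delimiter="\t")
    for row in reader:
        n_total += 1
        qual = float(row[qual_col])
        # Lengths may be written as floats in some files, so convert via float.
        length = int(float(row[len_col]))
        if qual >= min_pass_qual and length >= min_pass_len:
            n_pass += 1
    return n_pass, n_total


# Tiny made-up summary with three reads, for illustration only.
example = io.StringIO(
    "read_id\tsequence_length_template\tmean_qscore_template\n"
    "r1\t1200\t9.5\n"
    "r2\t300\t6.2\n"
    "r3\t5000\t7.0\n"
)
print(count_pass_reads(example))  # (2, 3)
```

Running a check like this before generating the report can help when choosing thresholds for a run with unusually short or low-quality reads.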

Expected Output

pycoQC generates the following output files:

  • pycoqc_report.html – an interactive HTML report containing:

    • Run summary – total reads, total bases, N50, median read length, and median quality.

    • Throughput over time – cumulative read and base yield.

    • Read length distribution – histogram and cumulative plot.

    • Quality score distribution – per-read mean quality histogram.

    • Read length vs quality – 2D density plot.

    • Channel activity – output per channel over time as a heatmap.

    • Barcode counts – per-barcode read counts (if a barcoding summary is provided).

  • pycoqc_report.json – a JSON file containing all computed statistics and plot data, suitable for programmatic parsing.
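
Since the JSON report is meant for programmatic parsing, a first step is often to see which keys it contains. The sketch below deliberately makes no assumption about the report's schema: it loads the file and prints every scalar value with its dotted key path, summarising lists by length. The helper names flatten and load_report are hypothetical.

```python
import json


def flatten(node, prefix=""):
    """Yield (dotted_path, value) pairs for every scalar in nested JSON."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{prefix}.{key}" if prefix else str(key)
            yield from flatten(value, child)
    elif isinstance(node, list):
        # Lists (for example plot data) are summarised by length only.
        yield prefix, f"<list of {len(node)} items>"
    else:
        yield prefix, node


def load_report(path):
    """Load a JSON report from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Example (not executed here):
# for key_path, value in flatten(load_report("pycoqc_report.json")):
#     print(key_path, value)
```

Once the relevant key paths are known, individual metrics can be read directly from the loaded dictionary.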

See Also

  • NanoPlot – alternative long-read QC tool supporting FASTQ and BAM inputs in addition to sequencing summaries

  • Chopper – quality and length filtering for nanopore reads

  • Basecalling – basecalling tools that produce the sequencing summary file