pycoQC

Overview

pycoQC is a quality control tool for Oxford Nanopore sequencing data. It parses the sequencing_summary.txt file generated by the basecaller and produces an interactive HTML report with plots of run throughput, read length distribution, quality scores, channel activity over time, and read length versus quality. Because it works from the summary file alone, it gives a run-level overview without processing the reads themselves.
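To make the idea of summary-only QC concrete, here is a minimal sketch that derives a few run-level numbers from a tab-separated summary table. It is not pycoQC's own code. The column names (`sequence_length_template`, `mean_qscore_template`) follow the layout commonly written by Guppy and are an assumption here, as are the helper names and the inline sample data.

```python
import csv
import io
import statistics

# Tiny inline stand-in for sequencing_summary.txt (tab-separated).
# Column names are assumed to follow a Guppy-style summary; adjust as needed.
SAMPLE_SUMMARY = (
    "read_id\tchannel\tsequence_length_template\tmean_qscore_template\n"
    "r1\t1\t1000\t9.5\n"
    "r2\t1\t2000\t11.0\n"
    "r3\t2\t3000\t6.2\n"
    "r4\t3\t4000\t12.3\n"
)


def read_lengths_and_quals(handle):
    """Return (lengths, qualities) parsed from a tab-separated summary."""
    lengths, quals = [], []
    for row in csv.DictReader(handle, delimiter="\t"):
        lengths.append(int(row["sequence_length_template"]))
        quals.append(float(row["mean_qscore_template"]))
    return lengths, quals


def n50(lengths):
    """Length L such that reads of length >= L hold at least half of all bases."""
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return 0


lengths, quals = read_lengths_and_quals(io.StringIO(SAMPLE_SUMMARY))
summary = {
    "total_reads": len(lengths),
    "total_bases": sum(lengths),
    "n50": n50(lengths),
    "median_read_length": statistics.median(lengths),
    "median_quality": statistics.median(quals),
}
print(summary)
```

For a real run, the `io.StringIO(...)` argument would be replaced by an open file handle on the summary file. Only per-read metadata is read; no FASTQ or FAST5 data is touched.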

Installation

mamba install -c bioconda pycoqc

Basic Usage

Generate HTML and JSON reports from a basecaller sequencing summary file:

pycoQC --summary_file sequencing_summary.txt \
  --html_outfile pycoqc_report.html \
  --json_outfile pycoqc_report.json
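When pycoQC is called from a pipeline script rather than typed by hand, the same command can be assembled as an argument list. The sketch below uses only the three flags shown above. The function names `build_pycoqc_command` and `run_pycoqc` are hypothetical helpers, not part of pycoQC.

```python
import shlex
import subprocess


def build_pycoqc_command(summary_file, html_outfile=None, json_outfile=None):
    """Build the pycoQC argument list, skipping outputs that were not requested."""
    cmd = ["pycoQC", "--summary_file", str(summary_file)]
    for flag, value in (("--html_outfile", html_outfile),
                        ("--json_outfile", json_outfile)):
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


def run_pycoqc(cmd):
    """Run the command; raises CalledProcessError on a non-zero exit status."""
    return subprocess.run(cmd, check=True)


cmd = build_pycoqc_command(
    "sequencing_summary.txt",
    html_outfile="pycoqc_report.html",
    json_outfile="pycoqc_report.json",
)
# Print the shell-quoted form for logging; call run_pycoqc(cmd) to execute it.
print(shlex.join(cmd))
```

Passing a list to `subprocess.run` avoids shell quoting problems with paths that contain spaces. `run_pycoqc` is defined but not called here, since it requires pycoQC on the PATH.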

Key Parameters

Flag / option	Description
`--summary_file`	Path to the `sequencing_summary.txt` file from the basecaller (required).
`--html_outfile`	Path for the output interactive HTML report.
`--json_outfile`	Path for the output JSON report with all computed metrics.
`--barcode_file`	Path to the `barcoding_summary.txt` file to include per-barcode statistics.
`--min_pass_qual`	Minimum quality score to classify a read as “pass” (default 7).
`--min_pass_len`	Minimum read length to classify a read as “pass” (default 0).
`--filter_calib`	Exclude reads flagged as calibration strand by the basecaller from the report.
`--sample`	Number of reads to randomly subsample for plotting, which speeds up report generation for very large runs.
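The effect of `--min_pass_qual` and `--min_pass_len` can be illustrated with a small sketch. It assumes both thresholds are inclusive (a read exactly at the minimum counts as "pass"), which matches the word "minimum" in the table but has not been checked against pycoQC's source. The `Read` class and `is_pass` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Read:
    read_id: str
    length: int
    mean_qscore: float


def is_pass(read, min_pass_qual=7, min_pass_len=0):
    """Assumed rule: a read passes if it meets both minimums (inclusive)."""
    return read.mean_qscore >= min_pass_qual and read.length >= min_pass_len


reads = [
    Read("r1", 1500, 9.8),
    Read("r2", 300, 12.1),
    Read("r3", 5000, 6.4),
]

# Defaults from the table above: quality >= 7, length >= 0.
default_pass = [r.read_id for r in reads if is_pass(r)]
# Stricter thresholds, as if --min_pass_qual 9 --min_pass_len 1000 were given.
strict_pass = [r.read_id for r in reads if is_pass(r, min_pass_qual=9, min_pass_len=1000)]

print(default_pass)
print(strict_pass)
```

With the defaults, only the low-quality read `r3` is excluded. Raising both thresholds also drops the short read `r2`, even though its quality is high.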

Expected Output

pycoQC generates the following output files:

pycoqc_report.html – an interactive HTML report containing:
- Run summary – total reads, total bases, N50, median read length, and median quality.
- Throughput over time – cumulative read and base yield.
- Read length distribution – histogram and cumulative plot.
- Quality score distribution – per-read mean quality histogram.
- Read length vs quality – 2D density plot.
- Channel activity – output per channel over time as a heatmap.
- Barcode counts – per-barcode read counts (if a barcoding summary is provided).
pycoqc_report.json – a JSON file containing all computed statistics and plot data, suitable for programmatic parsing.
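Since the JSON report is intended for programmatic parsing, a generic way to explore it is to flatten the nested structure into dotted paths. The sketch below makes no claim about pycoQC's actual JSON schema: the keys in `EXAMPLE_REPORT` are invented placeholders, and `flatten` is a hypothetical helper.

```python
import json

# Hypothetical stand-in for pycoqc_report.json; real key names may differ.
EXAMPLE_REPORT = """
{
  "run_summary": {"reads": 4, "bases": 10000, "n50": 3000},
  "quality": {"median": 10.25}
}
"""


def flatten(node, prefix=""):
    """Yield (dotted_path, value) pairs for every scalar leaf in nested JSON."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from flatten(value, f"{prefix}{key}.")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from flatten(value, f"{prefix}{index}.")
    else:
        # Drop the trailing dot that was added by the parent level.
        yield prefix[:-1], node


report = json.loads(EXAMPLE_REPORT)
metrics = dict(flatten(report))
for path, value in metrics.items():
    print(f"{path}\t{value}")
```

For a real report, `json.loads(EXAMPLE_REPORT)` would be replaced by `json.load` on an open handle to pycoqc_report.json. Printing the flattened paths once is a quick way to discover which keys the installed pycoQC version actually writes before hard-coding any of them.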