pycoQC
======

Overview
--------

pycoQC is a quality control tool specifically designed for Oxford Nanopore sequencing data. It parses the ``sequencing_summary.txt`` file generated by the basecaller and produces an interactive HTML report with plots covering run throughput, read length distributions, quality scores, channel activity over time, and read-length vs quality correlations. pycoQC provides a comprehensive run-level overview without needing to process the reads themselves.

Installation
------------

.. code-block:: bash

   mamba install -c bioconda pycoqc

Basic Usage
-----------

Generate an HTML and JSON report from a basecaller sequencing summary file.

.. code-block:: bash

   pycoQC --summary_file sequencing_summary.txt \
       --html_outfile pycoqc_report.html \
       --json_outfile pycoqc_report.json

Key Parameters
--------------

.. list-table::
   :header-rows: 1
   :widths: 30 70

   * - Flag / option
     - Description
   * - ``--summary_file``
     - Path to the ``sequencing_summary.txt`` file from the basecaller (required).
   * - ``--html_outfile``
     - Path for the output interactive HTML report.
   * - ``--json_outfile``
     - Path for the output JSON report with all computed metrics.
   * - ``--barcode_file``
     - Path to the ``barcoding_summary.txt`` file to include per-barcode statistics.
   * - ``--min_pass_qual``
     - Minimum quality score to classify a read as "pass" (default 7).
   * - ``--min_pass_len``
     - Minimum read length to classify a read as "pass" (default 0).
   * - ``--filter_calibration``
     - Exclude calibration strand reads from the report.
   * - ``--sample``
     - Randomly subsample reads to speed up report generation for very large runs.

Expected Output
---------------

pycoQC generates the following output files:

* ``pycoqc_report.html`` -- an interactive HTML report containing:

  - **Run summary** -- total reads, total bases, N50, median read length, and median quality.
  - **Throughput over time** -- cumulative read and base yield.
  - **Read length distribution** -- histogram and cumulative plot.
  - **Quality score distribution** -- per-read mean quality histogram.
  - **Read length vs quality** -- 2D density plot.
  - **Channel activity** -- output per channel over time as a heatmap.
  - **Barcode counts** -- per-barcode read counts (if a barcoding summary is provided).

* ``pycoqc_report.json`` -- a JSON file containing all computed statistics and plot data, suitable for programmatic parsing.

See Also
--------

* :doc:`nanoplot` -- alternative long-read QC tool supporting FASTQ and BAM inputs in addition to sequencing summaries
* :doc:`chopper` -- quality and length filtering for nanopore reads
* :doc:`/tools/basecalling/index` -- basecalling tools that produce the sequencing summary file
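
Example: Combining Optional Flags
---------------------------------

The options listed under Key Parameters can be combined in a single call. The following is a minimal sketch, not a recommended configuration: the threshold values (``10`` and ``500``) are illustrative assumptions, and the ``SUMMARY`` and ``BARCODES`` environment variables are hypothetical conveniences introduced here for overriding the input paths. The script adds ``--barcode_file`` only when a barcoding summary is present, and prints the command instead of running it when ``pycoQC`` or the summary file is not available.

.. code-block:: bash

   #!/usr/bin/env bash
   set -euo pipefail

   # Hypothetical input paths; override via the environment for your own run.
   summary="${SUMMARY:-sequencing_summary.txt}"
   barcodes="${BARCODES:-barcoding_summary.txt}"

   # Build the command as an array so every argument stays intact.
   # Thresholds are illustrative; the documented defaults are 7 and 0.
   cmd=(
     pycoQC
     --summary_file "$summary"
     --html_outfile pycoqc_report.html
     --json_outfile pycoqc_report.json
     --min_pass_qual 10
     --min_pass_len 500
     --filter_calibration
   )

   # Per-barcode statistics are only requested when the file exists.
   if [ -f "$barcodes" ]; then
     cmd+=(--barcode_file "$barcodes")
   fi

   # Run for real only if pycoQC is installed and the input is present;
   # otherwise show the command that would have been executed.
   if command -v pycoQC >/dev/null 2>&1 && [ -f "$summary" ]; then
     "${cmd[@]}"
   else
     printf 'Would run:'
     printf ' %q' "${cmd[@]}"
     printf '\n'
   fi

Raising ``--min_pass_qual`` or ``--min_pass_len`` changes only which reads the report classifies as "pass".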