Nanopore Variant Calling

Overview

Oxford Nanopore long reads can be used for germline variant calling once they are aligned to a reference genome and processed with a deep-learning variant caller. This pipeline maps reads with minimap2, calls SNPs and indels with Clair3 (a neural-network caller that combines a fast pileup model with a full-alignment model), and filters variants with bcftools. Clair3 uses platform-specific models trained on each platform's error profile, so the model selected in the configuration should match the Nanopore chemistry and basecaller that produced the reads.

Pipeline Steps

  1. minimap2 align – Map Nanopore reads to the reference genome with the minimap2 map-ont preset, writing MD tags (--MD) and a read group into each alignment. Output is piped through samtools sort to produce a coordinate-sorted BAM.

  2. samtools index – Index the sorted BAM file for random access by downstream tools. In the Snakefile this command runs inside the minimap2_align rule.

  3. Clair3 – Call germline variants (SNPs and indels) using the Clair3 deep-learning model appropriate for the Nanopore chemistry. The --include_all_ctgs flag makes Clair3 process every contig in the reference rather than only its default set of chromosomes 1-22, X, and Y.

  4. bcftools filter – Apply quality and depth filters, keeping only calls with QUAL >= 20 and DP >= 10 and discarding lower-confidence calls. A SNP-only subset is then extracted with bcftools view.

  5. bcftools stats – Generate summary statistics for the filtered variant set including counts of SNPs, indels, and ts/tv ratio.
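
To make the step 4 thresholds concrete, here is a minimal pure-Python sketch of the keep/discard decision for one single-sample VCF record. It only illustrates the logic and is not how bcftools is implemented. The helper name passes_filter and the example record are hypothetical, and the sketch assumes depth may appear either as DP in the INFO column or as the sample's DP value in the FORMAT column.

```python
def passes_filter(vcf_line, min_qual=20.0, min_dp=10):
    """Return True if a single-sample VCF data line meets both thresholds."""
    fields = vcf_line.rstrip("\n").split("\t")
    qual = fields[5]
    if qual == ".":
        # A missing QUAL cannot satisfy QUAL >= min_qual.
        return False

    depth = None
    # Look for DP in the INFO column first (entries separated by ';').
    for entry in fields[7].split(";"):
        if entry.startswith("DP="):
            depth = int(entry[3:])

    # Otherwise fall back to the first sample's DP value in FORMAT.
    if depth is None and len(fields) > 9:
        keys = fields[8].split(":")
        values = fields[9].split(":")
        if "DP" in keys and keys.index("DP") < len(values):
            raw = values[keys.index("DP")]
            depth = int(raw) if raw != "." else None

    return depth is not None and float(qual) >= min_qual and depth >= min_dp


# Hypothetical record: QUAL 25.3, per-sample depth 14.
record = "chr1\t1000\t.\tA\tG\t25.3\tPASS\tF\tGT:GQ:DP:AF\t0/1:25:14:0.5"
print(passes_filter(record))  # True
```

Records with a missing QUAL or no depth value are treated as failing in this sketch, which is a conservative choice rather than a statement about bcftools behaviour.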

Implementation

# Snakefile -- Nanopore Variant Calling Pipeline
configfile: "config.yaml"

SAMPLES   = config["samples"]
REFERENCE = config["reference"]
MODEL     = config["clair3_model"]

rule all:
    input:
        expand("results/variants/{sample}_snps.vcf.gz", sample=SAMPLES),
        expand("results/variants/{sample}_stats.txt", sample=SAMPLES)

rule minimap2_align:
    input:
        reads="data/{sample}.fastq.gz",
        ref=REFERENCE
    output:
        bam="results/aligned/{sample}.sorted.bam",
        bai="results/aligned/{sample}.sorted.bam.bai"
    threads: 8
    shell:
        """
        minimap2 -ax map-ont --MD -t {threads} \
          -R '@RG\\tID:{wildcards.sample}\\tSM:{wildcards.sample}\\tPL:ONT' \
          {input.ref} {input.reads} \
          | samtools sort -@ {threads} -o {output.bam} -
        samtools index {output.bam}
        """

rule clair3:
    input:
        bam="results/aligned/{sample}.sorted.bam",
        bai="results/aligned/{sample}.sorted.bam.bai",
        ref=REFERENCE
    output:
        vcf="results/clair3/{sample}/merge_output.vcf.gz"
    params:
        model=MODEL,
        outdir="results/clair3/{sample}"
    threads: 8
    shell:
        """
        run_clair3.sh \
          --bam_fn={input.bam} \
          --ref_fn={input.ref} \
          --output={params.outdir} \
          --threads={threads} \
          --platform="ont" \
          --model_path={params.model} \
          --sample_name={wildcards.sample} \
          --include_all_ctgs
        """

rule bcftools_filter:
    input:
        "results/clair3/{sample}/merge_output.vcf.gz"
    output:
        filtered="results/variants/{sample}_filtered.vcf.gz",
        snps="results/variants/{sample}_snps.vcf.gz",
        stats="results/variants/{sample}_stats.txt"
    shell:
        """
        # Clair3 reports read depth as a per-sample FORMAT field, so filter on FMT/DP
        bcftools filter -i 'QUAL>=20 && FMT/DP>=10' \
          {input} -Oz -o {output.filtered}
        bcftools view -v snps {output.filtered} -Oz -o {output.snps}
        bcftools stats {output.filtered} > {output.stats}
        """
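
The Snakefile reads three keys from config.yaml: samples, reference, and clair3_model. The following Python sketch shows the shape those values are assumed to take and how rule all turns the sample list into final targets. The helper names check_config and final_targets, the sample names, and both paths are hypothetical placeholders, not values taken from this pipeline.

```python
REQUIRED_KEYS = ("samples", "reference", "clair3_model")


def check_config(config):
    """Raise ValueError if a key the Snakefile reads is missing or empty."""
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise ValueError("config.yaml is missing: " + ", ".join(missing))


def final_targets(samples):
    """Mirror the two expand() calls in rule all."""
    targets = [f"results/variants/{s}_snps.vcf.gz" for s in samples]
    targets += [f"results/variants/{s}_stats.txt" for s in samples]
    return targets


# Hypothetical values standing in for the parsed config.yaml.
example_config = {
    "samples": ["sampleA", "sampleB"],
    "reference": "ref/genome.fa",
    "clair3_model": "models/ont_model_dir",
}

check_config(example_config)
for path in final_targets(example_config["samples"]):
    print(path)
```

Each sample name must match a data/<sample>.fastq.gz file, because the minimap2_align rule derives its input path from the sample wildcard.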

Expected Output

After a successful run the output directory will contain:

  • results/aligned/<sample>.sorted.bam – Coordinate-sorted BAM file with read group tags and MD tags for variant calling.

  • results/clair3/<sample>/merge_output.vcf.gz – Raw Clair3 variant calls including SNPs and indels with quality scores and genotype likelihoods.

  • results/variants/<sample>_filtered.vcf.gz – Quality-filtered variants (QUAL >= 20, DP >= 10).

  • results/variants/<sample>_snps.vcf.gz – SNP-only subset of filtered variants.

  • results/variants/<sample>_stats.txt – Summary statistics including total variants, SNP count, indel count, and transition/transversion ratio.
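
As a reading aid for the transition/transversion ratio in the stats file, the sketch below computes ts/tv from a list of (REF, ALT) SNP pairs. It illustrates the definition only and does not reproduce bcftools stats; the function names and the example SNP list are hypothetical, and the input is assumed to contain only true single-base substitutions.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """A transition stays within purines (A<->G) or pyrimidines (C<->T)."""
    pair = {ref.upper(), alt.upper()}
    return pair <= PURINES or pair <= PYRIMIDINES


def ts_tv_ratio(snps):
    """Return transitions divided by transversions, or NaN with no transversions."""
    ts = sum(1 for ref, alt in snps if is_transition(ref, alt))
    tv = len(snps) - ts
    return ts / tv if tv else float("nan")


# Hypothetical SNPs: 4 transitions and 2 transversions.
snps = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("T", "C"), ("G", "T")]
print(ts_tv_ratio(snps))  # 2.0
```

For human whole-genome germline call sets a ts/tv ratio of roughly 2 is commonly reported, so a markedly lower value in the stats file can be a hint that the filtered set still contains many false-positive SNPs.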

See Also