Clair3
Overview
Clair3 is a deep-learning variant caller designed primarily for long-read sequencing data from Oxford Nanopore Technologies (ONT) and PacBio platforms. It calls germline SNPs and indels in two stages. First, a fast pileup-based neural network evaluates candidate sites. Then a full-alignment model re-evaluates the candidates that the pileup model called with low confidence. Clair3 ships with pre-trained models for several sequencing platforms and chemistries, so it can be deployed on Nanopore simplex and duplex data as well as on PacBio HiFi reads. It supports multi-threaded execution and can process whole-genome datasets efficiently.
Installation
```bash
mamba create -n clair3 -c conda-forge -c bioconda clair3
```
Activate the environment before running:
```bash
conda activate clair3
```
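After activating the environment, it helps to check which pre-trained models were installed. The sketch below makes two assumptions. First, it assumes the models live under `${CONDA_PREFIX}/bin/models`, which is the location used in the Basic Usage example. Second, it assumes a usable model folder contains both `pileup.index` and `full_alignment.index`, as described in Clair3's help text. The function name `list_clair3_models` is hypothetical.

```bash
# Hypothetical helper: print the names of Clair3 model folders under a root
# directory (default: ${CONDA_PREFIX}/bin/models). A folder is listed only if
# it contains both pileup.index and full_alignment.index (assumed layout).
list_clair3_models() {
    root="${1:-${CONDA_PREFIX}/bin/models}"
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue   # the glob matched nothing
        if [ -f "${dir}pileup.index" ] && [ -f "${dir}full_alignment.index" ]; then
            basename "$dir"
        fi
    done
}
```

Run `list_clair3_models` with no argument to scan the default location, or pass another directory if you keep models elsewhere. The folder name you pick is what goes at the end of `--model_path`.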
Basic Usage
Call variants from ONT reads aligned to a reference genome.
```bash
run_clair3.sh \
  --bam_fn=aligned.sorted.bam \
  --ref_fn=reference.fna \
  --output=clair3_output/ \
  --threads=8 \
  --platform="ont" \
  --model_path="${CONDA_PREFIX}/bin/models/ont" \
  --sample_name=sample1 \
  --include_all_ctgs
```
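Clair3 expects both the BAM and the reference FASTA to be indexed. A minimal pre-flight check could run before `run_clair3.sh` to catch a missing index early. The function name `check_clair3_inputs` is hypothetical. It only looks for the conventional index file names: `.bai` or `.csi` next to the BAM, and `.fai` next to the FASTA. It does not validate the contents of those files. If an index is missing, `samtools index` and `samtools faidx` can create it.

```bash
# Hypothetical pre-flight check: returns non-zero and prints a hint on stderr
# if the BAM or FASTA index that Clair3 needs is missing. Only file names are
# checked, not whether the indexes are up to date.
check_clair3_inputs() {
    bam="$1"
    ref="$2"
    status=0
    if [ ! -f "$bam" ]; then
        echo "missing BAM: $bam" >&2; status=1
    elif [ ! -f "${bam}.bai" ] && [ ! -f "${bam%.bam}.bai" ] && [ ! -f "${bam}.csi" ]; then
        echo "no index for $bam (try: samtools index $bam)" >&2; status=1
    fi
    if [ ! -f "$ref" ]; then
        echo "missing reference: $ref" >&2; status=1
    elif [ ! -f "${ref}.fai" ]; then
        echo "no index for $ref (try: samtools faidx $ref)" >&2; status=1
    fi
    return "$status"
}
```

For example, you can chain the check and the caller with `check_clair3_inputs aligned.sorted.bam reference.fna && run_clair3.sh` followed by the options shown above. With that chaining, the caller starts only when both indexes are present.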
Key Parameters
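| Flag / option | Description |
|---|---|
| `--bam_fn` | Path to the sorted and indexed BAM file. |
| `--ref_fn` | Path to the reference FASTA file (must be indexed, e.g. with `samtools faidx`). |
| `--output` | Output directory for variant calls and intermediate files. |
| `--threads` | Maximum number of CPU threads to use. |
| `--platform` | Sequencing platform: `ont`, `hifi`, or `ilmn`. |
| `--model_path` | Path to the pre-trained model directory matching the sequencing platform and chemistry. |
| `--sample_name` | Sample name to embed in the VCF header. |
| `--include_all_ctgs` | Call variants on all contigs, not just those named `chr{1..22,X,Y}` or `{1..22,X,Y}`. |
| `--bed_fn` | Restrict variant calling to regions defined in a BED file. |
| `--qual` | Quality threshold for the FILTER column: higher-scoring variants are marked `PASS`, lower-scoring ones `LowQual` (default 2). |

Clair3's help text asks for optional parameters to be written as `--flag=value` rather than `--flag value`, as in the example above.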
Expected Output
Clair3 writes its results to the specified output directory:
- `merge_output.vcf.gz` – the final merged VCF containing all called SNPs and indels, with quality scores and genotype information.
- `merge_output.vcf.gz.tbi` – tabix index for the merged VCF.
- `pileup.vcf.gz` – intermediate calls from the pileup model, before full-alignment refinement.
- `full_alignment.vcf.gz` – calls from the full-alignment model for the candidates it re-evaluated.
- `log/` – directory containing run logs and timing information.
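For a quick overview of `merge_output.vcf.gz`, you can count the `PASS` records by variant type. The sketch below uses only `gzip` and `awk`. The function name `summarize_pass` is hypothetical. It treats a record as a SNP when REF and every ALT allele are single bases, and it counts everything else (including multi-allelic records that mix SNP and indel alleles) as "indels/other". For fuller statistics, `bcftools view -f PASS` and `bcftools stats` are the more complete tools.

```bash
# Hypothetical summary: count PASS records in an uncompressed VCF read from
# stdin, split into SNPs (REF and every ALT allele are one base) and
# indels/other (everything else).
summarize_pass() {
    awk -F'\t' '
        /^#/ { next }              # skip header lines
        $7 != "PASS" { next }      # keep only FILTER == PASS
        {
            n = split($5, alts, ",")
            snp = (length($4) == 1)
            for (i = 1; i <= n; i++) if (length(alts[i]) != 1) snp = 0
            if (snp) snps++; else indels++
        }
        END { printf "PASS SNPs: %d\nPASS indels/other: %d\n", snps, indels }
    '
}
```

Typical usage is `gzip -dc clair3_output/merge_output.vcf.gz | summarize_pass`.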
See Also
GATK – HaplotypeCaller for short-read germline variant calling
DeepVariant – another deep-learning variant caller supporting multiple sequencing platforms
BCFtools – filter and process the VCF output from Clair3