BCFtools
Overview
BCFtools is a high-performance toolkit for manipulating variant calls stored in VCF and BCF formats. It provides sub-commands for filtering, merging, splitting, subsetting, normalising, and computing statistics on variant files. Most sub-commands stream records and rely on indexes for region access, which lets BCFtools process large whole-genome VCFs with low memory usage. It is the variant-file counterpart to SAMtools and shares the same HTSlib backend for reading and writing compressed, indexed genomic data files.
Installation
mamba install -c bioconda bcftools
Basic Usage
Quality filtering
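Keep only records with a variant quality of at least 20 and a depth of at least 10, and write the result as a bgzip-compressed VCF. The expression reads depth from INFO/DP; if the input records depth per sample only, use FMT/DP instead.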
bcftools filter -i 'QUAL>=20 && INFO/DP>=10' \
clair3_output/merge_output.vcf.gz \
-Oz -o filtered.vcf.gz
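The command above removes failing records outright. Where it helps to keep them for later inspection, a soft filter can tag them in the FILTER column instead. The following is a minimal sketch under stated assumptions: the helper `build_filter_expr`, the filter label `LowQualDP`, and the output file names are illustrative and not part of BCFtools, and the input path is the one used above.

```shell
# Build an inclusion expression from two thresholds (hypothetical helper).
build_filter_expr() {
    local min_qual="$1" min_dp="$2"
    printf 'QUAL>=%s && INFO/DP>=%s' "$min_qual" "$min_dp"
}

IN=clair3_output/merge_output.vcf.gz   # input path from the example above
EXPR="$(build_filter_expr 20 10)"

# Only run when bcftools is installed and the input file exists.
if command -v bcftools >/dev/null 2>&1 && [ -f "$IN" ]; then
    # Records that do not match EXPR are labelled LowQualDP rather than dropped.
    bcftools filter -s LowQualDP -i "$EXPR" "$IN" -Oz -o soft_filtered.vcf.gz
    # Keep only records whose FILTER column is PASS.
    bcftools view -f PASS soft_filtered.vcf.gz -Oz -o pass_only.vcf.gz
fi
```

Keeping the thresholds in one helper makes it easier to apply the same cut-offs across several samples.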
Separate SNPs and indels
bcftools view -v snps filtered.vcf.gz -Oz -o snps.vcf.gz
bcftools view -v indels filtered.vcf.gz -Oz -o indels.vcf.gz
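After splitting, it is often useful to index each file and check how many records it holds. The sketch below assumes the output names used above; `count_records` is a hypothetical helper, and it skips any file that is missing rather than failing.

```shell
# Index a VCF and print "<file><TAB><record count>" (hypothetical helper).
count_records() {
    local f="$1"
    if command -v bcftools >/dev/null 2>&1 && [ -f "$f" ]; then
        bcftools index -f -t "$f"    # write a tabix (.tbi) index, replacing any old one
        # -n reports the number of records using the index just written
        printf '%s\t%s\n' "$f" "$(bcftools index -n "$f")"
    else
        echo "skipping $f (bcftools or file not available)" >&2
    fi
}

for f in snps.vcf.gz indels.vcf.gz; do
    count_records "$f"
done
```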
Generate statistics
bcftools stats filtered.vcf.gz > variant_stats.txt
bcftools stats snps.vcf.gz > snp_stats.txt
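The stats report is tab-separated text in which each line starts with a section tag, so individual values can be pulled out with standard tools. The sketch below reads the summary numbers (`SN` lines) and the overall Ts/Tv ratio (fifth column of the `TSTV` line); the function names are illustrative only.

```shell
# Print "key<TAB>value" pairs from the SN (summary numbers) section.
summary_numbers() {
    grep '^SN' "$1" | cut -f3-
}

# Print the overall Ts/Tv ratio (column 5 of the first TSTV line).
tstv_ratio() {
    awk -F'\t' '$1 == "TSTV" { print $5; exit }' "$1"
}

# Only read the report if it has been generated.
if [ -f variant_stats.txt ]; then
    summary_numbers variant_stats.txt
    tstv_ratio variant_stats.txt
fi
```

Header lines in the report begin with `#`, so anchoring the match at the start of the line skips them.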
Key Parameters
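| Flag / option | Description |
|---|---|
| `-i` / `--include` | Include only records matching the expression (e.g. `QUAL>=20`). |
| `-e` / `--exclude` | Exclude records matching the expression. |
| `-v` / `--types` | Select variant types: `snps`, `indels`, `mnps`, `other`. |
| `-s` / `--samples` | Subset the VCF to specific samples. |
| `-r` / `--regions` | Restrict output to a genomic region (e.g. `chr1:1-1000000`); requires an indexed input file. |
| `-Oz` | Write output as bgzip-compressed VCF. |
| `-Ob` | Write output as BCF (binary VCF). |
| `-o` / `--output` | Output file path. |
| `bcftools stats` | Compute per-sample and per-site variant statistics. |
| `bcftools merge` | Merge multiple VCF/BCF files from non-overlapping sample sets. |
| `bcftools norm` | Left-align and normalise indels; split multi-allelic sites. |
| `bcftools query` | Extract specific fields in a custom output format. |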
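As a sketch of how several of these options combine, the block below normalises the filtered file, subsets it to one sample and region, and exports selected fields as tab-separated text. The reference FASTA `reference.fa`, the sample name `sample1`, the region, the output file names, and the wrapper `run_examples` are all assumptions made for illustration.

```shell
REF=reference.fa           # hypothetical reference FASTA
SAMPLE=sample1             # hypothetical sample name
REGION=chr1:1-1000000      # hypothetical region

run_examples() {
    # Left-align indels and split multi-allelic sites into separate records.
    bcftools norm -f "$REF" -m -any filtered.vcf.gz -Oz -o normalised.vcf.gz
    # Region queries with -r need an index.
    bcftools index -f -t normalised.vcf.gz
    # Subset to one sample and one region, writing BCF.
    bcftools view -s "$SAMPLE" -r "$REGION" normalised.vcf.gz -Ob -o subset.bcf
    # Extract selected fields as tab-separated text.
    bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\t%QUAL\n' subset.bcf > subset.tsv
}

# Only run when bcftools and both input files are available.
if command -v bcftools >/dev/null 2>&1 && [ -f filtered.vcf.gz ] && [ -f "$REF" ]; then
    run_examples
fi
```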
Expected Output
filtered.vcf.gz – a compressed VCF containing only variants that pass the specified filter criteria.
snps.vcf.gz / indels.vcf.gz – VCF files separated by variant type.
variant_stats.txt – a text report including counts of SNPs, indels, and MNPs, the transition/transversion ratio (Ts/Tv), per-sample statistics, indel length distributions, and quality score distributions. This file can be visualised with plot-vcfstats:
plot-vcfstats variant_stats.txt -p stats_plots/
See Also
GATK – variant calling and VQSR filtering upstream of BCFtools processing
Clair3 – long-read variant caller whose output can be filtered with BCFtools
VEP (Variant Effect Predictor) – annotate filtered variants with functional consequences
SAMtools – companion toolkit for BAM/CRAM file manipulation