BCFtools

Overview

BCFtools is a command-line toolkit for manipulating variant calls stored in VCF and its binary counterpart, BCF. It provides sub-commands for filtering, merging, splitting, subsetting, normalising, and computing statistics on variant files. Most sub-commands stream records from input to output rather than loading the whole file, and region queries use an index to jump directly to the requested coordinates, so large whole-genome VCFs can be processed with low memory usage. It is the variant-file counterpart to SAMtools and shares the same HTSlib backend for reading and writing compressed, indexed genomic data files.

Installation

mamba install -c conda-forge -c bioconda bcftools

Basic Usage

Quality filtering

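Apply minimum quality and depth filters to a VCF file. The expression below keeps records with a QUAL of at least 20 and an INFO/DP (total depth) of at least 10. If the variant caller records depth only in the per-sample FORMAT field (check the header with bcftools view -h), use FMT/DP in place of INFO/DP.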

bcftools filter -i 'QUAL>=20 && INFO/DP>=10' \
  clair3_output/merge_output.vcf.gz \
  -Oz -o filtered.vcf.gz
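
After filtering, it is often useful to index the result and check how many records survived. The following is a minimal sketch, not part of the original workflow: filter_and_index and count_records are hypothetical helper names, and the filter expression is copied from the command above.

```shell
# Sketch only: filter_and_index and count_records are hypothetical helper names.

# Print the number of variant records (header lines excluded) in a VCF/BCF.
count_records() {
    bcftools view -H "$1" | wc -l
}

# Apply the QUAL/DP filter shown above, then build a tabix (.tbi) index
# so the filtered file can later be queried by region.
filter_and_index() {
    in_vcf=$1
    out_vcf=$2
    bcftools filter -i 'QUAL>=20 && INFO/DP>=10' "$in_vcf" -Oz -o "$out_vcf" &&
        bcftools index -t "$out_vcf"
}

# Example (paths as in the command above):
#   filter_and_index clair3_output/merge_output.vcf.gz filtered.vcf.gz
#   echo "before: $(count_records clair3_output/merge_output.vcf.gz)"
#   echo "after:  $(count_records filtered.vcf.gz)"
```

Comparing the two counts gives a quick sanity check that the thresholds are neither removing almost everything nor removing nothing.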

Separate SNPs and indels

bcftools view -v snps filtered.vcf.gz -Oz -o snps.vcf.gz
bcftools view -v indels filtered.vcf.gz -Oz -o indels.vcf.gz
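
The two commands above can be wrapped in a loop that also indexes each output, and the result can be checked by tallying the variant classes in each file. This is a sketch under assumptions: split_by_type and type_counts are hypothetical helper names, and the output file names simply follow the commands above.

```shell
# Sketch only: split_by_type and type_counts are hypothetical helper names.

# Write snps.vcf.gz and indels.vcf.gz from one input and index each of them.
split_by_type() {
    in_vcf=$1
    for vtype in snps indels; do
        bcftools view -v "$vtype" "$in_vcf" -Oz -o "$vtype.vcf.gz" &&
            bcftools index -t "$vtype.vcf.gz" || return 1
    done
}

# Tally the variant classes present in a file using the %TYPE query field,
# e.g. to see what snps.vcf.gz actually contains.
type_counts() {
    bcftools query -f '%TYPE\n' "$1" | sort | uniq -c
}

# Example:
#   split_by_type filtered.vcf.gz
#   type_counts snps.vcf.gz
```

Note that a multi-allelic site carrying both a SNP allele and an indel allele is selected by both -v snps and -v indels, so it can appear in both files unless multi-allelic sites are split beforehand with bcftools norm.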

Generate statistics

bcftools stats filtered.vcf.gz > variant_stats.txt
bcftools stats snps.vcf.gz > snp_stats.txt
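
The stats report is a tab-separated text file in which each row starts with a section code, so headline numbers can be pulled out with standard tools. The sketch below assumes the usual layout of the SN (summary numbers) and TSTV (transitions/transversions) rows; summary_numbers and tstv_ratio are hypothetical helper names.

```shell
# Sketch only: summary_numbers and tstv_ratio are hypothetical helper names.

# Print "label<TAB>value" for every summary-number (SN) row of a stats report.
summary_numbers() {
    awk -F'\t' '$1 == "SN" { print $3 "\t" $4 }' "$1"
}

# Print the overall Ts/Tv ratio (fifth column of the TSTV row).
tstv_ratio() {
    awk -F'\t' '$1 == "TSTV" { print $5 }' "$1"
}

# Example:
#   summary_numbers variant_stats.txt
#   tstv_ratio snp_stats.txt
```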

Key Parameters

Flag / option      Description
filter -i EXPR     Include only records matching the expression (e.g. 'QUAL>=20 && INFO/DP>=10').
filter -e EXPR     Exclude records matching the expression.
view -v TYPE       Select variant types: snps, indels, mnps, or other (comma-separated list).
view -s SAMPLE     Subset the VCF to specific samples (comma-separated list of sample names).
view -r REGION     Restrict output to a genomic region (e.g. chr1:1000-2000); the input must be indexed.
-Oz                Write output as bgzip-compressed VCF.
-Ob                Write output as compressed BCF (binary VCF).
-o FILE            Output file path (standard output if omitted).
stats              Compute per-site variant statistics; add -s - to include per-sample statistics.
merge              Merge multiple VCF/BCF files from non-overlapping sample sets; inputs must be bgzip-compressed and indexed.
norm               Left-align and normalise indels (requires -f REF.fa); split multi-allelic sites (with -m -any).
query -f FORMAT    Extract specific fields in a custom output format (e.g. '%CHROM\t%POS\t%REF\t%ALT\n').
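
Several of the options listed above are typically combined: normalise first, then subset by sample or region, then export a table. The following sketch illustrates one such combination; the helper names are hypothetical, and the reference FASTA, sample name, and region are placeholders to be replaced with real values.

```shell
# Sketch only: normalise, subset_sample and region_table are hypothetical helpers.

# Split multi-allelic sites into one record per ALT allele and left-align
# indels against the reference FASTA, then index the result.
#   $1 = reference FASTA, $2 = input VCF, $3 = output VCF
normalise() {
    bcftools norm -m -any -f "$1" "$2" -Oz -o "$3" && bcftools index -t "$3"
}

# Keep only the named sample(s) (comma-separated) from a multi-sample VCF.
#   $1 = input VCF, $2 = sample list, $3 = output VCF
subset_sample() {
    bcftools view -s "$2" "$1" -Oz -o "$3"
}

# Print position, alleles and quality for one region as a tab-separated table.
# The input must be indexed for the region lookup to work.
#   $1 = input VCF, $2 = region
region_table() {
    bcftools query -r "$2" -f '%CHROM\t%POS\t%REF\t%ALT\t%QUAL\n' "$1"
}

# Example (ref.fa, SAMPLE1 and the region are placeholders):
#   normalise ref.fa filtered.vcf.gz norm.vcf.gz
#   subset_sample norm.vcf.gz SAMPLE1 sample1.vcf.gz
#   region_table norm.vcf.gz chr1:1000-2000 > chr1_region.tsv
```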

Expected Output

  • filtered.vcf.gz – a compressed VCF containing only variants that pass the specified filter criteria.

  • snps.vcf.gz / indels.vcf.gz – VCF files separated by variant type.

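  • variant_stats.txt – a plain-text report including counts of SNPs, indels, and MNPs, the transition/transversion ratio (Ts/Tv), indel length distributions, and quality score distributions. Per-sample statistics are included only when bcftools stats is run with -s (e.g. -s - for all samples). This file can be visualised with plot-vcfstats, which needs Python with matplotlib for the plots and a LaTeX installation for the PDF summary: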

plot-vcfstats variant_stats.txt -p stats_plots/

See Also

  • GATK – variant calling and VQSR filtering upstream of BCFtools processing

  • Clair3 – long-read variant caller whose output can be filtered with BCFtools

  • VEP (Variant Effect Predictor) – annotate filtered variants with functional consequences

  • SAMtools – companion toolkit for BAM/CRAM file manipulation