SnpEff

Overview

SnpEff is a fast variant annotation and functional effect prediction tool that classifies genetic variants by their impact on known genes and proteins. Given a VCF file and a genome database, SnpEff reports the affected gene, transcript, exon or intron location, amino acid change, and predicted impact category (HIGH, MODERATE, LOW, or MODIFIER) for each variant. Pre-built annotation databases are available for download for thousands of genomes, and a whole-genome VCF can typically be annotated in minutes. SnpEff is often used together with SnpSift, which provides utilities for filtering annotated VCFs and extracting fields from them.

Installation

mamba install -c conda-forge -c bioconda snpeff snpsift

This installs both SnpEff and SnpSift.

Basic Usage

Download the annotation database

snpEff download GRCh38.105

Annotate variants

snpEff ann GRCh38.105 variants.vcf.gz > annotated.vcf

The annotation is added to the INFO field of each VCF record as an ANN tag containing pipe-delimited fields for allele, effect, impact, gene name, gene ID, feature type, transcript ID, biotype, and HGVS notation.
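As a rough illustration of that layout, the sketch below splits the ANN value of one record into one line per annotation using awk. The record is made up for illustration (gene and transcript identifiers are placeholders), and the field positions assume the order listed above: effect in sub-field 2, impact in sub-field 3, gene name in sub-field 4.

```shell
# Hypothetical single VCF record (tab-separated); all values are illustrative only.
record=$(printf 'chr1\t12345\t.\tG\tA\t50\tPASS\tDP=30;ANN=A|missense_variant|MODERATE|GENE1|ENSG0001|transcript|ENST0001|protein_coding|2/5|c.100G>A|p.Val34Ile|100/900|100/600|34/199||,A|upstream_gene_variant|MODIFIER|GENE2|ENSG0002|transcript|ENST0002|protein_coding||c.-500G>A|||||500|\n')

# Print "gene effect impact" for every annotation found in the ANN tag.
parse_ann() {
  awk -F'\t' '
    !/^#/ {
      n = split($8, info, ";")              # INFO is column 8
      for (i = 1; i <= n; i++) {
        if (info[i] ~ /^ANN=/) {
          sub(/^ANN=/, "", info[i])
          m = split(info[i], anns, ",")     # annotations are comma-separated
          for (j = 1; j <= m; j++) {
            split(anns[j], f, "|")          # sub-fields are pipe-delimited
            print f[4], f[2], f[3]
          }
        }
      }
    }'
}

ann_summary=$(printf '%s\n' "$record" | parse_ann)
printf '%s\n' "$ann_summary"
```

For a real file, the same function could be fed with something like `zcat annotated.vcf.gz | parse_ann`; for anything beyond quick inspection, SnpSift's field extraction is likely the more robust route.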

Key Parameters

Flag / option        Description
ann                  Annotate variants (primary sub-command).
download             Download a pre-built annotation database by name.
-v                   Verbose output for logging and debugging.
-canon               Report annotations only for canonical transcripts.
-noStats             Skip generation of the HTML summary statistics file.
-csvStats <file>     Write statistics in CSV format to <file> instead of HTML.
-no-downstream       Do not annotate downstream gene variants.
-no-upstream         Do not annotate upstream gene variants.
-no-intergenic       Do not annotate intergenic variants.
-dataDir <path>      Path to the directory containing SnpEff databases.
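Several of the options above can be combined in a single ann call. The sketch below assembles one such command and prints it. The file names and the data directory are placeholders chosen for illustration, and the command only runs if a snpEff wrapper is on the PATH and the input file exists.

```shell
# Placeholder names; adjust to your environment (these are assumptions).
genome="GRCh38.105"
input_vcf="variants.vcf.gz"
data_dir="./snpeff_data"

# Options go before the genome name; the input VCF comes last.
set -- ann -v -canon -no-intergenic -no-upstream -no-downstream \
       -csvStats snpEff_summary.csv -dataDir "$data_dir" \
       "$genome" "$input_vcf"

snpeff_cmd="snpEff $*"
echo "$snpeff_cmd"

# Run only when the tool and the input file are actually available.
if command -v snpEff >/dev/null 2>&1 && [ -f "$input_vcf" ]; then
  snpEff "$@" > annotated.canonical.vcf
fi
```

Restricting output to canonical transcripts and dropping intergenic, upstream, and downstream annotations keeps the ANN field short, at the cost of hiding effects on alternative transcripts.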

Expected Output

  • Standard output – an annotated VCF with ANN fields in the INFO column describing the predicted effect of each variant on overlapping genes and transcripts. Each annotation includes: allele, annotation type (e.g. missense_variant, stop_gained), putative impact (HIGH/MODERATE/LOW/MODIFIER), gene name, gene ID, feature type, feature ID, transcript biotype, rank, HGVS.c, HGVS.p, cDNA position, CDS position, and protein position.

  • snpEff_genes.txt – a tab-delimited gene-level summary listing the number and types of variants affecting each gene.

  • snpEff_summary.html – an HTML report with variant type distributions, impact counts, and quality summaries.
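For a quick check of the annotated VCF without opening the HTML report, impact categories can be tallied directly from the ANN field. The sketch below counts the impact of the first ANN entry of each record; it assumes that the first entry is the one of interest (for example, the most severe), and the records are shortened, made-up examples.

```shell
# Tally the impact of the first ANN entry of each record read from stdin.
count_impacts() {
  awk -F'\t' '
    !/^#/ && match($8, /ANN=[^;]*/) {
      ann = substr($8, RSTART + 4, RLENGTH - 4)   # text after "ANN="
      split(ann, f, "|")                          # f[3] is the putative impact
      count[f[3]]++
    }
    END {
      n = split("HIGH MODERATE LOW MODIFIER", order, " ")
      for (i = 1; i <= n; i++) print order[i], count[order[i]] + 0
    }'
}

# Made-up, truncated records for illustration only.
impact_counts=$(
  {
    printf '##fileformat=VCFv4.2\n'
    printf 'chr1\t100\t.\tG\tA\t.\t.\tANN=A|stop_gained|HIGH|GENE1\n'
    printf 'chr1\t200\t.\tC\tT\t.\t.\tANN=T|missense_variant|MODERATE|GENE1\n'
    printf 'chr2\t300\t.\tA\tG\t.\t.\tDP=12;ANN=G|missense_variant|MODERATE|GENE2\n'
    printf 'chr2\t400\t.\tT\tC\t.\t.\tANN=C|intron_variant|MODIFIER|GENE3\n'
  } | count_impacts
)
printf '%s\n' "$impact_counts"
```

On real output this would be called as `count_impacts < annotated.vcf`. The counts will not necessarily match those in snpEff_summary.html, which are computed by SnpEff itself across all annotations.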

See Also

  • VEP (Variant Effect Predictor) – Ensembl's variant annotation tool, with extensive plugin support and population frequency annotations

  • BCFtools – filter annotated VCFs or extract annotation fields with bcftools query, as an alternative to SnpSift

  • GATK – upstream variant calling with GATK HaplotypeCaller