SnpEff
Overview
SnpEff is a fast variant annotation and functional effect prediction tool that classifies genetic variants by their predicted impact on known genes and proteins. Given a VCF file and a genome database, SnpEff reports for each variant the affected gene and transcript, its exon or intron location, the amino acid change (for coding variants), and a predicted impact category (HIGH, MODERATE, LOW, or MODIFIER). Pre-built annotation databases for thousands of genomes are available for download, and a whole-genome VCF can typically be annotated in minutes. SnpEff is often used together with SnpSift, which provides utilities for filtering annotated VCFs and extracting fields from them.
Installation
mamba install -c conda-forge -c bioconda snpeff snpsift
This installs both SnpEff and SnpSift from Bioconda.
Basic Usage
Download the annotation database
snpEff download GRCh38.105
Annotate variants
snpEff ann GRCh38.105 variants.vcf.gz > annotated.vcf
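The annotation is added to the INFO field of each VCF record as an ANN tag. Its value holds one comma-separated entry per predicted effect, and each entry consists of pipe-delimited sub-fields including allele, effect, impact, gene name, gene ID, feature type, feature ID (e.g. transcript ID), transcript biotype, and HGVS notation.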
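The ANN layout can be unpacked with standard command-line tools. The sketch below is illustrative only: it assumes bash (for the `$'...'` string), assumes ANN sits in the INFO column (column 8) with entries separated by commas and sub-fields by pipes, and uses a hypothetical function name (`ann_fields`) and a made-up record rather than real SnpEff output.

```shell
#!/usr/bin/env bash
# Sketch: print allele, effect, impact and gene name for every ANN entry
# in VCF records read from stdin. Assumes ANN is in the INFO column
# (column 8) using the pipe-delimited layout described above.
ann_fields() {
  awk -F'\t' -v OFS='\t' '
    /^#/ { next }                                  # skip VCF header lines
    {
      n = split($8, info, ";")                     # INFO key=value pairs
      for (i = 1; i <= n; i++) {
        if (substr(info[i], 1, 4) != "ANN=") continue
        m = split(substr(info[i], 5), entries, ",") # one entry per effect
        for (j = 1; j <= m; j++) {
          split(entries[j], f, "|")                # pipe-delimited sub-fields
          print f[1], f[2], f[3], f[4]             # allele, effect, impact, gene
        }
      }
    }'
}

# Hypothetical record, for illustration only (not real annotation output)
record=$'1\t12345\t.\tC\tT\t50\tPASS\tDP=20;ANN=T|missense_variant|MODERATE|GENE1|GENE1_ID|transcript|TX1|protein_coding|2/5|c.100C>T|p.Arg34Cys|100/900|100/600|34/199||,T|upstream_gene_variant|MODIFIER|GENE2|GENE2_ID|transcript|TX2|protein_coding||c.-500C>T|||||500|'
printf '%s\n' "$record" | ann_fields
```

On a real file this would be run as `ann_fields < annotated.vcf`. A single variant usually yields several lines, because it can affect several transcripts. For anything beyond quick inspection, SnpSift's `extractFields` command is the more robust choice.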
Key Parameters
| Flag / option | Description |
|---|---|
| | Annotate variants (primary sub-command). |
| | Download a pre-built annotation database by name. |
| | Verbose output for logging and debugging. |
| | Report annotations only for canonical transcripts. |
| | Skip generation of the HTML summary statistics file. |
| | Write statistics in CSV format instead of HTML. |
| | Do not annotate downstream gene variants. |
| | Do not annotate upstream gene variants. |
| | Do not annotate intergenic variants. |
| | Path to the directory containing SnpEff databases. |
Expected Output
Standard output – an annotated VCF with ANN fields in the INFO column describing the predicted effect of each variant on overlapping genes and transcripts. Each annotation includes: allele, annotation type (e.g. missense_variant, stop_gained), putative impact (HIGH/MODERATE/LOW/MODIFIER), gene name, gene ID, feature type, feature ID, transcript biotype, exon/intron rank, HGVS.c, HGVS.p, cDNA position, CDS position, and protein position.
snpEff_genes.txt – a tab-delimited gene-level summary listing the number and types of variants affecting each gene.
snpEff_summary.html – an HTML report with variant type distributions, impact counts, and quality summaries.
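Keeping only variants with at least one HIGH-impact annotation is a common follow-up. SnpSift's `filter` command is the dedicated tool for this; the awk sketch below only illustrates the logic. It assumes bash, assumes the putative impact is the third pipe-delimited sub-field of each ANN entry, and uses a hypothetical function name (`high_impact`) and a made-up two-record VCF.

```shell
#!/usr/bin/env bash
# Sketch: pass through VCF header lines plus any record whose ANN field has
# at least one entry with putative impact HIGH (the 3rd pipe-delimited sub-field).
high_impact() {
  awk -F'\t' '
    /^#/ { print; next }                           # keep header lines unchanged
    {
      n = split($8, info, ";")                     # INFO column is column 8
      for (i = 1; i <= n; i++) {
        if (substr(info[i], 1, 4) != "ANN=") continue
        m = split(substr(info[i], 5), entries, ",")
        for (j = 1; j <= m; j++) {
          split(entries[j], f, "|")
          if (f[3] == "HIGH") { print; next }      # emit the record once, move on
        }
      }
    }'
}

# Hypothetical two-record VCF, for illustration only
vcf=$'##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n1\t100\t.\tC\tT\t50\tPASS\tANN=T|stop_gained|HIGH|GENE1|G1|transcript|TX1|protein_coding|1/3|c.10C>T|p.Gln4*|10/300|10/300|4/99||\n1\t200\t.\tG\tT\t50\tPASS\tANN=T|missense_variant|MODERATE|GENE1|G1|transcript|TX1|protein_coding|2/3|c.50G>T|p.Gly17Val|50/300|50/300|17/99||'
printf '%s\n' "$vcf" | high_impact
```

On real data this would be used as `high_impact < annotated.vcf > high_impact.vcf`. Header lines are passed through so the result remains a valid VCF.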
See Also
VEP – Ensembl Variant Effect Predictor, with extensive plugin support and population frequency annotations
BCFtools – filter and extract fields from annotated VCFs with bcftools view and bcftools query (SnpSift offers similar, ANN-aware filtering)
GATK – upstream variant calling with GATK HaplotypeCaller