BEDTools

Overview

BEDTools is the standard toolkit for genomic interval arithmetic. It provides fast, set-theoretic operations – intersection, union, complement, merging, and more – on BED, BAM, VCF, and GFF/GTF files. BEDTools is essential for tasks such as identifying overlapping genomic features, filtering regions against blacklists, computing genome-wide coverage, and measuring similarity between interval sets with the Jaccard statistic.

Installation

mamba install -c conda-forge -c bioconda bedtools

Basic Usage

Intersect peaks with gene promoters

bedtools intersect -a peaks.bed -b promoters.bed -wa -wb > overlap.bed

Find peaks that do NOT overlap with blacklist regions

bedtools intersect -a peaks.bed -b blacklist.bed -v > clean_peaks.bed

Merge overlapping or nearby intervals (here, within 100 bp); the input must be sorted by chromosome, then start

bedtools merge -i sorted_peaks.bed -d 100 > merged.bed

Find the closest gene to each peak and report the distance (both inputs must be sorted by chromosome, then start)

bedtools closest -a peaks.bed -b genes.bed -d > closest.bed

Compute genome-wide coverage from a coordinate-sorted BAM file

bedtools genomecov -ibam sample.bam -bg > coverage.bedgraph

Generate windows across the genome

bedtools makewindows -g genome.sizes -w 10000 > windows.bed

Compute Jaccard similarity between two sets of intervals (both sorted by chromosome, then start)

bedtools jaccard -a set1.bed -b set2.bed
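To make the effect of -d in the merge example above more concrete, here is a minimal Python sketch of the idea: walk the sorted intervals and extend the current interval whenever the next one starts within the allowed gap. This is an illustration, not BEDTools' implementation; the function name merge_intervals, the max_gap parameter, and the toy coordinates are all hypothetical, and the treatment of a gap exactly equal to the limit is an assumption.

```python
def merge_intervals(intervals, max_gap=0):
    """Merge (chrom, start, end) intervals whose gap is at most max_gap bp.

    Illustrative only: assumes BED-style half-open coordinates and that a gap
    exactly equal to max_gap still merges.
    """
    merged = []
    # Sorting by chromosome, then start, mirrors the sorted input merge expects.
    for chrom, start, end in sorted(intervals):
        last = merged[-1] if merged else None
        if last and last[0] == chrom and start - last[2] <= max_gap:
            # Overlapping, adjacent, or close enough: extend the open interval.
            last[2] = max(last[2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(m) for m in merged]


# Made-up peaks: the first two are 50 bp apart, the third is 200 bp further on.
peaks = [("chr1", 100, 200), ("chr1", 250, 400), ("chr1", 600, 700), ("chr2", 50, 80)]
merged = merge_intervals(peaks, max_gap=100)
print(merged)
# [('chr1', 100, 400), ('chr1', 600, 700), ('chr2', 50, 80)]
```

With max_gap=0 the first two toy peaks would stay separate, which is why the choice of -d can noticeably change how many merged regions are reported.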

Key Parameters

Flag / option    Description
-a               The “query” file (BED/BAM/VCF/GFF).
-b               The “subject” file(s) to compare against.
-wa              Write the original entry from -a for each overlap.
-wb              Write the original entry from -b for each overlap.
-v               Report entries in -a that have no overlap with -b.
-d INT           Maximum distance in bp between features to merge (used with merge); features within this distance are combined into a single interval. With closest, -d takes no value and appends the distance to the nearest feature instead.
-f FLOAT         Minimum overlap required as a fraction of -a (e.g., 0.50 for 50%).
-r               Require the -f overlap fraction to be met reciprocally, i.e. as a fraction of both -a and -b.
-s               Require that features are on the same strand.
-bg              Report depth in bedGraph format (used with genomecov).
-g               Genome file: a tab-delimited list of chromosome names and sizes (used by makewindows, complement, etc.).
-w INT           Window size in bp (used with makewindows).
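The interaction between -f and -r is easy to misread, so the following Python sketch spells out the check for a single pair of features. It is a simplified illustration under stated assumptions, not BEDTools code: the function name passes_overlap and its parameters are hypothetical, both features are assumed to lie on the same chromosome, and coordinates are BED-style half-open (start, end) pairs.

```python
def passes_overlap(a, b, min_frac=1e-9, reciprocal=False):
    """Return True if the a/b overlap satisfies a -f style fraction.

    a and b are (start, end) half-open intervals on the same chromosome.
    The tiny default fraction means any overlap of at least 1 bp passes.
    """
    overlap = min(a[1], b[1]) - max(a[0], b[0])
    if overlap <= 0:
        return False  # no shared bases at all
    frac_of_a = overlap / (a[1] - a[0])
    if not reciprocal:
        # -f alone: only the fraction of the -a feature matters.
        return frac_of_a >= min_frac
    # -f with -r: the same fraction must also hold for the -b feature.
    frac_of_b = overlap / (b[1] - b[0])
    return frac_of_a >= min_frac and frac_of_b >= min_frac


peak = (100, 200)   # 100 bp feature from -a (made-up coordinates)
gene = (150, 1000)  # 850 bp feature from -b; they share 50 bp

print(passes_overlap(peak, gene, min_frac=0.5))                   # True
print(passes_overlap(peak, gene, min_frac=0.5, reciprocal=True))  # False
```

In the toy example the shared 50 bp cover half of the peak but only about 6% of the gene, so the pair passes -f 0.50 alone and fails once -r is added.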

Expected Output

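  • intersect – a BED file of overlaps between the two inputs. By default each line is the portion of an -a feature that overlaps a -b feature; with -wa -wb, the full original -a and -b entries are written side by side, one line per overlapping pair. With -v, only entries from -a that overlap nothing in -b are reported.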

  • merge – a BED file of merged intervals where overlapping or nearby features have been collapsed.

  • closest – a BED file pairing each -a feature with the nearest -b feature; the -d flag appends the distance as an extra column.

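  • genomecov – with -bg, a bedGraph file in which each line is a run of consecutive bases sharing the same depth; regions with zero coverage are omitted unless -bga is used instead.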

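  • makewindows – a BED file of fixed-size windows tiling the genome: non-overlapping by default, or sliding when a step size smaller than the window is given with makewindows' own -s option. The last window on each chromosome may be shorter than -w.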

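  • jaccard – a tab-delimited summary (a header row followed by one data row) giving the intersection size in bp, the union size in bp, the Jaccard index (between 0 and 1), and the number of intersections.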
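As a rough guide to what the jaccard numbers mean, the Python sketch below computes the statistic directly from its definition: the number of base pairs covered by both sets divided by the number covered by either set. It enumerates individual bases, so it is only suitable for toy data, and it is not how BEDTools computes the value; the names covered_bases and jaccard_stats and the coordinates are hypothetical.

```python
def covered_bases(intervals):
    """Expand (chrom, start, end) half-open intervals into a set of bases."""
    bases = set()
    for chrom, start, end in intervals:
        bases.update((chrom, pos) for pos in range(start, end))
    return bases


def jaccard_stats(set_a, set_b):
    """Return (intersection_bp, union_bp, jaccard) for two interval sets."""
    a, b = covered_bases(set_a), covered_bases(set_b)
    intersection = len(a & b)
    union = len(a | b)
    # Two empty sets share nothing; report 0.0 rather than dividing by zero.
    index = intersection / union if union else 0.0
    return intersection, union, index


# Two made-up 100 bp intervals that share their middle 50 bp.
set1 = [("chr1", 0, 100)]
set2 = [("chr1", 50, 150)]
intersection, union, index = jaccard_stats(set1, set2)
print(intersection, union, round(index, 3))
# 50 150 0.333
```

An index of 1 means the two sets cover exactly the same bases, and 0 means they share none, which makes the statistic convenient for comparing replicates or peak callers.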

See Also

  • SAMtools – prepare sorted and indexed BAM files used as input by several BEDTools sub-commands

  • deepTools – compute normalised coverage tracks and heatmaps from BAM/bigWig files

  • BED – reference for the BED interval format

  • SAM / BAM / CRAM – reference for the SAM/BAM/CRAM file formats

  • BigWig / bedGraph – reference for the bigWig and bedGraph coverage formats

  • VCF / BCF – reference for the VCF/BCF variant format