BED
Overview
BED (Browser Extensible Data) is a tab-delimited format for describing genomic intervals. It is used throughout bioinformatics to represent regions of interest such as:
Genes, exons, and promoter regions
ChIP-seq peaks and regulatory elements
Target capture regions for exome sequencing
Blacklisted or masked regions
Structural variant breakpoints
BED files are compact, human-readable, and supported by nearly every genomic
tool. They are the primary input for bedtools and are accepted by IGV,
UCSC Genome Browser, samtools, GATK, and many other programs.
Important
BED uses 0-based, half-open coordinates. The start position is
zero-based (the first base of a chromosome is 0) and the end position is
exclusive. An interval covering the first 1 000 bases of chr1 is written
as chr1 0 1000. This differs from GFF/GTF and VCF, which use
1-based, fully-closed coordinates.
Structure
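BED files come in several levels of detail. The first three columns are mandatory; the remaining nine are optional but must appear in a fixed order, so a file that uses a later column must also fill in every column before it.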
BED3 – minimal interval
# BED3: chromosome, start, end
chr1 1000 2000
chr1 3000 4500
Three columns: chrom, chromStart (0-based, inclusive), and
chromEnd (exclusive). The interval chr1 1000 2000 covers bases
1000 through 1999 (1 000 bases total).
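Because the end coordinate is exclusive, the length of an interval is simply chromEnd minus chromStart, with no "+1" correction. The following is a minimal sketch of that arithmetic using awk; the function name `bed_lengths` is hypothetical and the input is assumed to be tab-separated.

```shell
# Hypothetical helper: append the length of each interval as a 4th column.
# With 0-based, half-open coordinates the length is end - start.
bed_lengths() {
  awk 'BEGIN { FS = OFS = "\t" }
       !/^#/ && NF >= 3 { print $1, $2, $3, $3 - $2 }' "$@"
}

# The two BED3 records from the example above (tab-separated).
printf 'chr1\t1000\t2000\nchr1\t3000\t4500\n' | bed_lengths
```

For the two example records this reports lengths of 1000 and 1500 bases.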
BED6 – adding name, score, and strand
# BED6: adds name, score, strand
chr1 1000 2000 peak1 500 +
chr1 3000 4500 peak2 300 -
| Col | Field | Description |
|---|---|---|
| 4 | name | Feature name or identifier. |
| 5 | score | Integer 0–1000 (e.g. peak enrichment score). |
| 6 | strand | Strand of the feature: +, -, or . (no strand). |
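The column rules above can be checked mechanically before a file is handed to downstream tools. The sketch below is one possible sanity check, not an official validator: the function name `bed6_check` is hypothetical, and it assumes tab-separated input, integer coordinates with start no greater than end, an integer score of 0 to 1000, and a strand of `+`, `-`, or `.`.

```shell
# Hypothetical helper: report BED6 lines that break the basic column rules.
# Exits non-zero if any problem is found.
bed6_check() {
  awk 'BEGIN { FS = "\t" }
  /^#/ || /^track/ || /^browser/ { next }   # skip comments and header lines
  {
    bad = ""
    if (NF < 6)                                    bad = "fewer than 6 columns"
    else if ($2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/) bad = "non-integer coordinate"
    else if ($2 + 0 > $3 + 0)                      bad = "start > end"
    else if ($5 !~ /^[0-9]+$/ || $5 + 0 > 1000)    bad = "score outside 0-1000"
    else if ($6 != "+" && $6 != "-" && $6 != ".")  bad = "invalid strand"
    if (bad != "") { printf "line %d: %s\n", NR, bad; status = 1 }
  }
  END { exit status + 0 }' "$@"
}

# A valid BED6 record passes silently and returns exit status 0.
printf 'chr1\t1000\t2000\tpeak1\t500\t+\n' | bed6_check && echo "all records look valid"
```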
BED12 – transcript models
# BED12: adds thickStart, thickEnd, itemRgb, blockCount,
# blockSizes, blockStarts (used for transcript models)
chr1 11873 14409 NR_046018 0 + 11873 11873 0 3 354,109,1189, 0,739,1347,
| Col | Field | Description |
|---|---|---|
| 7 | thickStart | Start of the thick drawing region (e.g. CDS start). |
| 8 | thickEnd | End of the thick drawing region (e.g. CDS end). |
| 9 | itemRgb | RGB colour value for display, written as R,G,B (e.g. 255,0,0), or 0 if unused. |
| 10 | blockCount | Number of blocks (exons). |
| 11 | blockSizes | Comma-separated list of block sizes. |
| 12 | blockStarts | Comma-separated list of block start positions relative to chromStart. |
In the example above the transcript NR_046018 has 3 exons of sizes
354, 109, and 1189 bp. The exon start positions relative to the transcript
start (11873) are 0, 739, and 1347.
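To make the block arithmetic concrete, each exon's absolute start is chromStart plus its blockStarts entry, and its end is that start plus the matching blockSizes entry. The awk sketch below expands a BED12 record into one BED6 line per block. The function name `bed12_to_exons` and the `_blockN` name suffix are hypothetical; bedtools ships a `bed12tobed6` subcommand for the same purpose, which is the better choice in real pipelines.

```shell
# Hypothetical helper: print one BED6 line per block (exon) of a BED12 record.
bed12_to_exons() {
  awk 'BEGIN { FS = OFS = "\t" }
  NF >= 12 {
    split($11, sizes, ",")     # blockSizes; the trailing comma is harmless
    split($12, starts, ",")    # blockStarts, relative to chromStart
    for (i = 1; i <= $10; i++) {
      s = $2 + starts[i]       # absolute 0-based start of block i
      print $1, s, s + sizes[i], $4 "_block" i, $5, $6
    }
  }' "$@"
}

# The NR_046018 record from the example above (tab-separated).
printf 'chr1\t11873\t14409\tNR_046018\t0\t+\t11873\t11873\t0\t3\t354,109,1189,\t0,739,1347,\n' \
  | bed12_to_exons
```

For NR_046018 this yields the exons 11873–12227, 12612–12721, and 13220–14409; the last block ends exactly at chromEnd, as a well-formed BED12 record should.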
Coordinate system comparison
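| Format | System | First base | Interval [1, 100] |
|---|---|---|---|
| BED | 0-based, half-open | 0 | start 0, end 100 |
| GFF / GTF | 1-based, closed | 1 | start 1, end 100 |
| VCF | 1-based | 1 | positions 1–100 (POS of the first base is 1) |
| SAM | 1-based | 1 | positions 1–100 (POS of the first base is 1) |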
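Converting between BED and a 1-based, closed system only ever changes the start coordinate: add 1 when leaving BED, subtract 1 when returning to it. The end coordinate stays the same, because an exclusive 0-based end and an inclusive 1-based end are the same number. A minimal sketch, assuming tab-separated three-column input with no header lines (both function names are hypothetical):

```shell
# BED (0-based, half-open) -> 1-based, closed: start + 1, end unchanged.
bed_to_one_based() {
  awk 'BEGIN { FS = OFS = "\t" } { $2 = $2 + 1; print }' "$@"
}

# 1-based, closed -> BED (0-based, half-open): start - 1, end unchanged.
one_based_to_bed() {
  awk 'BEGIN { FS = OFS = "\t" } { $2 = $2 - 1; print }' "$@"
}

# The first 100 bases of chr1 in both systems.
printf 'chr1\t0\t100\n' | bed_to_one_based
printf 'chr1\t1\t100\n' | one_based_to_bed
```

Forgetting this shift is the classic off-by-one error when mixing BED with GFF/GTF or VCF coordinates.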
Working With
Basic bedtools operations
# Find overlapping intervals between two BED files
bedtools intersect -a peaks.bed -b promoters.bed > overlap.bed
# Subtract one set of intervals from another
bedtools subtract -a regions.bed -b blacklist.bed > clean.bed
# Merge overlapping intervals
sort -k1,1 -k2,2n regions.bed | bedtools merge -i - > merged.bed
# Compute the complement (gaps) relative to a genome
bedtools complement -i regions.bed -g chrom.sizes > gaps.bed
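To illustrate why merging needs sorted input, here is a rough awk sketch of the idea behind interval merging: walk the sorted records once and extend the current interval whenever the next one starts at or before its end. This is not how bedtools is implemented, only an illustration. The function name `merge_sorted_bed` is hypothetical, and the input is assumed to be tab-separated and already sorted by chromosome and start.

```shell
# Hypothetical helper: merge overlapping or adjacent intervals in sorted BED3.
merge_sorted_bed() {
  awk 'BEGIN { FS = OFS = "\t" }
  # Same chromosome and starting inside (or at the end of) the current interval:
  # extend the current interval if needed and move on.
  $1 == chrom && $2 + 0 <= end { if ($3 + 0 > end) end = $3 + 0; next }
  # Otherwise emit the finished interval and start a new one.
  {
    if (chrom != "") print chrom, start, end
    chrom = $1; start = $2 + 0; end = $3 + 0
  }
  END { if (chrom != "") print chrom, start, end }' "$@"
}

printf 'chr1\t100\t200\nchr1\t150\t300\nchr1\t300\t400\nchr1\t500\t600\nchr2\t10\t20\n' \
  | merge_sorted_bed
```

In this example the first three chr1 records collapse into a single interval from 100 to 400, while the remaining two records pass through unchanged.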
Extracting sequences
# Get FASTA sequences for BED intervals
bedtools getfasta -fi reference.fa -bed regions.bed -fo regions.fa
Coverage and depth
# Count reads overlapping each interval
bedtools coverage -a targets.bed -b aligned.sorted.bam > coverage.bed
# Report genome-wide depth in bedGraph format (omit -bg for the default histogram)
bedtools genomecov -ibam aligned.sorted.bam -bg > coverage.bedgraph
Sorting BED files
# Sort by chromosome and start position
sort -k1,1 -k2,2n unsorted.bed > sorted.bed
Slop (extend) intervals
# Extend each interval by 500 bp on both sides
bedtools slop -i peaks.bed -g chrom.sizes -b 500 > extended.bed
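# Extend each gene 2 kb upstream only (strand-aware); the output is the gene plus its upstream flank.
# To get only the 2 kb upstream region, use bedtools flank with the same options.
bedtools slop -i genes.bed -g chrom.sizes -l 2000 -r 0 -s > promoters.bed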
Filtering BAM by BED regions
# Keep only reads overlapping target regions
samtools view -b -L targets.bed aligned.sorted.bam > on_target.bam
See Also
BEDTools – the comprehensive BED manipulation toolkit
GFF / GTF – 1-based annotation format (related but different coordinate system)
BigWig / bedGraph – continuous signal format derived from BED-like intervals
VCF / BCF – variant format that can be filtered using BED regions
SAMtools – samtools view -L for region-based BAM filtering