BigWig / bedGraph
Overview
BigWig and bedGraph are formats for storing continuous numerical data across genomic coordinates. They are used to represent:
Read coverage (depth of sequencing across the genome)
Signal tracks (ChIP-seq enrichment, ATAC-seq accessibility)
Normalized scores (RPKM, CPM, log2 fold-change)
Conservation scores (phastCons, phyloP)
GC content and other per-base statistics
These formats are essential for genome browser visualisation (IGV, UCSC Genome Browser, JBrowse) and for computing signal matrices around genomic features (e.g. heatmaps of ChIP-seq signal at promoters).
| Format | Extension | Description |
|---|---|---|
| bedGraph | .bedGraph | Human-readable, tab-delimited text. Four columns: chrom, start, end, value. Uses 0-based, half-open coordinates (same as BED). |
| BigWig | .bw | Compressed binary version of bedGraph. Indexed for fast random access. The preferred format for large datasets and genome browsers. |
Structure
bedGraph format
bedGraph is a simple four-column tab-delimited format:
chr1 0 1000 0.0
chr1 1000 1050 3.5
chr1 1050 1200 12.8
chr1 1200 1350 8.2
chr1 1350 2000 0.0
Each line defines a genomic interval and its associated value. Intervals are non-overlapping and typically consecutive. Regions with a value of zero may be omitted to save space.
| Col | Field | Description |
|---|---|---|
| 1 | chrom | Chromosome name. |
| 2 | chromStart | Start position (0-based, inclusive). |
| 3 | chromEnd | End position (exclusive). |
| 4 | value | Signal value (integer or float). |
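The rules above (four tab-separated columns, half-open intervals, no overlaps) can be checked mechanically. The following is a minimal sketch using awk; `check_bedgraph` is a hypothetical helper name, not part of any toolkit, and it assumes the file is already sorted by chromosome and start.

```shell
# check_bedgraph: hypothetical helper, not part of any toolkit.
# Reports the first line that breaks the bedGraph rules described above.
# Assumes the file is already sorted by chromosome, then start.
check_bedgraph() {
    awk -F'\t' '
        /^(track|browser|#)/ { next }          # skip optional header lines
        NF != 4 {
            printf "line %d: expected 4 tab-separated columns, found %d\n", NR, NF
            bad = 1; exit
        }
        $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ {
            printf "line %d: start and end must be non-negative integers\n", NR
            bad = 1; exit
        }
        $2 + 0 >= $3 + 0 {
            printf "line %d: start must be smaller than end (half-open interval)\n", NR
            bad = 1; exit
        }
        $1 == chrom && $2 + 0 < prev_end {
            printf "line %d: overlaps the previous interval\n", NR
            bad = 1; exit
        }
        { chrom = $1; prev_end = $3 + 0 }
        END { if (bad) exit 1; print "ok" }
    ' "$1"
}
```

Calling `check_bedgraph coverage.bedGraph` prints `ok` on success, or the first offending line and a non-zero exit status otherwise.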
BigWig format
BigWig is a binary, indexed format that stores the same data as bedGraph in compressed blocks indexed by an R-tree. It cannot be read as plain text, but it supports efficient random access to any genomic region. BigWig files are typically 5–20x smaller than the equivalent bedGraph.
The internal structure consists of:
Header – magic number, version, chromosome list.
Zoom levels – pre-computed summaries at multiple resolutions for fast rendering at different scales.
Data blocks – compressed intervals with values, organised in an R-tree index for fast regional queries.
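Because the header starts with a fixed magic number, a file can be sanity-checked without any bigWig tooling. This sketch reads only the first four bytes with `od`; `looks_like_bigwig` is a hypothetical helper, and it assumes the bigWig signature 0x888FFC26, accepting either byte order. It says nothing about whether the rest of the file is intact.

```shell
# looks_like_bigwig: hypothetical helper that inspects only the first 4 bytes.
# Assumes the bigWig magic number 0x888FFC26, which appears byte-swapped on
# disk when the file was written on a little-endian machine (the usual case).
looks_like_bigwig() {
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \t\n')
    case "$magic" in
        26fc8f88|888ffc26) return 0 ;;   # little- or big-endian bigWig
        *)                 return 1 ;;   # anything else, including bedGraph text
    esac
}
```

A typical use would be `looks_like_bigwig coverage.bw || echo "not a bigWig"` before passing the file to a downstream tool.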
Working With
Generating coverage tracks from BAM
The most common way to create BigWig files is with bamCoverage from deepTools:
# Basic coverage track
bamCoverage -b aligned.sorted.bam -o coverage.bw \
--binSize 10 --normalizeUsing RPKM -p 8
# CPM-normalised track (reads per million)
bamCoverage -b aligned.sorted.bam -o coverage_cpm.bw \
--binSize 10 --normalizeUsing CPM -p 8
# Extend reads to estimated fragment size (ChIP-seq)
bamCoverage -b chip.sorted.bam -o chip.bw \
--binSize 10 --extendReads 200 --normalizeUsing RPKM -p 8
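To make the CPM arithmetic concrete, the sketch below rescales the value column of a raw bedGraph by 1,000,000 divided by the total number of mapped reads. It illustrates the scaling only and is not a reimplementation of bamCoverage, which counts reads per bin and applies its own filters. `scale_to_cpm` is a hypothetical helper, and the total read count is assumed to be supplied by the caller.

```shell
# scale_to_cpm: hypothetical helper showing only the CPM arithmetic.
# Usage: scale_to_cpm raw.bedGraph TOTAL_MAPPED_READS
scale_to_cpm() {
    awk -v total="$2" 'BEGIN { FS = OFS = "\t" }
        # value * 1e6 / total mapped reads, kept to four decimal places
        { $4 = sprintf("%.4f", $4 * 1000000 / total); print }' "$1"
}
```

For example, a raw value of 5 in a library of 2,000,000 mapped reads becomes 2.5.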
Comparing signal between samples
# Log2 ratio of ChIP over Input
bamCompare -b1 chip.sorted.bam -b2 input.sorted.bam \
-o log2ratio.bw --binSize 50 -p 8
# Subtract input from ChIP (deepTools 3.x uses --operation; older 2.x releases used --ratio)
bamCompare -b1 chip.sorted.bam -b2 input.sorted.bam \
-o subtracted.bw --operation subtract --binSize 50 -p 8
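The log2 ratio itself is simple arithmetic once both samples are binned identically. The sketch below computes it from two bedGraph files that list exactly the same intervals in the same order, adding a pseudocount to avoid division by zero. `log2_ratio_bedgraph` is a hypothetical helper; the pseudocount of 1 is an assumption (check `bamCompare --help` for the default in your version), and the sequencing-depth scaling that bamCompare can apply is deliberately left out.

```shell
# log2_ratio_bedgraph: hypothetical helper. Both inputs must list exactly the
# same intervals in the same order (e.g. fixed-width bins).
# Usage: log2_ratio_bedgraph chip.bedGraph input.bedGraph [PSEUDOCOUNT]
log2_ratio_bedgraph() {
    paste "$1" "$2" | awk -v pc="${3:-1}" 'BEGIN { FS = OFS = "\t" }
        $1 != $5 || $2 != $6 || $3 != $7 {
            print "interval mismatch at line " NR > "/dev/stderr"; exit 1
        }
        # log2((chip + pc) / (input + pc)), computed via natural logs
        { printf "%s\t%s\t%s\t%.4f\n", $1, $2, $3, log(($4 + pc) / ($8 + pc)) / log(2) }'
}
```

If the two files were not binned the same way, the function stops at the first mismatching line instead of producing misaligned ratios.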
Computing signal matrices and heatmaps
# Compute matrix of signal around TSS
computeMatrix reference-point \
-S chip.bw input.bw \
-R genes.bed \
--referencePoint TSS \
-a 3000 -b 3000 \
-o matrix.gz -p 8
# Plot as a heatmap
plotHeatmap -m matrix.gz -o heatmap.png \
--colorMap RdBu_r --whatToShow 'heatmap and colorbar'
# Plot as a profile
plotProfile -m matrix.gz -o profile.png
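To see which genomic windows a TSS-centred matrix covers, the sketch below derives them from a BED file directly. It takes the TSS as the start coordinate on the plus strand and the end coordinate on the minus strand, then extends upstream and downstream (the roles played by `-b` and `-a` above). `tss_windows` is a hypothetical helper that assumes BED6 input with the strand in column 6; it is an illustration, not how computeMatrix is implemented.

```shell
# tss_windows: hypothetical helper; expects BED6 input (strand in column 6).
# Usage: tss_windows genes.bed UPSTREAM DOWNSTREAM
tss_windows() {
    awk -v up="$2" -v down="$3" 'BEGIN { FS = OFS = "\t" }
        {
            if ($6 == "-") {           # minus strand: TSS is the end coordinate
                tss = $3; s = tss - down; e = tss + up
            } else {                   # plus strand (or unstranded): TSS is the start
                tss = $2; s = tss - up; e = tss + down
            }
            if (s < 0) s = 0           # clamp at the chromosome start
            print $1, s, e, $4, $5, $6
        }' "$1"
}
```

With `tss_windows genes.bed 3000 3000`, a plus-strand gene starting at 5000 yields the window 2000 to 8000.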
Converting between formats
# bedGraph to BigWig (requires a chromosome sizes file; the bedGraph must be sorted by chromosome, then start)
bedGraphToBigWig coverage.bedGraph chrom.sizes coverage.bw
# BigWig to bedGraph
bigWigToBedGraph coverage.bw coverage.bedGraph
# Generate chrom.sizes from a FASTA index
cut -f1,2 reference.fa.fai > chrom.sizes
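Conversion is expected to fail when the bedGraph names a chromosome that is missing from chrom.sizes, or when an interval runs past the end of its chromosome. The sketch below lists such lines up front. `find_out_of_bounds` is a hypothetical helper and assumes both files are tab-delimited.

```shell
# find_out_of_bounds: hypothetical helper.
# Usage: find_out_of_bounds chrom.sizes coverage.bedGraph
# Prints every bedGraph line whose chromosome is missing from chrom.sizes
# or whose end coordinate runs past the chromosome length.
find_out_of_bounds() {
    awk -F'\t' '
        NR == FNR { size[$1] = $2; next }        # first file: chrom.sizes
        !($1 in size) || $3 + 0 > size[$1] + 0   # second file: print offenders
    ' "$1" "$2"
}
```

No output means every interval fits within the listed chromosome sizes.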
Generating bedGraph from BAM
# Genome-wide coverage as bedGraph
bedtools genomecov -ibam aligned.sorted.bam -bg > coverage.bedGraph
# Sort the bedGraph by chromosome, then start (required before conversion to BigWig)
# LC_ALL=C forces a byte-wise, case-sensitive chromosome order
LC_ALL=C sort -k1,1 -k2,2n coverage.bedGraph > coverage.sorted.bedGraph
Extracting values from BigWig
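# Average signal over each region in a BED file (column 4 must hold a unique name per region)
bigWigAverageOverBed coverage.bw regions.bed output.tab
# Mean signal in 10 equal-width bins across a region (use -type=min or -type=max for other statistics)
bigWigSummary coverage.bw chr1 10000 20000 10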
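The same kind of regional average can be reproduced from a bedGraph to check what these tools report. The sketch below computes a length-weighted mean over a region, counting bases without any interval as zero; this corresponds to the mean0 column of bigWigAverageOverBed rather than its mean column. `region_mean` is a hypothetical helper and takes half-open coordinates.

```shell
# region_mean: hypothetical helper.
# Usage: region_mean coverage.bedGraph CHROM START END
# Length-weighted mean over [START, END); bases with no interval count as 0.
region_mean() {
    awk -F'\t' -v c="$2" -v s="$3" -v e="$4" '
        BEGIN { s += 0; e += 0 }                 # force numeric comparison
        $1 == c && $3 > s && $2 < e {
            lo = ($2 > s) ? $2 : s               # clip the interval to the region
            hi = ($3 < e) ? $3 : e
            sum += (hi - lo) * $4
        }
        END { printf "%.4f\n", sum / (e - s) }' "$1"
}
```

On the five-line bedGraph example shown earlier, `region_mean example.bedGraph chr1 1000 1350` gives (50 × 3.5 + 150 × 12.8 + 150 × 8.2) / 350 = 9.5.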
Viewing in genome browsers
BigWig files can be loaded directly into:
IGV – drag and drop the .bw file or use File > Load from File.
UCSC Genome Browser – host the file on a web server and add a custom track line.
JBrowse – add as a BigWig track in the configuration.
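For the UCSC Genome Browser, the custom track line is a single line of text naming the track type and the hosted file. The sketch below prints one; `ucsc_track_line` is a hypothetical helper, the URL is a placeholder, and `visibility=full` is just one reasonable display setting.

```shell
# ucsc_track_line: hypothetical helper that prints a custom track line.
# Usage: ucsc_track_line NAME URL
ucsc_track_line() {
    printf 'track type=bigWig name="%s" bigDataUrl=%s visibility=full\n' "$1" "$2"
}

# The URL below is a placeholder; replace it with wherever the file is hosted.
ucsc_track_line "ChIP coverage" "https://example.org/tracks/coverage.bw"
```

The printed line can be pasted into the browser's custom track form.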
See Also
deepTools – bamCoverage, bamCompare, computeMatrix, and plotHeatmap
MACS2 – peak caller that produces bedGraph signal tracks
IGV – genome browser for viewing BigWig files
BED – the interval format that bedGraph extends
SAM / BAM / CRAM – the alignment format from which coverage is computed