BigWig / bedGraph

Overview

BigWig and bedGraph are formats for storing continuous numerical data across genomic coordinates. They are used to represent:

  • Read coverage (depth of sequencing across the genome)

  • Signal tracks (ChIP-seq enrichment, ATAC-seq accessibility)

  • Normalized scores (RPKM, CPM, log2 fold-change)

  • Conservation scores (phastCons, phyloP)

  • GC content and other per-base statistics

These formats are essential for genome browser visualisation (IGV, UCSC Genome Browser, JBrowse) and for computing signal matrices around genomic features (e.g. heatmaps of ChIP-seq signal at promoters).

Format     Extension        Description
bedGraph   .bedGraph, .bg   Human-readable, tab-delimited text. Four columns: chrom, start, end, value. Uses 0-based, half-open coordinates (same as BED).
BigWig     .bw, .bigWig     Compressed binary version of bedGraph. Indexed for fast random access. The preferred format for large datasets and genome browsers.
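Because bedGraph uses 0-based, half-open coordinates, an interval's length is simply end minus start, and converting to a 1-based, closed convention (as used by GFF or VCF) means adding 1 to the start only. A minimal sketch with awk follows; the file name example.bedGraph and its contents are made up for illustration.

```shell
# Hypothetical two-line bedGraph, written here so the example is self-contained.
printf 'chr1\t0\t1000\t0.0\nchr1\t1000\t1050\t3.5\n' > example.bedGraph

# Print chrom, 1-based start, end, and interval length.
# 0-based half-open [start, end)  ->  1-based closed [start+1, end]
awk 'BEGIN { FS = OFS = "\t" }
     { print $1, $2 + 1, $3, $3 - $2 }' example.bedGraph > example.1based.tsv
cat example.1based.tsv
```

The end coordinate is unchanged by the conversion, which is the main practical convenience of the half-open convention.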

Structure

bedGraph format

bedGraph is a simple four-column tab-delimited format:

chr1  0      1000   0.0
chr1  1000   1050   3.5
chr1  1050   1200   12.8
chr1  1200   1350   8.2
chr1  1350   2000   0.0

Each line defines a genomic interval and its associated value. Intervals must not overlap and are typically consecutive. Zero-valued regions are often omitted to save space, although some tools then treat the gaps as missing data rather than as zero.

Col   Field        Description
1     chrom        Chromosome name.
2     chromStart   Start position (0-based, inclusive).
3     chromEnd     End position (exclusive).
4     value        Signal value (integer or float).
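The rules above (four columns, start before end, no overlaps) can be checked mechanically. The following is a rough sketch rather than an official validator: the function name check_bedgraph is hypothetical, it assumes the file is already sorted by chromosome and start, and it does not handle optional track or comment lines.

```shell
# check_bedgraph FILE: report the first malformed or overlapping line
# (hypothetical helper). Assumes input sorted by chromosome, then start.
# Returns non-zero if a problem is found.
check_bedgraph() {
  awk -F '\t' '
    NF != 4          { print "line " NR ": expected 4 columns"; exit 1 }
    $2 + 0 >= $3 + 0 { print "line " NR ": start >= end"; exit 1 }
    $1 == chrom && $2 + 0 < prev_end {
      print "line " NR ": overlaps previous interval"; exit 1
    }
    { chrom = $1; prev_end = $3 + 0 }
  ' "$1"
}

# Usage on a small made-up file.
printf 'chr1\t0\t100\t1.5\nchr1\t100\t200\t2\n' > ok.bedGraph
check_bedgraph ok.bedGraph && echo "ok.bedGraph looks valid"
```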

BigWig format

BigWig is a binary, indexed format that stores the same kind of data as bedGraph in compressed blocks indexed by an R-tree. It cannot be read as plain text, but it supports efficient random access to any genomic region. BigWig files are typically 5–20x smaller than the equivalent bedGraph, depending on the data.

The internal structure consists of:

  1. Header – magic number, version, chromosome list.

  2. Zoom levels – pre-computed summaries at multiple resolutions for fast rendering at different scales.

  3. Data blocks – compressed intervals with values, organised in an R-tree index for fast regional queries.
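The header's magic number gives a quick way to tell whether a file is really a BigWig. The sketch below reads the first four bytes with od and compares them with the BigWig magic number 0x888FFC26, which appears on disk as 26 fc 8f 88 in little-endian files (or byte-swapped in big-endian ones). The function name is_bigwig is hypothetical, and the check says nothing about whether the rest of the file is intact.

```shell
# is_bigwig FILE: succeed if the first four bytes match the BigWig magic
# number 0x888FFC26 in either byte order (hypothetical helper).
is_bigwig() {
  magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
  [ "$magic" = "26fc8f88" ] || [ "$magic" = "888ffc26" ]
}

# Demonstration on a fake 4-byte file that carries only the magic number
# (octal escapes for the bytes 26 fc 8f 88).
printf '\046\374\217\210' > fake.bw
is_bigwig fake.bw && echo "fake.bw starts with the BigWig magic number"
```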

Working With

Generating coverage tracks from BAM

The most common way to create BigWig files is with bamCoverage from deepTools:

# RPKM-normalised coverage track (the BAM must be coordinate-sorted and indexed)
bamCoverage -b aligned.sorted.bam -o coverage.bw \
  --binSize 10 --normalizeUsing RPKM -p 8

# CPM-normalised track (counts per million mapped reads)
bamCoverage -b aligned.sorted.bam -o coverage_cpm.bw \
  --binSize 10 --normalizeUsing CPM -p 8

# Extend reads to estimated fragment size (ChIP-seq)
bamCoverage -b chip.sorted.bam -o chip.bw \
  --binSize 10 --extendReads 200 --normalizeUsing RPKM -p 8
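
To make the two normalisations concrete: for each bin, CPM divides the read count by the total number of mapped reads in millions, and RPKM additionally divides by the bin length in kilobases. The following worked example uses made-up numbers (25 reads in a 10 bp bin, 20 million mapped reads) and is only an arithmetic illustration, not a reimplementation of bamCoverage.

```shell
# Made-up inputs for illustration only.
reads_in_bin=25
mapped_reads=20000000
bin_size=10          # bp, matching --binSize 10 above

result=$(awk -v n="$reads_in_bin" -v total="$mapped_reads" -v bin="$bin_size" 'BEGIN {
  cpm  = n / (total / 1e6)                   # counts per million mapped reads
  rpkm = n / ((total / 1e6) * (bin / 1e3))   # additionally per kilobase of bin
  printf "CPM=%.2f RPKM=%.2f", cpm, rpkm
}')
echo "$result"
```

With small bins the RPKM value is much larger than the CPM value, because the per-kilobase term rescales a 10 bp bin by a factor of 100.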

Comparing signal between samples

# Log2 ratio of ChIP over Input
bamCompare -b1 chip.sorted.bam -b2 input.sorted.bam \
  -o log2ratio.bw --binSize 50 -p 8

# Subtract input from ChIP (deepTools 3.0 and later; older releases used --ratio)
bamCompare -b1 chip.sorted.bam -b2 input.sorted.bam \
  -o subtracted.bw --operation subtract --binSize 50 -p 8
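
As a rough illustration of what a log2 ratio track contains, the sketch below computes log2((ChIP + 1) / (Input + 1)) per bin from two bedGraph files that share identical bins. It assumes a pseudocount of 1 and ignores the read-depth scaling that bamCompare applies, so it is not a substitute for the tool. File names and values are made up.

```shell
# Two made-up bedGraph files with identical bins.
printf 'chr1\t0\t50\t7\nchr1\t50\t100\t1\n' > chip.bedGraph
printf 'chr1\t0\t50\t1\nchr1\t50\t100\t3\n' > input.bedGraph

# Join the files line by line, then compute log2((chip + 1) / (input + 1)).
# After paste, column 4 is the ChIP value and column 8 the input value.
paste chip.bedGraph input.bedGraph |
  awk -F '\t' '{
    printf "%s\t%s\t%s\t%.3f\n", $1, $2, $3, log(($4 + 1) / ($8 + 1)) / log(2)
  }' > log2ratio.bedGraph
cat log2ratio.bedGraph
```

Positive values mark bins where the ChIP signal exceeds the input, and negative values the reverse.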

Computing signal matrices and heatmaps

# Compute matrix of signal around TSS
computeMatrix reference-point \
  -S chip.bw input.bw \
  -R genes.bed \
  --referencePoint TSS \
  -a 3000 -b 3000 \
  -o matrix.gz -p 8

# Plot as a heatmap
plotHeatmap -m matrix.gz -o heatmap.png \
  --colorMap RdBu_r --whatToShow 'heatmap and colorbar'

# Plot as a profile
plotProfile -m matrix.gz -o profile.png
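
For intuition about what computeMatrix reference-point does with --referencePoint TSS, the sketch below derives the window around each TSS from a BED file: the TSS is taken as the start of a plus-strand gene and the end of a minus-strand gene, with -b bases upstream and -a bases downstream. This illustrates the geometry only and is not deepTools' implementation; genes.example.bed is a made-up six-column BED file.

```shell
# Made-up BED6 file: one gene on each strand.
printf 'chr1\t5000\t9000\tgeneA\t0\t+\nchr1\t20000\t26000\tgeneB\t0\t-\n' > genes.example.bed

# Window = [TSS - upstream, TSS + downstream) in transcript orientation,
# clamped at 0. The up/down variables mirror the -b and -a options above.
awk -v up=3000 -v down=3000 'BEGIN { FS = OFS = "\t" }
  {
    if ($6 == "-") { tss = $3; s = tss - down; e = tss + up }
    else           { tss = $2; s = tss - up;   e = tss + down }
    if (s < 0) s = 0
    print $1, s, e, $4, $5, $6
  }' genes.example.bed > tss_windows.bed
cat tss_windows.bed
```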

Converting between formats

# bedGraph to BigWig (requires a sorted bedGraph and a chromosome sizes file)
bedGraphToBigWig coverage.bedGraph chrom.sizes coverage.bw

# BigWig to bedGraph
bigWigToBedGraph coverage.bw coverage.bedGraph

# Generate chrom.sizes from a FASTA index
cut -f1,2 reference.fa.fai > chrom.sizes
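
chrom.sizes is a two-column, tab-delimited file of chromosome names and lengths. Conversion tends to fail when a bedGraph refers to a chromosome that is missing from it, or to coordinates past a chromosome's end, so a pre-flight check can save time. A sketch, with made-up file names and contents:

```shell
# Made-up inputs for illustration.
printf 'chr1\t2000\nchr2\t1500\n' > example.chrom.sizes
printf 'chr1\t0\t1000\t1.0\nchr1\t1000\t2500\t2.0\nchrM\t0\t10\t3.0\n' > example.bedGraph

# First file: load chromosome lengths. Second file: flag offending intervals.
awk -F '\t' '
  NR == FNR             { size[$1] = $2; next }
  !($1 in size)         { print "unknown chromosome: " $0; next }
  $3 + 0 > size[$1] + 0 { print "past chromosome end: " $0 }
' example.chrom.sizes example.bedGraph > problems.txt
cat problems.txt
```

An empty problems.txt means every interval fits within the listed chromosomes.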

Generating bedGraph from BAM

# Genome-wide coverage as bedGraph
bedtools genomecov -ibam aligned.sorted.bam -bg > coverage.bedGraph

# Sort the bedGraph (required before conversion to BigWig);
# LC_ALL=C forces byte-order sorting of chromosome names
LC_ALL=C sort -k1,1 -k2,2n coverage.bedGraph > coverage.sorted.bedGraph
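
sort -c can confirm that a bedGraph is already in the expected order without rewriting it. A small sketch with a deliberately unsorted, made-up file:

```shell
# Deliberately unsorted, made-up bedGraph.
printf 'chr2\t0\t100\t1\nchr1\t200\t300\t2\nchr1\t0\t200\t3\n' > unsorted.bedGraph

# Sort by chromosome name (byte order), then numerically by start.
LC_ALL=C sort -k1,1 -k2,2n unsorted.bedGraph > sorted.bedGraph

# sort -c exits non-zero if the file is not already sorted with these keys.
if LC_ALL=C sort -c -k1,1 -k2,2n sorted.bedGraph; then
  echo "sorted.bedGraph is sorted"
fi
```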

Extracting values from BigWig

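# Average signal over each region in a BED file
# (column 4 of the BED should hold a unique name per region)
bigWigAverageOverBed coverage.bw regions.bed output.tab

# Summarise chr1:10000-20000 in 10 equal bins
# (mean by default; -type=min, max, std or coverage selects another statistic)
bigWigSummary coverage.bw chr1 10000 20000 10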
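
What a per-region average means is easiest to see on bedGraph text: each interval contributes its value weighted by the number of bases that overlap the region. The sketch below computes, for one made-up region, the covered bases, the weighted sum, the mean over the whole region (uncovered bases counted as zero), and the mean over covered bases only. These are intended to mirror the covered, sum, mean0 and mean columns of bigWigAverageOverBed, though that correspondence is an assumption here.

```shell
# Made-up bedGraph with a gap between 150 and 200 (no data there).
printf 'chr1\t0\t100\t2\nchr1\t100\t150\t4\nchr1\t200\t300\t1\n' > signal.bedGraph

# Region of interest: chr1:50-250 (0-based, half-open), 200 bases long.
summary=$(awk -F '\t' -v c=chr1 -v rs=50 -v re=250 '
  $1 == c && $2 + 0 < re + 0 && $3 + 0 > rs + 0 {
    s = ($2 + 0 > rs + 0) ? $2 + 0 : rs + 0   # overlap start
    e = ($3 + 0 < re + 0) ? $3 + 0 : re + 0   # overlap end
    covered += e - s
    sum += (e - s) * $4
  }
  END {
    printf "covered=%d sum=%g mean0=%g mean=%g",
           covered, sum, sum / (re - rs), (covered ? sum / covered : 0)
  }' signal.bedGraph)
echo "$summary"
```

The gap lowers mean0 but not mean, which is why the two can differ noticeably on sparse tracks.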

Viewing in genome browsers

BigWig files can be loaded directly into:

  • IGV – drag and drop the .bw file or use File > Load from File.

  • UCSC Genome Browser – host the file on a web server and add a custom track line.

  • JBrowse – add as a BigWig track in the configuration.
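
For the UCSC route, the custom track line is a single line of text that points at the hosted file. The sketch below prints one; the URL, track name and description are placeholders, and only the commonly used type, name, description, visibility and bigDataUrl settings are shown.

```shell
# Placeholder values; replace bw_url with the real, publicly reachable URL.
bw_url="https://example.org/tracks/coverage.bw"
track_line=$(printf 'track type=bigWig name="%s" description="%s" visibility=full bigDataUrl=%s' \
  "Coverage" "Read coverage (example)" "$bw_url")
echo "$track_line"
```

The printed line can be pasted into the browser's custom track form.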

See Also

  • deepTools – bamCoverage, bamCompare, computeMatrix, and plotHeatmap

  • MACS2 – peak caller that produces bedGraph signal tracks

  • IGV – genome browser for viewing BigWig files

  • BED – the interval format that bedGraph extends

  • SAM / BAM / CRAM – the alignment format from which coverage is computed