Hi-C Formats

Overview

Hi-C experiments capture three-dimensional genome organization by measuring how often pairs of genomic loci are found in physical proximity. The resulting contact matrices are large and sparse, so they are stored in specialized binary formats that support fast queries at multiple resolutions (bin sizes).
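To make the idea of a contact matrix concrete, here is a minimal sketch that bins read-pair positions on a single chromosome into a symmetric count matrix. The function name `bin_contacts` and the toy coordinates are illustrative assumptions; real tools work on sparse storage and whole genomes.

```python
import numpy as np

def bin_contacts(pos1, pos2, chrom_length, bin_size):
    """Count read pairs per (bin_i, bin_j) on one chromosome (hypothetical helper)."""
    n_bins = -(-chrom_length // bin_size)       # ceiling division
    i = np.asarray(pos1) // bin_size            # bin index of the first mate
    j = np.asarray(pos2) // bin_size            # bin index of the second mate
    matrix = np.zeros((n_bins, n_bins), dtype=np.int64)
    np.add.at(matrix, (i, j), 1)                # accumulate repeated (i, j) pairs
    off_diag = i != j
    np.add.at(matrix, (j[off_diag], i[off_diag]), 1)  # mirror to keep the matrix symmetric
    return matrix

# Toy read pairs on a 30 kb chromosome, binned at 10 kb
pos1 = [1_000, 12_000, 12_500, 25_000]
pos2 = [3_000, 27_000, 29_000, 26_000]
m = bin_contacts(pos1, pos2, chrom_length=30_000, bin_size=10_000)
print(m)
```

A smaller `bin_size` gives a larger, sparser matrix, which is why the formats below store several resolutions side by side.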

.hic Format

The .hic format (Juicer/Juicebox) stores contact matrices at multiple resolutions in a single compressed file.

# Create .hic file from valid pairs
java -jar juicer_tools.jar pre \
  valid_pairs.txt output.hic hg38

# Dump contacts at 10kb resolution
java -jar juicer_tools.jar dump observed KR \
  output.hic chr1 chr1 BP 10000 chr1_contacts.txt
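The dumped text file can be loaded back into a dense array for quick inspection. The sketch below assumes the dump output is a whitespace-separated list of `bin_start_1  bin_start_2  value`, one line per stored pixel of the upper triangle; check this against your Juicer Tools version before relying on it. The helper name `dump_to_dense` and the toy values are hypothetical.

```python
import io
import numpy as np

def dump_to_dense(handle, bin_size, n_bins):
    """Fill a dense symmetric matrix from (start1, start2, value) lines (assumed layout)."""
    dense = np.zeros((n_bins, n_bins), dtype=float)
    for line in handle:
        if not line.strip():
            continue                            # skip blank lines
        start1, start2, value = line.split()
        i = int(start1) // bin_size             # genomic start -> bin index
        j = int(start2) // bin_size
        dense[i, j] = dense[j, i] = float(value)  # mirror the upper triangle
    return dense

# Toy stand-in for chr1_contacts.txt at 10 kb resolution
toy_dump = io.StringIO("0\t0\t12.5\n0\t10000\t3.0\n10000\t20000\t1.5\n")
dense = dump_to_dense(toy_dump, bin_size=10_000, n_bins=3)
print(dense)
```

For a real file, pass `open("chr1_contacts.txt")` instead of the `StringIO` object and set `n_bins` from the chromosome length.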

.cool / .mcool Format

The .cool format (cooler) stores contact matrices in HDF5. Multi-resolution .mcool files contain matrices at several bin sizes.

# Create .cool from pairs file
cooler cload pairs -c1 2 -p1 3 -c2 4 -p2 5 \
  hg38.chrom.sizes:10000 pairs.txt output.cool

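# Generate multi-resolution .mcool and balance (normalize) every resolution,
# so that balanced matrices can be read from the .mcool later
cooler zoomify --balance output.cool -o output.mcool

# Balance (normalize) a single-resolution cooler in place
cooler balance output.cool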
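Conceptually, each coarser resolution in an .mcool file is obtained by summing the raw counts of neighbouring bins. The following is a rough dense-matrix sketch of that aggregation, not cooler's actual implementation (which operates on the sparse pixel table); the function name `coarsen` is an assumption for illustration.

```python
import numpy as np

def coarsen(matrix, factor):
    """Sum a symmetric contact matrix into bins that are `factor` times larger."""
    upper = np.triu(matrix)                     # count each contact once (upper triangle)
    pad = (-upper.shape[0]) % factor            # zero-pad so the size divides evenly
    padded = np.pad(upper, ((0, pad), (0, pad)))
    m = padded.shape[0] // factor
    blocks = padded.reshape(m, factor, m, factor).sum(axis=(1, 3))
    return blocks + np.triu(blocks, 1).T        # mirror back to a symmetric matrix

# Toy symmetric 5 x 5 matrix, e.g. 10 kb bins merged into 20 kb bins
a = np.arange(25).reshape(5, 5)
fine = a + a.T
coarse = coarsen(fine, 2)
print(coarse)
```

Aggregation is done on raw counts; balancing weights do not carry over between bin sizes, so each resolution is balanced separately.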

Working with Hi-C Data in Python

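import cooler

# Open the 10 kb resolution stored inside the multi-resolution file
clr = cooler.Cooler("output.mcool::resolutions/10000")

# Fetch the first 5 Mb of chr1 as a dense NumPy array;
# balance=True requires balancing weights stored at this resolution
matrix = clr.matrix(balance=True).fetch("chr1:0-5000000")
print(matrix.shape)  # 500 x 500 bins at 10 kb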
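With `balance=True`, each raw count is multiplied by the weights of its two bins. The sketch below shows how such weights could be derived by a simplified iterative correction and then applied. It is an illustration only: it omits steps that production balancing usually includes (such as ignoring the diagonal and filtering low-coverage bins), and `ice_weights`, the toy matrix, and the bias values are all hypothetical.

```python
import numpy as np

def ice_weights(raw, max_iter=200, tol=1e-10):
    """Return per-bin weights w so that raw[i, j] * w[i] * w[j] has near-equal row sums."""
    raw = np.asarray(raw, dtype=float)
    covered = raw.sum(axis=1) > 0               # bins with at least one contact
    weights = np.ones(raw.shape[0])
    for _ in range(max_iter):
        marginals = (raw * np.outer(weights, weights)).sum(axis=1)
        correction = np.ones_like(weights)
        correction[covered] = marginals[covered] / marginals[covered].mean()
        weights /= correction                   # damp bins with above-average coverage
        if correction[covered].var() < tol:     # row sums are already nearly equal
            break
    # Rescale so balanced rows sum to about 1, and mask empty bins
    marginals = (raw * np.outer(weights, weights)).sum(axis=1)
    weights /= np.sqrt(marginals[covered].mean())
    weights[~covered] = np.nan
    return weights

# Toy data: a matrix with equal row sums, distorted by made-up per-bin biases
true = np.array([[2.0, 1.0, 1.0],
                 [1.0, 2.0, 1.0],
                 [1.0, 1.0, 2.0]])
bias = np.array([1.0, 2.0, 4.0])
raw = true * np.outer(bias, bias)

weights = ice_weights(raw)
balanced = raw * np.outer(weights, weights)     # the operation balance=True applies
print(balanced.round(3))
print(balanced.sum(axis=1).round(3))
```

In this toy case the balanced matrix recovers the undistorted pattern up to a constant factor, which is the goal of normalization before comparing contact frequencies across bins.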

Key Differences

Feature            .hic                    .cool / .mcool
Library            Juicer Tools (Java)     cooler (Python)
Storage            Custom binary           HDF5
Multi-resolution   Built-in                .mcool container
Normalization      KR, VC, VC_SQRT         ICE (iterative correction)

See Also