MEX / 10x Format

Overview

The MEX (Market Exchange) format is a sparse matrix representation, built on the Matrix Market coordinate format, that 10x Genomics Cell Ranger and STARsolo use to store single-cell gene expression count data. Because scRNA-seq count matrices are extremely sparse (typically more than 90% of entries are zero), MEX stores only the non-zero values, which takes far less space than a dense format.

The 10x MEX output consists of three files in a directory (commonly named filtered_feature_bc_matrix/ or raw_feature_bc_matrix/):

File	Contents
`matrix.mtx.gz`	Sparse count matrix in Matrix Market coordinate format.
`barcodes.tsv.gz`	Cell barcodes (one per line, corresponding to matrix columns).
`features.tsv.gz`	Gene/feature information (corresponding to matrix rows).

This three-file bundle is the primary interchange format between upstream processing (Cell Ranger, STARsolo, Alevin, Kallisto-BUStools) and downstream analysis frameworks (Scanpy, Seurat, Bioconductor/SingleCellExperiment).

Structure

matrix.mtx

The matrix file follows the Matrix Market coordinate format:

%%MatrixMarket matrix coordinate integer general
%
33538 7374 12890567
1 1 3
32 1 1
51 1 5
134 1 2
245 2 1
...

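Line 1 – Header declaring the format: coordinate (sparse triplets), integer values, general (no symmetry assumed).
Line 2 – Comment line starting with %. Comment lines are optional, and a file may contain more than one.
Line 3 – Dimensions, given on the first non-comment line: n_features n_barcodes n_nonzero_entries.
Remaining lines – One triplet per non-zero entry: feature_index barcode_index count.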

Indices are 1-based (the first feature and first barcode are numbered 1).
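The layout described above can be sketched as a small standard-library parser. This is only an illustration of the format, not how any particular loader is implemented: the function name `parse_mtx_text` is hypothetical, and the example reuses the truncated listing shown earlier, so it holds far fewer entries than the header declares. For real files, a library reader such as `scipy.io.mmread` is the more usual route.

```python
from typing import Dict, Tuple


def parse_mtx_text(text: str) -> Tuple[Tuple[int, int, int], Dict[Tuple[int, int], int]]:
    """Parse Matrix Market coordinate text into dimensions and a dict of counts.

    Keys of the returned dict are 0-based (feature_index, barcode_index) pairs.
    """
    lines = text.splitlines()
    if not lines or not lines[0].startswith("%%MatrixMarket"):
        raise ValueError("missing %%MatrixMarket header")

    # Drop the header, blank lines, and any comment lines (which start with '%').
    body = [ln for ln in lines[1:] if ln.strip() and not ln.startswith("%")]

    # First non-comment line: n_features n_barcodes n_nonzero_entries
    n_features, n_barcodes, n_nonzero = (int(x) for x in body[0].split())

    counts: Dict[Tuple[int, int], int] = {}
    for ln in body[1:]:
        feature, barcode, count = (int(x) for x in ln.split())
        # The file uses 1-based indices; convert to 0-based for Python.
        counts[(feature - 1, barcode - 1)] = count
    return (n_features, n_barcodes, n_nonzero), counts


# Truncated example taken from the listing above (only five of the entries).
example = """%%MatrixMarket matrix coordinate integer general
%
33538 7374 12890567
1 1 3
32 1 1
51 1 5
134 1 2
245 2 1
"""

dims, counts = parse_mtx_text(example)
print(dims)            # (33538, 7374, 12890567)
print(counts[(0, 0)])  # 3
```

The subtraction of 1 is the step most often missed when reading these files by hand: the triplet `245 2 1` refers to row 244 and column 1 in 0-based terms.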

barcodes.tsv

One cell barcode per line:

AAACCCAAGAAACACT-1
AAACCCAAGAAACCAT-1
AAACCCAAGAAACTGT-1
AAACCCAAGAAAGCGA-1
...

The -1 suffix is a GEM well identifier appended by Cell Ranger. The number of lines equals the number of columns in the matrix.
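As a small illustration of the barcode layout, the sketch below separates the nucleotide sequence from the numeric GEM well suffix. The helper name `split_barcode` is hypothetical, and the code assumes the suffix, when present, is the part after the last hyphen.

```python
from typing import Optional, Tuple


def split_barcode(barcode: str) -> Tuple[str, Optional[int]]:
    """Split 'SEQUENCE-N' into (sequence, gem_well).

    Returns (barcode, None) when there is no numeric suffix.
    """
    sequence, sep, suffix = barcode.rpartition("-")
    if not sep or not suffix.isdigit():
        return barcode, None
    return sequence, int(suffix)


print(split_barcode("AAACCCAAGAAACACT-1"))  # ('AAACCCAAGAAACACT', 1)
print(split_barcode("AAACCCAAGAAACACT"))    # ('AAACCCAAGAAACACT', None)
```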

features.tsv

Tab-separated file with gene/feature metadata:

ENSG00000243485	MIR1302-2HG	Gene Expression
ENSG00000237613	FAM138A	Gene Expression
ENSG00000186092	OR4F5	Gene Expression
ENSG00000238009	AL627309.1	Gene Expression
...

Columns:

Col	Field	Description
1	Feature ID	Ensembl gene ID or feature identifier.
2	Feature name	Gene symbol or feature name.
3	Feature type	`Gene Expression`, `Antibody Capture` (CITE-seq), `CRISPR Guide Capture` (Perturb-seq), `Multiplexing Capture` (cell hashing), etc.

The number of lines equals the number of rows in the matrix.
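The three-column layout can be read with the standard `csv` module, as in the sketch below. The function name `read_features` is hypothetical, the code assumes every row has at least three tab-separated columns, and the `Antibody Capture` row in the sample is invented purely to show how feature types can be tallied.

```python
import csv
import io
from collections import Counter
from typing import List, TextIO, Tuple


def read_features(handle: TextIO) -> List[Tuple[str, str, str]]:
    """Return a list of (feature_id, feature_name, feature_type) tuples."""
    rows = []
    # QUOTE_NONE: treat quote characters in names as ordinary text.
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row:
            continue  # skip blank lines
        if len(row) < 3:
            raise ValueError(f"expected 3 columns, got {len(row)}: {row!r}")
        rows.append((row[0], row[1], row[2]))
    return rows


# Two rows from the listing above plus one made-up antibody feature.
sample = (
    "ENSG00000243485\tMIR1302-2HG\tGene Expression\n"
    "ENSG00000237613\tFAM138A\tGene Expression\n"
    "HYPOTHETICAL_AB1\tHypotheticalAb1\tAntibody Capture\n"
)

features = read_features(io.StringIO(sample))
type_counts = Counter(feature_type for _, _, feature_type in features)
print(type_counts["Gene Expression"])   # 2
print(type_counts["Antibody Capture"])  # 1
```

Because the row order matches the matrix row order, the position of a tuple in `features` is the 0-based row index of that feature.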

Filtered vs raw matrices

Cell Ranger outputs two versions of the MEX directory:

Directory	Description
`raw_feature_bc_matrix/`	Contains all barcodes detected (including empty droplets). May have hundreds of thousands of columns.
`filtered_feature_bc_matrix/`	Contains only barcodes that Cell Ranger classified as real cells. This is the starting point for most analyses.
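To relate the two directories, the sketch below looks up where each filtered barcode sits among the raw matrix columns. It assumes both barcode lists come from the same run, so that the filtered barcodes are a subset of the raw ones. The function name `filtered_column_indices` is hypothetical, and the choice of which example barcodes count as cells is made up.

```python
from typing import List, Sequence


def filtered_column_indices(
    raw_barcodes: Sequence[str], filtered_barcodes: Sequence[str]
) -> List[int]:
    """Return the 0-based raw-matrix column index of each filtered barcode."""
    position = {barcode: i for i, barcode in enumerate(raw_barcodes)}
    missing = [b for b in filtered_barcodes if b not in position]
    if missing:
        raise ValueError(f"{len(missing)} filtered barcodes are absent from the raw list")
    return [position[b] for b in filtered_barcodes]


raw = [
    "AAACCCAAGAAACACT-1",
    "AAACCCAAGAAACCAT-1",
    "AAACCCAAGAAACTGT-1",
    "AAACCCAAGAAAGCGA-1",
]
# Hypothetical cell calls: pretend only two of the four barcodes are real cells.
filtered = ["AAACCCAAGAAACCAT-1", "AAACCCAAGAAAGCGA-1"]

columns = filtered_column_indices(raw, filtered)
print(columns)  # [1, 3]
```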

Working With

Loading into Scanpy (Python)

python3 -c "
import scanpy as sc

# Load from the three-file MEX directory
adata = sc.read_10x_mtx(
    'filtered_feature_bc_matrix/',
    var_names='gene_symbols',    # use gene symbols as variable names
    cache=True                    # cache for faster re-loading
)
print(adata)
# AnnData object with n_obs x n_vars = 7374 x 33538
# (cells x genes: transposed relative to matrix.mtx, which is features x barcodes)

# Save as h5ad for faster future access
adata.write('counts.h5ad')
"

Loading into Seurat (R)

Rscript -e '
library(Seurat)

# Load from the three-file MEX directory
counts <- Read10X(data.dir = "filtered_feature_bc_matrix/")
seurat_obj <- CreateSeuratObject(counts = counts, project = "my_project")
print(seurat_obj)
'

Loading into Bioconductor (R)

Rscript -e '
library(DropletUtils)

# Load as SingleCellExperiment
sce <- read10xCounts("filtered_feature_bc_matrix/")
print(sce)
'

Inspecting MEX files from the command line

# Note: on macOS, substitute gzcat or gunzip -c if zcat rejects .gz files.

# Count the number of barcodes (cells)
zcat filtered_feature_bc_matrix/barcodes.tsv.gz | wc -l

# Count the number of features (genes)
zcat filtered_feature_bc_matrix/features.tsv.gz | wc -l

# View the matrix header (dimensions and non-zero count)
zcat filtered_feature_bc_matrix/matrix.mtx.gz | head -3

# View the first few features
zcat filtered_feature_bc_matrix/features.tsv.gz | head -5
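The same checks can be scripted in Python, which also makes it easy to confirm that the three files agree with each other. The sketch below uses only the standard library; the function names are hypothetical, and it assumes the gzip-compressed file names listed in the Overview.

```python
import gzip
from pathlib import Path
from typing import Dict, Tuple, Union


def count_lines(path: Path) -> int:
    """Count the lines in a gzip-compressed text file."""
    with gzip.open(path, "rt") as handle:
        return sum(1 for _ in handle)


def read_dimensions(path: Path) -> Tuple[int, int, int]:
    """Return (n_features, n_barcodes, n_nonzero) from matrix.mtx.gz."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            # The header and comments start with '%'; the first other
            # non-blank line holds the dimensions.
            if line.strip() and not line.startswith("%"):
                n_features, n_barcodes, n_nonzero = (int(x) for x in line.split())
                return n_features, n_barcodes, n_nonzero
    raise ValueError("no dimensions line found")


def summarize_mex(directory: Union[str, Path]) -> Dict[str, object]:
    """Summarize a MEX directory and check that line counts match the matrix."""
    directory = Path(directory)
    n_features, n_barcodes, n_nonzero = read_dimensions(directory / "matrix.mtx.gz")
    return {
        "n_features": n_features,
        "n_barcodes": n_barcodes,
        "n_nonzero": n_nonzero,
        # Fraction of matrix entries that are zero (not stored in the file).
        "fraction_zero": 1 - n_nonzero / (n_features * n_barcodes),
        "features_match": count_lines(directory / "features.tsv.gz") == n_features,
        "barcodes_match": count_lines(directory / "barcodes.tsv.gz") == n_barcodes,
    }
```

Calling `summarize_mex("filtered_feature_bc_matrix/")` returns a dictionary of these values. For the example header shown earlier (33538 features, 7374 barcodes, 12890567 non-zero entries), `fraction_zero` works out to about 0.948.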

Loading STARsolo output

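STARsolo produces the same three-file layout. By default it writes the files uncompressed (matrix.mtx, barcodes.tsv, features.tsv), so depending on the Scanpy version they may need to be gzipped before loading: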

python3 -c "
import scanpy as sc

# STARsolo output directory structure
adata = sc.read_10x_mtx(
    'Solo.out/Gene/filtered/',
    var_names='gene_symbols'
)
adata.write('starsolo_counts.h5ad')
"
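If the loader in use only accepts gzip-compressed files, one option is to write compressed copies into a separate directory and point the loader there. The sketch below uses only the standard library. The function name `gzip_mex_copy` and the output directory name are hypothetical, and it assumes the input files are named matrix.mtx, barcodes.tsv and features.tsv.

```python
import gzip
import shutil
from pathlib import Path
from typing import List, Union


def gzip_mex_copy(src_dir: Union[str, Path], dst_dir: Union[str, Path]) -> List[Path]:
    """Write gzip-compressed copies of the three MEX files into dst_dir.

    The original uncompressed files are left untouched.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for name in ("matrix.mtx", "barcodes.tsv", "features.tsv"):
        target = dst / (name + ".gz")
        # Stream the bytes through gzip so large matrices are not held in memory.
        with open(src / name, "rb") as fin, gzip.open(target, "wb") as fout:
            shutil.copyfileobj(fin, fout)
        written.append(target)
    return written
```

For example, `gzip_mex_copy("Solo.out/Gene/filtered/", "filtered_gz/")` would produce `filtered_gz/matrix.mtx.gz` and its two companions, which can then be passed to a loader that expects the compressed names.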