MEX / 10x Format
Overview
The MEX (Market Exchange) format is a sparse matrix representation used by 10x Genomics Cell Ranger and STARsolo to store single-cell gene expression count data. Because scRNA-seq count matrices are extremely sparse – typically more than 90 % of entries are zero – MEX stores only the non-zero values, achieving dramatic space savings over dense formats.
The 10x MEX output consists of three files in a directory (commonly named filtered_feature_bc_matrix/ or raw_feature_bc_matrix/). Cell Ranger 3.0 and later gzip-compress each file, so the names on disk carry a .gz suffix:
| File | Contents |
|---|---|
| matrix.mtx | Sparse count matrix in Matrix Market coordinate format. |
| barcodes.tsv | Cell barcodes (one per line, corresponding to matrix columns). |
| features.tsv | Gene/feature information (corresponding to matrix rows). |
This three-file bundle is the primary interchange format between upstream processing (Cell Ranger, STARsolo, Alevin, Kallisto-BUStools) and downstream analysis frameworks (Scanpy, Seurat, Bioconductor/SingleCellExperiment).
Structure
matrix.mtx
The matrix file follows the Matrix Market coordinate format:
%%MatrixMarket matrix coordinate integer general
%
33538 7374 12890567
1 1 3
32 1 1
51 1 5
134 1 2
245 2 1
...
Line 1 – Header declaring the format (coordinate, integer, general).
Line 2 – Optional comment lines starting with %.
Line 3 – Dimensions: n_features n_barcodes n_nonzero_entries.
Remaining lines – One triplet per non-zero entry: feature_index barcode_index count.
Indices are 1-based (the first feature and first barcode are numbered 1).
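The layout described above can be read with a few lines of standard-library Python. This is a minimal sketch rather than a replacement for a real reader such as scipy.io.mmread: the function name parse_mtx and the toy matrix are made up for illustration, and the code assumes an integer coordinate file like the one shown.

```python
from typing import Dict, Tuple


def parse_mtx(text: str) -> Tuple[Tuple[int, int, int], Dict[Tuple[int, int], int]]:
    """Parse Matrix Market coordinate text.

    Returns the dimensions and a {(feature, barcode): count} dict
    whose keys are converted to 0-based indices.
    """
    lines = text.splitlines()
    if not lines or not lines[0].startswith("%%MatrixMarket"):
        raise ValueError("missing %%MatrixMarket header")

    # Drop the header, comment lines (both start with '%') and blank lines.
    body = [ln for ln in lines[1:] if ln.strip() and not ln.startswith("%")]

    # First remaining line: n_features n_barcodes n_nonzero_entries.
    n_features, n_barcodes, n_nonzero = (int(x) for x in body[0].split())

    counts: Dict[Tuple[int, int], int] = {}
    for ln in body[1:]:
        feature_idx, barcode_idx, count = (int(x) for x in ln.split())
        # The file is 1-based; shift to 0-based for use in Python.
        counts[(feature_idx - 1, barcode_idx - 1)] = count

    if len(counts) != n_nonzero:
        raise ValueError("entry count does not match the dimension line")
    return (n_features, n_barcodes, n_nonzero), counts


# Hypothetical toy matrix: 4 features, 2 barcodes, 3 non-zero entries.
example = """%%MatrixMarket matrix coordinate integer general
%
4 2 3
1 1 3
3 1 1
2 2 5
"""

dims, counts = parse_mtx(example)
print(dims)    # (4, 2, 3)
print(counts)  # {(0, 0): 3, (2, 0): 1, (1, 1): 5}
```

Any (feature, barcode) pair absent from the dict is an implicit zero, which is where the space saving comes from.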
barcodes.tsv
One cell barcode per line:
AAACCCAAGAAACACT-1
AAACCCAAGAAACCAT-1
AAACCCAAGAAACTGT-1
AAACCCAAGAAAGCGA-1
...
The -1 suffix is a GEM well identifier appended by Cell Ranger. The
number of lines equals the number of columns in the matrix.
features.tsv
Tab-separated file with gene/feature metadata:
ENSG00000243485 MIR1302-2HG Gene Expression
ENSG00000237613 FAM138A Gene Expression
ENSG00000186092 OR4F5 Gene Expression
ENSG00000238009 AL627309.1 Gene Expression
...
Columns:
| Col | Field | Description |
|---|---|---|
| 1 | Feature ID | Ensembl gene ID or feature identifier. |
| 2 | Feature name | Gene symbol or feature name. |
| 3 | Feature type | Type of feature, e.g. Gene Expression or Antibody Capture. |
The number of lines equals the number of rows in the matrix.
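Because line order ties each feature to a matrix row, a lookup from gene symbol to row index falls out of the file directly. The sketch below uses only the standard library and the four example rows above; the variable names are illustrative. Gene symbols are not guaranteed to be unique, so a plain dict like this keeps only the last row seen for a repeated symbol.

```python
import csv
import io

# The example rows from above, written with explicit tab separators.
features_tsv = (
    "ENSG00000243485\tMIR1302-2HG\tGene Expression\n"
    "ENSG00000237613\tFAM138A\tGene Expression\n"
    "ENSG00000186092\tOR4F5\tGene Expression\n"
    "ENSG00000238009\tAL627309.1\tGene Expression\n"
)

reader = csv.reader(io.StringIO(features_tsv), delimiter="\t")
features = [
    {"id": row[0], "name": row[1], "type": row[2]}
    for row in reader
]

# Line order defines the matrix row: line i (0-based) is row i,
# i.e. feature_index i + 1 in matrix.mtx.
row_of_symbol = {f["name"]: i for i, f in enumerate(features)}

print(len(features))           # 4
print(row_of_symbol["OR4F5"])  # 2
```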
Filtered vs raw matrices
Cell Ranger outputs two versions of the MEX directory:
| Directory | Description |
|---|---|
| raw_feature_bc_matrix/ | Contains all barcodes detected (including empty droplets). May have hundreds of thousands of columns. |
| filtered_feature_bc_matrix/ | Contains only barcodes that Cell Ranger classified as real cells. This is the starting point for most analyses. |
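The relationship between the two directories can be pictured as a column selection on the raw matrix. The following sketch assumes the filtered barcodes are a subset of the raw ones and uses short hypothetical barcode lists in place of the real barcodes.tsv files.

```python
# Hypothetical barcode lists standing in for the two barcodes.tsv files.
raw_barcodes = [
    "AAACCCAAGAAACACT-1",
    "AAACCCAAGAAACCAT-1",
    "AAACCCAAGAAACTGT-1",
    "AAACCCAAGAAAGCGA-1",
]
filtered_barcodes = [
    "AAACCCAAGAAACACT-1",
    "AAACCCAAGAAACTGT-1",
]

# A set gives constant-time membership tests, which matters when the
# raw matrix has hundreds of thousands of columns.
called_cells = set(filtered_barcodes)

# 0-based raw-matrix columns that were classified as real cells.
kept_columns = [i for i, bc in enumerate(raw_barcodes) if bc in called_cells]

print(kept_columns)  # [0, 2]
```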
Working With
Loading into Scanpy (Python)
python3 -c "
import scanpy as sc
# Load from the three-file MEX directory
adata = sc.read_10x_mtx(
    'filtered_feature_bc_matrix/',
    var_names='gene_symbols',  # use gene symbols as variable names
    cache=True,                # cache for faster re-loading
)
print(adata)
# AnnData object with n_obs x n_vars = 7374 x 33538
# Save as h5ad for faster future access
adata.write('counts.h5ad')
"
Loading into Seurat (R)
Rscript -e '
library(Seurat)
# Load from the three-file MEX directory
counts <- Read10X(data.dir = "filtered_feature_bc_matrix/")
seurat_obj <- CreateSeuratObject(counts = counts, project = "my_project")
print(seurat_obj)
'
Loading into Bioconductor (R)
Rscript -e '
library(DropletUtils)
# Load as SingleCellExperiment
sce <- read10xCounts("filtered_feature_bc_matrix/")
print(sce)
'
Inspecting MEX files from the command line
# Count the number of barcodes (cells)
zcat filtered_feature_bc_matrix/barcodes.tsv.gz | wc -l
# Count the number of features (genes)
zcat filtered_feature_bc_matrix/features.tsv.gz | wc -l
# View the matrix header (dimensions and non-zero count)
zcat filtered_feature_bc_matrix/matrix.mtx.gz | head -3
# View the first few features
zcat filtered_feature_bc_matrix/features.tsv.gz | head -5
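The same checks can be scripted. The sketch below confirms that the dimension line of matrix.mtx.gz agrees with the line counts of the other two files. The helper names (count_lines, read_dims, check_mex_dir) are made up for this example, and it writes a tiny throwaway MEX directory so that it runs without real data.

```python
import gzip
import tempfile
from pathlib import Path


def count_lines(path: Path) -> int:
    """Count lines in a gzip-compressed text file."""
    with gzip.open(path, "rt") as fh:
        return sum(1 for _ in fh)


def read_dims(path: Path) -> tuple:
    """Return (n_features, n_barcodes, n_nonzero) from a gzipped .mtx file."""
    with gzip.open(path, "rt") as fh:
        for line in fh:
            # The header and comments start with '%'; the first other
            # line is the dimension line.
            if not line.startswith("%"):
                n_features, n_barcodes, n_nonzero = map(int, line.split())
                return n_features, n_barcodes, n_nonzero
    raise ValueError("no dimension line found")


def check_mex_dir(mex_dir: Path) -> bool:
    """True if the feature and barcode line counts match the matrix dimensions."""
    n_features, n_barcodes, _ = read_dims(mex_dir / "matrix.mtx.gz")
    return (
        count_lines(mex_dir / "features.tsv.gz") == n_features
        and count_lines(mex_dir / "barcodes.tsv.gz") == n_barcodes
    )


# Build a toy MEX directory (3 features x 2 barcodes) to exercise the check.
toy_files = {
    "matrix.mtx.gz": (
        "%%MatrixMarket matrix coordinate integer general\n%\n"
        "3 2 2\n1 1 3\n3 2 1\n"
    ),
    "features.tsv.gz": (
        "ENSG00000243485\tMIR1302-2HG\tGene Expression\n"
        "ENSG00000237613\tFAM138A\tGene Expression\n"
        "ENSG00000186092\tOR4F5\tGene Expression\n"
    ),
    "barcodes.tsv.gz": "AAACCCAAGAAACACT-1\nAAACCCAAGAAACCAT-1\n",
}

with tempfile.TemporaryDirectory() as tmp_name:
    tmp_dir = Path(tmp_name)
    for name, content in toy_files.items():
        with gzip.open(tmp_dir / name, "wt") as fh:
            fh.write(content)
    is_consistent = check_mex_dir(tmp_dir)

print(is_consistent)  # True
```

On real data, pass the path to filtered_feature_bc_matrix/ instead of the temporary directory.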
Loading STARsolo output
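STARsolo writes the same three files (matrix.mtx, barcodes.tsv, features.tsv), but leaves them uncompressed. Depending on the Scanpy version, sc.read_10x_mtx may look for the .gz names when it finds features.tsv-style output, so gzip the three files first if loading fails with a missing-file error: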
python3 -c "
import scanpy as sc
# STARsolo output directory structure
adata = sc.read_10x_mtx(
    'Solo.out/Gene/filtered/',
    var_names='gene_symbols',
)
adata.write('starsolo_counts.h5ad')
"
Loading multi-modal data (CITE-seq)
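When the features.tsv file contains multiple feature types (e.g. Gene Expression and Antibody Capture), all of them share one matrix and are distinguished only by the third column. Scanpy's sc.read_10x_mtx keeps only Gene Expression rows by default (gex_only=True), so to load every modality, read the directory into a multi-modal container such as MuData: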
python3 -c "
import scanpy as sc
import muon as mu
# Read as a MuData object (multi-modal)
mdata = mu.read_10x_mtx('filtered_feature_bc_matrix/')
print(mdata)
# MuData object with 'rna' and 'prot' modalities
"
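Conceptually, separating modalities means grouping matrix rows by the feature-type column. The standard-library sketch below illustrates that grouping; it is not a description of how muon does it internally, and the antibody row (CD3, CD3_TotalSeqB) is a hypothetical entry added for the example.

```python
from collections import defaultdict

# (feature ID, feature name, feature type) in matrix row order.
# The last row is a hypothetical antibody feature.
features = [
    ("ENSG00000243485", "MIR1302-2HG", "Gene Expression"),
    ("ENSG00000186092", "OR4F5", "Gene Expression"),
    ("CD3", "CD3_TotalSeqB", "Antibody Capture"),
]

# Map each feature type to the 0-based matrix rows that belong to it.
rows_by_type = defaultdict(list)
for row, (_feature_id, _name, feature_type) in enumerate(features):
    rows_by_type[feature_type].append(row)

print(dict(rows_by_type))
# {'Gene Expression': [0, 1], 'Antibody Capture': [2]}
```

Slicing the count matrix with each list of rows yields one sub-matrix per modality, all sharing the same barcode columns.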
See Also
Cell Ranger – the upstream pipeline that produces MEX output
STARsolo – alternative single-cell aligner with MEX output
h5ad / AnnData – the h5ad format that MEX data is typically converted into
Scanpy – Python framework for analysing single-cell data