MEX / 10x Format

Overview

The MEX (Market Exchange) format is a sparse matrix representation, based on the Matrix Market coordinate format, that 10x Genomics Cell Ranger and STARsolo use to store single-cell gene expression count data. Because scRNA-seq count matrices are extremely sparse (typically more than 90 % of entries are zero), MEX stores only the non-zero values, which makes the files far smaller than a dense representation of the same matrix.

The 10x MEX output consists of three files in a directory (commonly named filtered_feature_bc_matrix/ or raw_feature_bc_matrix/):

File             Contents
matrix.mtx.gz    Sparse count matrix in Matrix Market coordinate format.
barcodes.tsv.gz  Cell barcodes (one per line, corresponding to matrix columns).
features.tsv.gz  Gene/feature information (corresponding to matrix rows).

This three-file bundle is the primary interchange format between upstream processing (Cell Ranger, STARsolo, Alevin, Kallisto-BUStools) and downstream analysis frameworks (Scanpy, Seurat, Bioconductor/SingleCellExperiment).

Structure

matrix.mtx

The matrix file follows the Matrix Market coordinate format:

%%MatrixMarket matrix coordinate integer general
%
33538 7374 12890567
1 1 3
32 1 1
51 1 5
134 1 2
245 2 1
...
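
  • Line 1 – Header declaring the format (coordinate, integer, general).

  • Comment lines – Zero or more lines starting with %, directly after the header (a single one, on line 2, in the example above).

  • Dimension line – The first line that is not a comment (line 3 in the example): n_features  n_barcodes  n_nonzero_entries.

  • Remaining lines – One triplet per non-zero entry: feature_index  barcode_index  count.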

Indices are 1-based (the first feature and first barcode are numbered 1).
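
The layout described above can be read with a few lines of code. The following is a minimal sketch, not the parser used by Cell Ranger or Scanpy: the function name read_mtx_triplets and the tiny 4 x 2 example matrix are made up for illustration. It skips the comment lines, reads the dimension line, and shifts the 1-based file indices to 0-based ones.

```python
import io


def read_mtx_triplets(handle):
    """Parse a Matrix Market coordinate stream.

    Returns ((n_features, n_barcodes), entries), where entries maps
    0-based (feature_index, barcode_index) pairs to integer counts.
    """
    header = handle.readline()
    if not header.startswith("%%MatrixMarket matrix coordinate"):
        raise ValueError("not a Matrix Market coordinate file")

    # Skip any comment lines (they start with '%') to reach the dimension line.
    line = handle.readline()
    while line.startswith("%"):
        line = handle.readline()
    n_features, n_barcodes, n_nonzero = (int(x) for x in line.split())

    entries = {}
    for line in handle:
        if not line.strip():
            continue  # tolerate blank lines
        feature_idx, barcode_idx, count = line.split()
        # The file is 1-based; shift to 0-based indices for Python.
        entries[(int(feature_idx) - 1, int(barcode_idx) - 1)] = int(count)

    if len(entries) != n_nonzero:
        raise ValueError("entry count does not match the dimension line")
    return (n_features, n_barcodes), entries


# Hypothetical 4-feature x 2-barcode matrix with three non-zero counts.
example = """%%MatrixMarket matrix coordinate integer general
%
4 2 3
1 1 3
3 1 1
2 2 5
"""
shape, entries = read_mtx_triplets(io.StringIO(example))
print(shape, entries)
```

For real data, scipy.io.mmread reads the same format and returns a SciPy sparse matrix, so a hand-written parser like this is mainly useful for understanding or debugging a file.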

barcodes.tsv

One cell barcode per line:

AAACCCAAGAAACACT-1
AAACCCAAGAAACCAT-1
AAACCCAAGAAACTGT-1
AAACCCAAGAAAGCGA-1
...

The -1 suffix is a GEM well identifier appended by Cell Ranger. The number of lines equals the number of columns in the matrix.

features.tsv

Tab-separated file with gene/feature metadata:

ENSG00000243485  MIR1302-2HG  Gene Expression
ENSG00000237613  FAM138A      Gene Expression
ENSG00000186092  OR4F5        Gene Expression
ENSG00000238009  AL627309.1   Gene Expression
...

Columns:

Col  Field         Description
1    Feature ID    Ensembl gene ID or feature identifier.
2    Feature name  Gene symbol or feature name.
3    Feature type  Gene Expression, Antibody Capture (CITE-seq), CRISPR Guide Capture (Perturb-seq), Multiplexing Capture (cell hashing), etc.

The number of lines equals the number of rows in the matrix.
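
As an illustration of the three-column layout, the sketch below parses feature lines and tallies how many features of each type are present. It assumes every line has all three columns; the helper name read_features and the antibody row in the sample are hypothetical.

```python
import csv
import io
from collections import Counter


def read_features(handle):
    """Return a list of (feature_id, feature_name, feature_type) tuples.

    Assumes the three-column, tab-separated layout described above.
    """
    features = []
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        feature_id, feature_name, feature_type = fields[:3]
        features.append((feature_id, feature_name, feature_type))
    return features


sample = (
    "ENSG00000243485\tMIR1302-2HG\tGene Expression\n"
    "ENSG00000237613\tFAM138A\tGene Expression\n"
    "CD3_TotalSeqB\tCD3\tAntibody Capture\n"  # hypothetical antibody feature
)
features = read_features(io.StringIO(sample))

# Count features per type, e.g. to see whether the data are multi-modal.
type_counts = Counter(feature_type for _, _, feature_type in features)
print(type_counts)
```

Because row i of the matrix corresponds to line i of features.tsv, the position of each tuple in the returned list is also its (0-based) matrix row index.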

Filtered vs raw matrices

Cell Ranger outputs two versions of the MEX directory:

Directory                    Description
raw_feature_bc_matrix/       Contains all barcodes detected (including empty droplets). May have hundreds of thousands of columns.
filtered_feature_bc_matrix/  Contains only barcodes that Cell Ranger classified as real cells. This is the starting point for most analyses.
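
Since the filtered matrix keeps a subset of the raw barcodes, the relationship between the two can be sketched as a column selection. The example below does not reproduce Cell Ranger's cell calling: it takes the list of cell barcodes as given and only shows the subsetting and renumbering step. The function name subset_barcodes and the toy data are hypothetical.

```python
def subset_barcodes(entries, raw_barcodes, keep_barcodes):
    """Restrict a sparse matrix to the given barcodes.

    entries maps 0-based (feature_index, barcode_index) pairs to counts.
    Returns (kept_barcodes, subset_entries) with columns renumbered from 0,
    preserving the order of the raw barcode list.
    """
    keep = set(keep_barcodes)
    kept_barcodes = []
    new_index = {}  # old column index -> new column index
    for old_idx, barcode in enumerate(raw_barcodes):
        if barcode in keep:
            new_index[old_idx] = len(kept_barcodes)
            kept_barcodes.append(barcode)

    subset = {
        (feature_idx, new_index[barcode_idx]): count
        for (feature_idx, barcode_idx), count in entries.items()
        if barcode_idx in new_index
    }
    return kept_barcodes, subset


# Hypothetical raw matrix: three barcodes, the middle one an empty droplet.
raw_barcodes = ["AAACCCAAGAAACACT-1", "AAACCCAAGAAACCAT-1", "AAACCCAAGAAACTGT-1"]
raw_entries = {(0, 0): 3, (1, 0): 2, (0, 1): 1, (1, 2): 7}
cells = ["AAACCCAAGAAACACT-1", "AAACCCAAGAAACTGT-1"]

kept, filtered_entries = subset_barcodes(raw_entries, raw_barcodes, cells)
print(kept, filtered_entries)
```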

Working With

Loading into Scanpy (Python)

python3 -c "
import scanpy as sc

# Load from the three-file MEX directory
adata = sc.read_10x_mtx(
    'filtered_feature_bc_matrix/',
    var_names='gene_symbols',    # use gene symbols as variable names
    cache=True                    # cache for faster re-loading
)
print(adata)
# AnnData object with n_obs x n_vars = 7374 x 33538

# Save as h5ad for faster future access
adata.write('counts.h5ad')
"

Loading into Seurat (R)

Rscript -e '
library(Seurat)

# Load from the three-file MEX directory
counts <- Read10X(data.dir = "filtered_feature_bc_matrix/")
seurat_obj <- CreateSeuratObject(counts = counts, project = "my_project")
print(seurat_obj)
'

Loading into Bioconductor (R)

Rscript -e '
library(DropletUtils)

# Load as SingleCellExperiment
sce <- read10xCounts("filtered_feature_bc_matrix/")
print(sce)
'

Inspecting MEX files from the command line

# Count the number of barcodes (cells)
# (on macOS, substitute gzcat or gzip -dc for zcat)
zcat filtered_feature_bc_matrix/barcodes.tsv.gz | wc -l

# Count the number of features (genes)
zcat filtered_feature_bc_matrix/features.tsv.gz | wc -l

# View the matrix header (dimensions and non-zero count)
zcat filtered_feature_bc_matrix/matrix.mtx.gz | head -3

# View the first few features
zcat filtered_feature_bc_matrix/features.tsv.gz | head -5
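
The same checks can be scripted. The sketch below compares the line counts of barcodes.tsv.gz and features.tsv.gz against the dimension line of matrix.mtx.gz, which is a quick way to catch a truncated or mismatched bundle. The function names are hypothetical, and the demo writes a tiny made-up MEX directory to a temporary folder so that it runs on its own.

```python
import gzip
import os
import tempfile


def count_lines(path):
    """Count the lines of a gzip-compressed text file."""
    with gzip.open(path, "rt") as handle:
        return sum(1 for _ in handle)


def read_dimensions(path):
    """Return (n_features, n_barcodes, n_nonzero) from matrix.mtx.gz."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            # The header and comment lines all start with '%'.
            if not line.startswith("%"):
                return tuple(int(x) for x in line.split())
    raise ValueError("no dimension line found")


def check_mex_dir(mex_dir):
    """Raise ValueError if the three files disagree; return the dimensions."""
    dims = read_dimensions(os.path.join(mex_dir, "matrix.mtx.gz"))
    n_features, n_barcodes, _ = dims
    if count_lines(os.path.join(mex_dir, "features.tsv.gz")) != n_features:
        raise ValueError("features.tsv.gz does not match the matrix rows")
    if count_lines(os.path.join(mex_dir, "barcodes.tsv.gz")) != n_barcodes:
        raise ValueError("barcodes.tsv.gz does not match the matrix columns")
    return dims


# Demo on a tiny, made-up MEX directory (2 features x 3 barcodes).
with tempfile.TemporaryDirectory() as tmp:
    contents = {
        "matrix.mtx.gz": "%%MatrixMarket matrix coordinate integer general\n"
                         "%\n2 3 2\n1 1 4\n2 3 1\n",
        "features.tsv.gz": "G1\tGeneA\tGene Expression\n"
                           "G2\tGeneB\tGene Expression\n",
        "barcodes.tsv.gz": "AAAC-1\nAAAG-1\nAAAT-1\n",
    }
    for name, text in contents.items():
        with gzip.open(os.path.join(tmp, name), "wt") as handle:
            handle.write(text)
    dims = check_mex_dir(tmp)

print(dims)
```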

Loading STARsolo output

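STARsolo writes the same three files, but uncompressed by default (matrix.mtx, barcodes.tsv, features.tsv). sc.read_10x_mtx has traditionally looked for the gzipped file names, so gzip the three files first if the call below cannot find them: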

python3 -c "
import scanpy as sc

# STARsolo output directory structure
adata = sc.read_10x_mtx(
    'Solo.out/Gene/filtered/',
    var_names='gene_symbols'
)
adata.write('starsolo_counts.h5ad')
"

Loading multi-modal data (CITE-seq)

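When the features.tsv file contains multiple feature types (e.g. Gene Expression and Antibody Capture), each type is usually analysed as its own modality. Note that sc.read_10x_mtx keeps only the Gene Expression features by default (gex_only=True), so a multi-modal reader such as muon is the more direct route:

python3 -c "
import muon as mu

# Read as a MuData object (multi-modal); features are split by feature type
mdata = mu.read_10x_mtx('filtered_feature_bc_matrix/')
print(mdata)
# MuData object with 'rna' and 'prot' modalities
"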

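To make the idea of separating feature types concrete, the sketch below splits a sparse matrix by the feature type of each row. It illustrates the principle only and is not how muon or Scanpy implement it; the function name split_by_feature_type and the toy data are hypothetical.

```python
from collections import defaultdict


def split_by_feature_type(entries, feature_types):
    """Split a sparse matrix into one sub-matrix per feature type.

    entries maps 0-based (feature_index, barcode_index) pairs to counts.
    feature_types[i] is the feature type of matrix row i.
    Returns (rows_by_type, split), where rows_by_type lists the original
    row indices of each type and split holds the per-type entries with
    rows renumbered from 0 within each type.
    """
    rows_by_type = defaultdict(list)
    row_map = {}  # original row -> (feature type, row within that type)
    for row, feature_type in enumerate(feature_types):
        row_map[row] = (feature_type, len(rows_by_type[feature_type]))
        rows_by_type[feature_type].append(row)

    split = {feature_type: {} for feature_type in rows_by_type}
    for (row, col), count in entries.items():
        feature_type, new_row = row_map[row]
        split[feature_type][(new_row, col)] = count
    return dict(rows_by_type), split


# Hypothetical data: two genes and one antibody, measured in two cells.
feature_types = ["Gene Expression", "Gene Expression", "Antibody Capture"]
entries = {(0, 0): 3, (1, 1): 2, (2, 0): 10, (2, 1): 8}

rows_by_type, split = split_by_feature_type(entries, feature_types)
print(rows_by_type)
print(split)
```

The barcode (column) indices are left untouched, so the resulting sub-matrices stay aligned cell by cell, which is what allows the modalities to be analysed jointly later.
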
See Also

  • Cell Ranger – the upstream pipeline that produces MEX output

  • STARsolo – alternative single-cell aligner with MEX output

  • h5ad / AnnData – the h5ad format that MEX data is typically converted into

  • Scanpy – Python framework for analysing single-cell data