STARsolo
Overview
STARsolo is the single-cell module of the STAR aligner and provides a drop-in, open-source replacement for Cell Ranger’s gene expression quantification. It maps reads to a reference genome and processes cell barcodes and UMIs in a single pass, producing cell-by-gene count matrices compatible with downstream tools such as Seurat and Scanpy. When run with matching parameters, STARsolo closely reproduces Cell Ranger results while offering substantially faster runtimes and greater flexibility for non-10x protocols.
Installation
mamba install -c bioconda star
Basic Usage
Align reads from a 10x Chromium 3’ v3 library and produce gene expression count matrices. Note that Read 2 (cDNA) is provided first, followed by Read 1 (barcode + UMI).
STAR --runMode alignReads \
--genomeDir star_index/ \
--readFilesIn sample_R2.fastq.gz sample_R1.fastq.gz \
--readFilesCommand zcat \
--soloType CB_UMI_Simple \
--soloCBstart 1 --soloCBlen 16 \
--soloUMIstart 17 --soloUMIlen 12 \
--soloCBwhitelist 3M-february-2018.txt \
--soloCellFilter EmptyDrops_CR \
--soloFeatures Gene GeneFull \
--outSAMtype BAM SortedByCoordinate \
--outSAMattributes NH HI nM AS CR UR CB UB \
--runThreadN 16
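The barcode length given to --soloCBlen should match the entries in the whitelist file. As a rough pre-flight check, the sketch below counts whitelist lines whose length differs from the expected value. The helper name `whitelist_length_mismatches` is hypothetical and not part of STAR; the file name and the length of 16 are taken from the command above.

```shell
# Hypothetical helper (not part of STAR): count whitelist lines whose length
# differs from the expected barcode length, i.e. the value of --soloCBlen.
whitelist_length_mismatches() {
    awk -v n="$2" 'length($0) != n { bad++ } END { print bad + 0 }' "$1"
}

# Example call, guarded so the snippet does nothing when the file is absent.
if [ -f 3M-february-2018.txt ]; then
    whitelist_length_mismatches 3M-february-2018.txt 16
fi
```

A result of 0 means every line has the expected length; any other value suggests a mismatched whitelist or stray characters such as Windows line endings.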
Before running, build the STAR genome index if it does not yet exist:
STAR --runMode genomeGenerate \
--genomeDir star_index/ \
--genomeFastaFiles GRCh38.fa \
--sjdbGTFfile genes.gtf \
--runThreadN 16
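One way to decide whether the index still needs to be built is to look for the files that genomeGenerate writes into the index directory. The sketch below assumes that the presence of `Genome`, `SA`, and `SAindex` is a sufficient indicator; the helper name `star_index_exists` is hypothetical, and the check does not verify that the index matches your STAR version or annotation.

```shell
# Hypothetical helper (not part of STAR): succeed if the directory contains
# the core files of a STAR genome index.
star_index_exists() {
    [ -f "$1/Genome" ] && [ -f "$1/SA" ] && [ -f "$1/SAindex" ]
}

# Report whether the index directory used in this document looks complete.
if star_index_exists star_index; then
    echo "index found, skipping genomeGenerate"
else
    echo "index missing, run STAR --runMode genomeGenerate first"
fi
```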
Key Parameters
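| Flag / option | Description |
|---|---|
| `--genomeDir` | Path to the STAR genome index directory. |
| `--readFilesIn` | Input FASTQ files. For 10x data, provide the cDNA read first, then the barcode/UMI read. |
| `--readFilesCommand` | Command for decompressing input files (e.g., `zcat`). |
| `--soloType` | Barcode/UMI layout type (e.g., `CB_UMI_Simple`). |
| `--soloCBstart`, `--soloCBlen` | Start position (1-based) and length of the cell barcode within Read 1. |
| `--soloUMIstart`, `--soloUMIlen` | Start position (1-based) and length of the UMI within Read 1. |
| `--soloCBwhitelist` | Path to the barcode whitelist (e.g., `3M-february-2018.txt`). |
| `--soloCellFilter` | Cell-calling algorithm (e.g., `EmptyDrops_CR`). |
| `--soloFeatures` | Which features to quantify (e.g., `Gene`, `GeneFull`). |
| `--outSAMtype` | Output format (e.g., `BAM SortedByCoordinate`). |
| `--runThreadN` | Number of threads to use for alignment. |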
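To make the barcode and UMI coordinates concrete, the sketch below splits a Read 1 sequence using 1-based start positions and lengths, in the same way the 10x v3 settings (barcode at 1 to 16, UMI at 17 to 28) are described above. The helper `split_cb_umi` and the 28-base sequence are made up for illustration; this is not how STAR is implemented internally.

```shell
# Hypothetical helper (not part of STAR): split a Read 1 sequence into cell
# barcode and UMI from 1-based start positions and lengths.
# Arguments: sequence, CB start, CB length, UMI start, UMI length.
split_cb_umi() {
    seq=$1; cb_start=$2; cb_len=$3; umi_start=$4; umi_len=$5
    cb=$(printf '%s\n' "$seq" | cut -c "${cb_start}-$((cb_start + cb_len - 1))")
    umi=$(printf '%s\n' "$seq" | cut -c "${umi_start}-$((umi_start + umi_len - 1))")
    # Print barcode and UMI separated by a tab.
    printf '%s\t%s\n' "$cb" "$umi"
}

# Made-up read: 16 bases of barcode followed by 12 bases of UMI.
split_cb_umi ACGTACGTACGTACGTGGGGCCCCAAAA 1 16 17 12
```

With these arguments the call prints the barcode `ACGTACGTACGTACGT` and the UMI `GGGGCCCCAAAA`, separated by a tab.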
Expected Output
STARsolo writes count matrices into the default Solo.out/ directory; the BAM file and logs are written alongside it:
Solo.out/Gene/filtered/ – filtered count matrix in MEX format (matrix.mtx, barcodes.tsv, features.tsv), loadable by Seurat or Scanpy (some loader versions expect the files to be gzip-compressed first).
Solo.out/Gene/raw/ – unfiltered matrix containing all detected barcodes.
Solo.out/GeneFull/filtered/ – count matrix including intronic reads (useful for single-nucleus RNA-seq).
Aligned.sortedByCoord.out.bam – coordinate-sorted BAM file; barcode and UMI tags (CR, UR, CB, UB) are present only when requested via --outSAMattributes.
Log.final.out – alignment summary statistics (mapping rate, unique vs. multi-mapped reads).
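A quick way to sanity-check a count matrix is to read its dimensions from the Matrix Market header: after the `%` comment lines, the first line holds the number of rows, columns, and non-zero entries. The sketch below assumes the usual MEX orientation of features as rows and barcodes as columns; the helper name `mex_dimensions` is hypothetical.

```shell
# Hypothetical helper: print the dimensions of a Matrix Market file.
# Lines starting with '%' are header/comments; the first remaining line
# is "rows columns entries".
mex_dimensions() {
    awk '!/^%/ { print "features=" $1, "barcodes=" $2, "entries=" $3; exit }' "$1"
}

# Example call, guarded so the snippet does nothing when no output exists yet.
mtx=Solo.out/Gene/filtered/matrix.mtx
if [ -f "$mtx" ]; then
    mex_dimensions "$mtx"
fi
```

The barcode count should equal the number of lines in barcodes.tsv, and the feature count the number of lines in features.tsv.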
See Also
Cell Ranger – the official 10x Genomics pipeline with integrated reporting
Seurat – R toolkit for downstream single-cell analysis
Scanpy – Python framework for single-cell analysis and visualisation
Alignment – general-purpose alignment tools