STARsolo

Overview

STARsolo is the single-cell module built into the STAR aligner. It is designed as a drop-in, open-source alternative to Cell Ranger’s gene expression quantification: it maps reads to a reference genome and demultiplexes cell barcodes and UMIs in a single pass, producing cell-by-gene count matrices compatible with downstream tools such as Seurat and Scanpy. When run with matching parameters and the same reference annotation, STARsolo closely reproduces Cell Ranger counts, typically with substantially shorter runtimes, and its configurable barcode geometry makes it usable for non-10x protocols.

Installation

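STARsolo ships as part of STAR, which is available from Bioconda (the conda-forge channel is listed first so that dependencies resolve correctly):

mamba install -c conda-forge -c bioconda star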

Basic Usage

Align reads from a 10x Chromium 3’ v3 library and produce gene expression count matrices. Note that Read 2 (cDNA) is provided first, followed by Read 1 (barcode + UMI).

STAR --runMode alignReads \
  --genomeDir star_index/ \
  --readFilesIn sample_R2.fastq.gz sample_R1.fastq.gz \
  --readFilesCommand zcat \
  --soloType CB_UMI_Simple \
  --soloCBstart 1 --soloCBlen 16 \
  --soloUMIstart 17 --soloUMIlen 12 \
  --soloCBwhitelist 3M-february-2018.txt \
  --soloCellFilter EmptyDrops_CR \
  --soloFeatures Gene GeneFull \
  --outSAMtype BAM SortedByCoordinate \
  --outSAMattributes NH HI nM AS CR UR CB UB \
  --runThreadN 16
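
Because the command above expects the barcode read to carry a 16-base cell barcode followed by a 12-base UMI, a quick length check on the FASTQ files can catch swapped or mislabelled reads before a long alignment starts. The helper below is a minimal sketch rather than part of STAR; the function name fastq_read_lengths is hypothetical, and it only samples the first records of a gzip-compressed file.

```shell
# Print the distinct sequence lengths found in the first N records of a
# gzip-compressed FASTQ file, as a comma-separated list.
# Usage: fastq_read_lengths FILE.fastq.gz [N_RECORDS]
# (hypothetical helper, not a STAR command)
fastq_read_lengths() {
  local fastq="$1" n="${2:-10000}"
  gzip -dc "$fastq" 2>/dev/null \
    | head -n "$((n * 4))" \
    | awk 'NR % 4 == 2 { print length($0) }' \
    | sort -n | uniq | paste -sd, -
}

# Example call with the file names used above:
#   fastq_read_lengths sample_R1.fastq.gz
# A 10x 3' v3 barcode read sequenced to exactly 16 + 12 bases reports 28.
```

If the barcode read was sequenced longer than 28 bases, STAR's length check on the barcode read can be relaxed with --soloBarcodeReadLength 0.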

Before running, build the STAR genome index if it does not yet exist:

STAR --runMode genomeGenerate \
  --genomeDir star_index/ \
  --genomeFastaFiles GRCh38.fa \
  --sjdbGTFfile genes.gtf \
  --runThreadN 16
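
The "if it does not yet exist" condition can be scripted so that the index is built only once. The sketch below assumes that the presence of four non-empty files (Genome, SA, SAindex, genomeParameters.txt) is a good enough sign of a finished index; that is a heuristic chosen for this example, and both function names are hypothetical.

```shell
# Return 0 if DIR looks like a finished STAR index, 1 otherwise.
# Heuristic (an assumption of this sketch): four core index files are non-empty.
star_index_exists() {
  local dir="$1" f
  for f in Genome SA SAindex genomeParameters.txt; do
    [ -s "$dir/$f" ] || return 1
  done
  return 0
}

# Build the index only when it is missing.
# Usage: build_index_if_missing INDEX_DIR GENOME_FASTA GTF [THREADS]
build_index_if_missing() {
  local dir="$1" fasta="$2" gtf="$3" threads="${4:-16}"
  if star_index_exists "$dir"; then
    echo "index present: $dir"
  else
    mkdir -p "$dir"
    STAR --runMode genomeGenerate \
      --genomeDir "$dir" \
      --genomeFastaFiles "$fasta" \
      --sjdbGTFfile "$gtf" \
      --runThreadN "$threads"
  fi
}

# Example call with the file names used above:
#   build_index_if_missing star_index/ GRCh38.fa genes.gtf 16
```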

Key Parameters

| Flag / option | Description |
|---|---|
| --genomeDir | Path to the STAR genome index directory. |
| --readFilesIn | Input FASTQ files. For 10x data, provide the cDNA read (Read 2) first, then the barcode/UMI read (Read 1). |
| --readFilesCommand | Command for decompressing input files (zcat for .gz). |
| --soloType | Barcode/UMI layout type (CB_UMI_Simple for standard 10x protocols). |
| --soloCBstart / --soloCBlen | Start position (1-based) and length of the cell barcode within Read 1. |
| --soloUMIstart / --soloUMIlen | Start position (1-based) and length of the UMI within Read 1. |
| --soloCBwhitelist | Path to the barcode whitelist (e.g., 3M-february-2018.txt for Chromium v3). |
| --soloCellFilter | Cell-calling algorithm (EmptyDrops_CR emulates Cell Ranger's EmptyDrops-based cell calling). |
| --soloFeatures | Which features to quantify (Gene for exonic, GeneFull for exonic + intronic). |
| --outSAMtype | Output format (BAM SortedByCoordinate produces a coordinate-sorted BAM file). |
| --runThreadN | Number of threads to use for alignment. |
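
The barcode and UMI positions follow simple arithmetic: with 1-based coordinates, the UMI starts at the cell barcode start plus the cell barcode length (1 + 16 = 17 for Chromium v3). The sketch below prints the geometry flags for two 10x 3' chemistries. The function name solo_barcode_args is hypothetical, and the v2 values (10-base UMI, 737K-august-2016.txt whitelist) are not taken from this page but are the standard ones for that chemistry.

```shell
# Print STARsolo barcode-geometry flags for a 10x 3' chemistry.
# UMI start = CB start + CB length (positions are 1-based).
# Usage: solo_barcode_args v2|v3   (hypothetical helper, not a STAR command)
solo_barcode_args() {
  local chemistry="$1" cb_start=1 cb_len=16 umi_len whitelist umi_start
  case "$chemistry" in
    v2) umi_len=10; whitelist="737K-august-2016.txt" ;;  # assumption: standard v2 values
    v3) umi_len=12; whitelist="3M-february-2018.txt" ;;  # values used on this page
    *)  echo "unknown chemistry: $chemistry" >&2; return 1 ;;
  esac
  umi_start=$((cb_start + cb_len))
  echo "--soloType CB_UMI_Simple --soloCBstart $cb_start --soloCBlen $cb_len --soloUMIstart $umi_start --soloUMIlen $umi_len --soloCBwhitelist $whitelist"
}

# Example: splice the flags into a STAR call (left unquoted on purpose so
# the string splits into separate arguments):
#   STAR --genomeDir star_index/ $(solo_barcode_args v3) ...
```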

Expected Output

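STARsolo writes the count matrices into the Solo.out/ directory; the BAM and log files are written next to it. All paths are prefixed by --outFileNamePrefix if that option is set.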

  • Solo.out/Gene/filtered/ – filtered count matrix in MEX format (matrix.mtx, barcodes.tsv, features.tsv), loadable by Seurat or Scanpy. STARsolo writes these files uncompressed, so loaders that expect .gz files may require gzipping them first.

  • Solo.out/Gene/raw/ – unfiltered matrix containing all detected barcodes.

  • Solo.out/GeneFull/filtered/ – count matrix including intronic reads (useful for single-nucleus RNA-seq).

  • Aligned.sortedByCoord.out.bam – coordinate-sorted BAM file. Cell barcode and UMI tags (CR, UR, CB, UB) are present only when requested with --outSAMattributes.

  • Log.final.out – alignment summary statistics (mapping rate, unique vs. multi-mapped reads).
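
A quick consistency check on a MEX directory can confirm that a run finished cleanly before the matrix is loaded downstream. The sketch below reads the size line of matrix.mtx (rows are features, columns are barcodes) and compares it with the line counts of the companion files. Both function names are hypothetical, and the files are assumed to be uncompressed.

```shell
# Print "features barcodes nonzero_entries" from the size line of a
# MatrixMarket file (the first line that is not a % comment).
mex_dimensions() {
  awk '!/^%/ { print $1, $2, $3; exit }' "$1"
}

# Check that the matrix dimensions agree with features.tsv and barcodes.tsv.
# Usage: check_mex_dir Solo.out/Gene/filtered   (hypothetical helper)
check_mex_dir() {
  local dir="$1" n_feat n_bc n_nz
  set -- $(mex_dimensions "$dir/matrix.mtx")
  n_feat="$1"; n_bc="$2"; n_nz="$3"
  if [ "$n_feat" -ne "$(awk 'END { print NR }' "$dir/features.tsv")" ]; then
    echo "feature count mismatch in $dir" >&2; return 1
  fi
  if [ "$n_bc" -ne "$(awk 'END { print NR }' "$dir/barcodes.tsv")" ]; then
    echo "barcode count mismatch in $dir" >&2; return 1
  fi
  echo "features=$n_feat barcodes=$n_bc nonzero=$n_nz"
}
```

For the filtered matrix, the reported barcode count is the number of called cells.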

See Also

  • Cell Ranger – the official 10x Genomics pipeline with integrated reporting

  • Seurat – R toolkit for downstream single-cell analysis

  • Scanpy – Python framework for single-cell analysis and visualisation

  • Alignment – general-purpose alignment tools