Seurat

Overview

Seurat is a comprehensive R toolkit for single-cell RNA-seq analysis, developed by the Satija Lab at the New York Genome Center. It provides functions for quality control filtering, normalisation, feature selection, dimensionality reduction (PCA, UMAP, t-SNE), graph-based clustering, differential expression testing, and multi-modal data integration. Seurat is one of the most widely adopted frameworks in the single-cell community and supports a broad range of assay types, including scRNA-seq, CITE-seq, spatial transcriptomics, and scATAC-seq (the latter through the companion Signac package).

Installation

Install Seurat from CRAN within an R session:

install.packages("Seurat")

For the latest development version (requires the remotes package):

remotes::install_github("satijalab/seurat", ref = "develop")
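
Object internals differ between major Seurat releases, so it can help to confirm which version is installed before running an analysis. The following is a minimal sketch using only base R; the helper name has_min_version is hypothetical and not part of Seurat.

```r
# Hypothetical helper (not part of Seurat): TRUE if `pkg` is installed
# at version `min_version` or newer, FALSE otherwise.
has_min_version <- function(pkg, min_version) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(FALSE)
  }
  utils::packageVersion(pkg) >= min_version
}

# Report whether a v5-or-later Seurat is available in the current library.
if (has_min_version("Seurat", "5.0.0")) {
  message("Seurat >= 5.0.0 found")
} else {
  message("Seurat >= 5.0.0 not found; install or update with install.packages(\"Seurat\")")
}
```

The check returns FALSE rather than raising an error when the package is missing, which makes it usable at the top of a script.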

Basic Usage

Load a 10x Cell Ranger output matrix, perform quality control filtering, and run the standard clustering workflow.

library(Seurat)

# Load 10x data
data <- Read10X(data.dir = "filtered_feature_bc_matrix/")
obj <- CreateSeuratObject(counts = data, min.cells = 3, min.features = 200)

# QC and filtering ("^MT-" matches human mitochondrial gene symbols; mouse symbols use "^mt-")
obj[["percent.mt"]] <- PercentageFeatureSet(obj, pattern = "^MT-")
obj <- subset(obj, subset = nFeature_RNA > 200 & nFeature_RNA < 5000 & percent.mt < 20)

# Standard workflow (the native pipe |> requires R >= 4.1)
obj <- NormalizeData(obj) |>
  FindVariableFeatures() |>
  ScaleData() |>
  RunPCA() |>
  FindNeighbors(dims = 1:20) |>
  FindClusters(resolution = 0.5) |>
  RunUMAP(dims = 1:20)

DimPlot(obj, reduction = "umap", label = TRUE)
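
To make the QC step above concrete, the sketch below reproduces the idea behind the mitochondrial percentage and the per-cell gene count in base R on a toy matrix. The matrix values and the 60 % cut-off are made up for illustration, and this is not Seurat's own implementation.

```r
# Toy count matrix: genes in rows, cells in columns (values are made up).
counts <- matrix(
  c(10, 0,  5,
     0, 8,  5,
     5, 2, 10,
     5, 0,  0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("MT-CO1", "MT-ND1", "ACTB", "GAPDH"),
                  c("cell1", "cell2", "cell3"))
)

# Percentage of each cell's counts that come from genes matching the pattern.
mt_genes <- grepl("^MT-", rownames(counts))
percent_mt <- 100 * colSums(counts[mt_genes, , drop = FALSE]) / colSums(counts)

# Number of genes detected (count > 0) per cell, analogous to nFeature_RNA.
n_feature <- colSums(counts > 0)

# Keep cells below an illustrative mitochondrial threshold.
keep <- percent_mt < 60

print(percent_mt)
print(n_feature)
print(names(keep)[keep])
```

On real data the thresholds are usually chosen after inspecting the distributions of these metrics rather than fixed in advance.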

Key Parameters

| Function / parameter | Description |
|---|---|
| min.cells | Minimum number of cells a gene must be detected in to be retained (passed to CreateSeuratObject). |
| min.features | Minimum number of genes a cell must express to be retained (passed to CreateSeuratObject). |
| PercentageFeatureSet | Calculates the percentage of counts from a gene set (e.g., mitochondrial genes matching ^MT-). |
| NormalizeData | Log-normalises counts per cell (default: LogNormalize with a scale factor of 10,000). |
| FindVariableFeatures | Selects highly variable genes (default: top 2,000 using the vst, variance-stabilising transformation, method). |
| ScaleData | Centres and scales expression values; optionally regresses out confounding variables. |
| RunPCA | Performs principal component analysis on the scaled variable features. |
| FindNeighbors(dims = 1:20) | Constructs a shared nearest-neighbour (SNN) graph using the specified PCA dimensions. |
| FindClusters(resolution) | Identifies cell clusters by modularity optimisation on the SNN graph (Louvain by default; Leiden with algorithm = 4). Higher resolution produces more clusters. |
| RunUMAP(dims = 1:20) | Computes a UMAP embedding for visualisation using the specified PCA dimensions. |
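
The normalisation and scaling steps described above can be sketched in base R. This is an illustrative approximation on a made-up matrix, assuming the LogNormalize formula log(1 + count / total * scale factor) and plain per-gene z-scoring; Seurat's ScaleData has additional options that are not reproduced here.

```r
# Toy count matrix: genes in rows, cells in columns (values are made up).
counts <- matrix(
  c(1, 3, 0,
    4, 0, 2,
    5, 7, 8),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"),
                  c("cell1", "cell2", "cell3"))
)

scale_factor <- 10000

# LogNormalize-style step: divide each cell (column) by its total counts,
# multiply by the scale factor, then take log(1 + x).
norm <- log1p(sweep(counts, 2, colSums(counts), "/") * scale_factor)

# ScaleData-style step: centre and scale each gene (row) across cells.
scaled <- t(scale(t(norm)))

print(round(norm, 3))
print(round(scaled, 3))
```

Zero counts stay at zero after the log step, while each gene ends up with mean 0 and unit standard deviation across cells after scaling.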

Expected Output

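The standard workflow produces an in-memory Seurat object. The slot paths below follow the Seurat v4 object layout; in Seurat v5 the assay matrices are stored as layers and are retrieved with LayerData(obj, assay = "RNA", layer = "counts") (likewise for "data" and "scale.data"). Key components: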

  • obj[["RNA"]]@counts – original count matrix.

  • obj[["RNA"]]@data – normalised expression values.

  • obj[["RNA"]]@scale.data – scaled expression matrix.

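  • obj@reductions$pca – PCA results (a DimReduc object; cell coordinates via Embeddings(obj, "pca")).

  • obj@reductions$umap – UMAP results (cell coordinates via Embeddings(obj, "umap")).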

  • obj@meta.data$seurat_clusters – cluster assignments for every cell.
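
Accessor functions are generally less brittle than direct slot access. The sketch below uses pbmc_small, a small example object bundled with SeuratObject that includes a pre-computed PCA, as a stand-in for obj. It assumes Seurat is installed; exact dimensions and metadata columns depend on the installed version.

```r
library(Seurat)

# pbmc_small is a small example Seurat object bundled with SeuratObject.
data("pbmc_small")

# Cell embeddings for a stored reduction (cells in rows, components in columns).
pca_coords <- Embeddings(pbmc_small, reduction = "pca")

# Current cluster identities, one per cell.
clusters <- Idents(pbmc_small)

# Cell-level metadata as a data frame.
meta <- pbmc_small[[]]

print(dim(pca_coords))
print(table(clusters))
print(head(colnames(meta)))
```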

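Common outputs:

  • DimPlot() returns a ggplot object showing a UMAP scatter plot coloured by cluster identity; nothing is written to disk unless the plot is saved explicitly (e.g., with ggplot2::ggsave()).

  • saveRDS(obj, "seurat_object.rds") writes the entire object to disk so the analysis can be reloaded later with readRDS().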
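
A minimal save-and-reload round trip is sketched below in base R. A plain list stands in for the Seurat object so the example runs without any packages, and the temporary file path is illustrative only.

```r
# Stand-in for an analysis result; saveRDS()/readRDS() serialise a Seurat
# object in the same way as any other R object.
result <- list(clusters = factor(c("0", "1", "0")), n_cells = 3L)

# Temporary file used only for this sketch.
path <- tempfile(fileext = ".rds")
saveRDS(result, path)

# Reload in the same or a later session and check the round trip.
restored <- readRDS(path)
print(identical(result, restored))
```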

See Also

  • Scanpy – Python counterpart for single-cell analysis with a similar workflow; results are broadly comparable but not identical, since defaults and implementations differ

  • Cell Ranger – upstream pipeline that generates the count matrices loaded by Seurat

  • STARsolo – open-source alternative for generating count matrices