Seurat

Overview

Seurat is a comprehensive R toolkit for single-cell RNA-seq analysis, developed by the Satija Lab at the New York Genome Center. It provides functions for quality control filtering, normalisation, feature selection, dimensionality reduction (PCA, UMAP, t-SNE), graph-based clustering, differential expression testing, and multi-modal data integration. Seurat is one of the most widely adopted frameworks in the single-cell community and supports a broad range of assay types including scRNA-seq, CITE-seq, spatial transcriptomics, and scATAC-seq.

Installation

Install Seurat from CRAN within an R session:

install.packages("Seurat")

For the latest development version (this requires the remotes package to be installed):

remotes::install_github("satijalab/seurat", ref = "develop")
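After installing, it can help to confirm which version is present, since Seurat v5 changed how assay data are stored. The following is a minimal check; the variable name seurat_installed is only illustrative.

```r
# Check whether Seurat is installed and, if so, report its version.
seurat_installed <- requireNamespace("Seurat", quietly = TRUE)

if (seurat_installed) {
  message("Seurat version: ", as.character(utils::packageVersion("Seurat")))
} else {
  message("Seurat is not installed; run install.packages(\"Seurat\") first.")
}
```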

Basic Usage

Load a 10x Cell Ranger output matrix, perform quality control filtering, and run the standard clustering workflow.

library(Seurat)

# Load 10x data (named counts to avoid shadowing base R's data() function)
counts <- Read10X(data.dir = "filtered_feature_bc_matrix/")
obj <- CreateSeuratObject(counts = counts, min.cells = 3, min.features = 200)

# QC and filtering ("^MT-" matches human mitochondrial genes; mouse data typically uses "^mt-")
obj[["percent.mt"]] <- PercentageFeatureSet(obj, pattern = "^MT-")
obj <- subset(obj, nFeature_RNA > 200 & nFeature_RNA < 5000 & percent.mt < 20)

# Standard workflow (the native pipe |> requires R 4.1 or later)
obj <- NormalizeData(obj) |>
  FindVariableFeatures() |>
  ScaleData() |>
  RunPCA() |>
  FindNeighbors(dims = 1:20) |>
  FindClusters(resolution = 0.5) |>
  RunUMAP(dims = 1:20)

DimPlot(obj, reduction = "umap", label = TRUE)
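To make the QC step above concrete, the sketch below reproduces the arithmetic behind nFeature_RNA and percent.mt on a toy matrix in base R. It is an illustration under stated assumptions, not Seurat's internal code: the gene names, cell names, counts, and the deliberately small thresholds are all made up for the example.

```r
# Toy count matrix: rows are genes, columns are cells (all values hypothetical).
counts <- matrix(
  c(5, 0, 3, 2,
    0, 0, 1, 0,
    4, 1, 0, 6,
    1, 9, 1, 2),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("ACTB", "GAPDH", "CD3E", "MT-CO1"),
                  c("cell1", "cell2", "cell3", "cell4"))
)

# Number of genes detected (count > 0) in each cell, analogous to nFeature_RNA.
n_feature <- colSums(counts > 0)

# Percentage of each cell's counts that come from genes matching "^MT-".
mt_rows <- grepl("^MT-", rownames(counts))
percent_mt <- 100 * colSums(counts[mt_rows, , drop = FALSE]) / colSums(counts)

# Keep cells with enough detected genes and a low mitochondrial fraction.
# Thresholds are scaled down to suit the toy data.
keep <- n_feature > 2 & percent_mt < 20
filtered <- counts[, keep, drop = FALSE]

print(data.frame(n_feature, percent_mt, keep))
```

Real thresholds depend on tissue, chemistry, and sequencing depth, so the values in the workflow above are best treated as starting points.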

Key Parameters

Function / parameter	Description
`min.cells`	Minimum number of cells a gene must be detected in to be retained (passed to `CreateSeuratObject`).
`min.features`	Minimum number of detected genes a cell must have to be retained (also passed to `CreateSeuratObject`).
`PercentageFeatureSet`	Calculates the percentage of counts from a gene set (e.g., mitochondrial genes matching `^MT-`).
`NormalizeData`	Log-normalises counts per cell (default: `LogNormalize` with a scale factor of 10,000).
`FindVariableFeatures`	Selects highly variable genes (default: the `vst` selection method, keeping the top 2,000 genes).
`ScaleData`	Centres and scales expression values; optionally regresses out confounding variables.
`RunPCA`	Performs principal component analysis on the scaled variable features.
`FindNeighbors(dims = 1:20)`	Constructs a shared nearest-neighbour graph using the specified PCA dimensions.
`FindClusters(resolution)`	Identifies cell clusters on the neighbour graph, using the Louvain algorithm by default; Leiden and other methods can be selected with the `algorithm` argument. Higher resolution produces more clusters.
`RunUMAP(dims = 1:20)`	Computes a UMAP embedding for visualisation using the specified PCA dimensions.
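As a worked example of the `LogNormalize` description in the table, the sketch below applies the same formula in base R: each count is divided by the cell's total, multiplied by the scale factor, and transformed with log1p. The matrix and its gene and cell names are hypothetical, and this is an illustration of the arithmetic rather than Seurat's implementation.

```r
# Toy count matrix filled column by column: cell1 = (1, 3, 0), cell2 = (4, 2, 2).
counts <- matrix(
  c(1, 3, 0, 4, 2, 2),
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"), c("cell1", "cell2"))
)

scale_factor <- 1e4  # matches the documented default of 10,000

# Divide each column by its total, rescale, then take the natural log of (1 + x).
per_cell_fraction <- sweep(counts, 2, colSums(counts), "/")
log_norm <- log1p(per_cell_fraction * scale_factor)

print(round(log_norm, 3))
```

Zero counts stay at zero after the transformation, which preserves the sparsity of the matrix.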

Expected Output

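The standard workflow produces a Seurat object held in memory. Its key components are listed below. The assay paths use the slot syntax of Seurat v4 and earlier; Seurat v5 stores the same matrices as layers of the RNA assay (for example, obj[["RNA"]]$counts):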

obj[["RNA"]]@counts – original count matrix.
obj[["RNA"]]@data – normalised expression values.
obj[["RNA"]]@scale.data – scaled expression matrix.
obj@reductions$pca – PCA coordinates.
obj@reductions$umap – UMAP coordinates.
obj@meta.data$seurat_clusters – cluster assignments for every cell.
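Because direct slot access depends on the installed version, accessor functions are often a safer way to reach these results. The helpers below are a sketch: get_expression and collect_results are hypothetical names, and the version check assumes that the layer argument of GetAssayData is available from SeuratObject 5.0.0 onwards, with slot used before that.

```r
# Hypothetical helper: fetch an expression matrix without touching slots directly.
get_expression <- function(obj,
                           layer = c("counts", "data", "scale.data"),
                           assay = "RNA") {
  layer <- match.arg(layer)
  if (utils::packageVersion("SeuratObject") >= "5.0.0") {
    # Assumption: SeuratObject 5.x names this argument `layer`.
    SeuratObject::GetAssayData(obj, assay = assay, layer = layer)
  } else {
    # Earlier releases use `slot` for the same purpose.
    SeuratObject::GetAssayData(obj, assay = assay, slot = layer)
  }
}

# Hypothetical helper: collect the main results of the standard workflow.
collect_results <- function(obj) {
  list(
    normalised = get_expression(obj, "data"),
    pca        = SeuratObject::Embeddings(obj, reduction = "pca"),
    umap       = SeuratObject::Embeddings(obj, reduction = "umap"),
    clusters   = obj$seurat_clusters
  )
}
```

After running the workflow, collect_results(obj) would return a list with the normalised matrix, the PCA and UMAP cell embeddings, and the per-cell cluster labels.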

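Common outputs:

DimPlot() returns a ggplot2-based plot object showing a UMAP scatter plot coloured by cluster identity; nothing is written to disk unless the plot is saved explicitly (for example, with ggsave()).
saveRDS(obj, "seurat_object.rds") writes the entire Seurat object to disk so the analysis can be reloaded later with readRDS().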
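One way to persist both outputs together is sketched below. The function name save_outputs, the results directory, the file names, and the figure dimensions are all assumptions chosen for illustration; only DimPlot, ggsave, and saveRDS are existing functions.

```r
# Hypothetical helper: write the UMAP plot and the Seurat object to a directory.
save_outputs <- function(obj, out_dir = "results") {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

  # DimPlot returns a plot object, so it has to be saved explicitly.
  umap_plot <- Seurat::DimPlot(obj, reduction = "umap", label = TRUE)
  plot_path <- file.path(out_dir, "umap_clusters.png")
  ggplot2::ggsave(plot_path, plot = umap_plot, width = 7, height = 6)

  # Serialise the whole object so the analysis can be resumed later.
  rds_path <- file.path(out_dir, "seurat_object.rds")
  saveRDS(obj, rds_path)

  invisible(c(plot = plot_path, object = rds_path))
}
```

Calling save_outputs(obj) after the workflow would write both files, and the object could then be restored in a new session with readRDS("results/seurat_object.rds").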