Seurat
======

Overview
--------

Seurat is a comprehensive R toolkit for single-cell RNA-seq analysis, developed by the Satija Lab at the New York Genome Center. It provides functions for quality-control filtering, normalisation, feature selection, dimensionality reduction (PCA, UMAP, t-SNE), graph-based clustering, differential expression testing, and multi-modal data integration. Seurat is one of the most widely adopted frameworks in the single-cell community and supports a broad range of assay types, including scRNA-seq, CITE-seq, spatial transcriptomics, and scATAC-seq (the latter through the companion Signac package).

Installation
------------

Install Seurat from CRAN within an R session:

.. code-block:: r

   install.packages("Seurat")

For the latest development version (requires the ``remotes`` package):

.. code-block:: r

   remotes::install_github("satijalab/seurat", ref = "develop")

Basic Usage
-----------

Load a 10x Cell Ranger output matrix, perform quality-control filtering, and run the standard clustering workflow. The native pipe ``|>`` used below requires R 4.1 or later.

.. code-block:: r

   library(Seurat)

   # Load the 10x filtered feature-barcode matrix (genes x cells, sparse)
   counts <- Read10X(data.dir = "filtered_feature_bc_matrix/")

   # Keep genes detected in at least 3 cells and cells with at least 200 detected genes
   obj <- CreateSeuratObject(counts = counts, min.cells = 3, min.features = 200)

   # QC: percentage of each cell's counts from mitochondrial genes
   # ("^MT-" matches human gene symbols; use "^mt-" for mouse)
   obj[["percent.mt"]] <- PercentageFeatureSet(obj, pattern = "^MT-")

   # Drop cells with too few genes (low quality), too many genes (possible
   # doublets), or a high mitochondrial fraction
   obj <- subset(obj, subset = nFeature_RNA > 200 & nFeature_RNA < 5000 & percent.mt < 20)

   # Standard workflow: normalise, select variable genes, scale, reduce, cluster, embed
   obj <- NormalizeData(obj) |>
     FindVariableFeatures() |>
     ScaleData() |>
     RunPCA() |>
     FindNeighbors(dims = 1:20) |>
     FindClusters(resolution = 0.5) |>
     RunUMAP(dims = 1:20)

   # UMAP scatter plot coloured and labelled by cluster
   DimPlot(obj, reduction = "umap", label = TRUE)

Key Parameters
--------------

.. list-table::
   :header-rows: 1
   :widths: 30 70

   * - Function / parameter
     - Description
   * - ``min.cells``
     - Minimum number of cells in which a gene must be detected for it to be retained (passed to ``CreateSeuratObject``).
   * - ``min.features``
     - Minimum number of genes a cell must express for it to be retained (passed to ``CreateSeuratObject``).
   * - ``PercentageFeatureSet``
     - Calculates the percentage of each cell's counts that come from a gene set (e.g., mitochondrial genes matching ``^MT-``).
   * - ``NormalizeData``
     - Log-normalises counts per cell (default: ``LogNormalize`` with a scale factor of 10,000).
   * - ``FindVariableFeatures``
     - Selects highly variable genes (default: the top 2,000 using the ``vst`` variance-stabilising method).
   * - ``ScaleData``
     - Centres and scales expression values (by default, for the variable features only); optionally regresses out confounding variables via ``vars.to.regress``.
   * - ``RunPCA``
     - Performs principal component analysis on the scaled variable features.
   * - ``FindNeighbors(dims = 1:20)``
     - Constructs a shared nearest-neighbour (SNN) graph using the specified PCA dimensions.
   * - ``FindClusters(resolution)``
     - Identifies cell clusters by modularity optimisation on the SNN graph (default: Louvain; ``algorithm = 4`` selects Leiden). Higher resolution values produce more clusters.
   * - ``RunUMAP(dims = 1:20)``
     - Computes a UMAP embedding for visualisation using the specified PCA dimensions.

Expected Output
---------------

The standard workflow produces a Seurat object held in memory with the following key components. In Seurat v5 and later, the RNA assay stores its matrices as layers:

* ``obj[["RNA"]]$counts`` -- original count matrix.
* ``obj[["RNA"]]$data`` -- log-normalised expression values.
* ``obj[["RNA"]]$scale.data`` -- scaled expression matrix (variable features only, by default).
* ``obj@reductions$pca`` -- PCA cell embeddings and gene loadings.
* ``obj@reductions$umap`` -- UMAP coordinates.
* ``obj@meta.data$seurat_clusters`` -- cluster assignment for every cell.

Objects created with Seurat v4 use slot syntax instead (for example, ``obj[["RNA"]]@counts``).

Outputs that can be saved to disk:

* ``DimPlot()`` returns a ggplot2 object showing the UMAP scatter plot coloured by cluster identity. Write it to a file with ``ggplot2::ggsave()``.
* ``saveRDS(obj, "seurat_object.rds")`` saves the entire analysis. Reload it later with ``readRDS("seurat_object.rds")``.

See Also
--------

* :doc:`scanpy` -- Python toolkit for single-cell analysis with a comparable workflow
* :doc:`cellranger` -- upstream pipeline that generates the count matrices loaded by Seurat
* :doc:`starsolo` -- open-source alternative to Cell Ranger for generating count matrices