RNA-seq Differential Expression

Overview

RNA-seq differential expression analysis quantifies gene expression levels across experimental conditions and identifies genes with statistically significant changes. The Snakemake implementation uses STAR for splice-aware alignment followed by featureCounts for read summarisation, while the Nextflow implementation uses Salmon for lightweight, alignment-free quantification via quasi-mapping. Both approaches feed count matrices into DESeq2 for normalisation, dispersion estimation, and differential testing.

Pipeline Steps

  1. Trim adapters and low-quality bases with fastp.

  2. Align reads to the reference genome (STAR) or quantify transcripts directly (Salmon).

  3. Generate a gene-level count matrix (featureCounts or Salmon quant.sf).

  4. Run DESeq2 for normalisation and differential expression testing.

  5. Aggregate QC metrics into a MultiQC report.

Implementation

The Snakemake approach uses STAR two-pass alignment and featureCounts for read counting, then runs DESeq2 via an R script.

# Snakefile -- RNA-seq Differential Expression (STAR + featureCounts)
configfile: "config.yaml"

SAMPLES = config["samples"]
REF     = config["star_index"]
GTF     = config["gtf"]

rule all:
    input:
        "results/deseq2/differential_expression.csv",
        "results/multiqc/multiqc_report.html"

rule fastp_trim:
    input:
        r1="data/{sample}_R1.fastq.gz",
        r2="data/{sample}_R2.fastq.gz"
    output:
        r1="results/trimmed/{sample}_R1.trimmed.fastq.gz",
        r2="results/trimmed/{sample}_R2.trimmed.fastq.gz",
        json="results/trimmed/{sample}_fastp.json",
        html="results/trimmed/{sample}_fastp.html"
    threads: 4
    conda: "envs/qc.yaml"
    shell:
        """
        fastp -i {input.r1} -I {input.r2} \
          -o {output.r1} -O {output.r2} \
          -j {output.json} -h {output.html} \
          --detect_adapter_for_pe --thread {threads}
        """

rule star_align:
    input:
        r1="results/trimmed/{sample}_R1.trimmed.fastq.gz",
        r2="results/trimmed/{sample}_R2.trimmed.fastq.gz",
        index=REF
    output:
        bam="results/star/{sample}/{sample}.Aligned.sortedByCoord.out.bam",
        log="results/star/{sample}/{sample}.Log.final.out"
    threads: 12
    conda: "envs/align.yaml"
    shell:
        """
        STAR --runThreadN {threads} \
          --genomeDir {input.index} \
          --readFilesIn {input.r1} {input.r2} \
          --readFilesCommand zcat \
          --outSAMtype BAM SortedByCoordinate \
          --outFileNamePrefix results/star/{wildcards.sample}/{wildcards.sample}. \
          --quantMode GeneCounts \
          --twopassMode Basic
        samtools index {output.bam}
        """

rule featurecounts:
    input:
        bams=expand("results/star/{sample}/{sample}.Aligned.sortedByCoord.out.bam",
                    sample=SAMPLES),
        gtf=GTF
    output:
        counts="results/featurecounts/gene_counts.txt",
        summary="results/featurecounts/gene_counts.txt.summary"
    threads: 8
    conda: "envs/quantification.yaml"
    shell:
        """
        featureCounts -T {threads} -p --countReadPairs \
          -a {input.gtf} -o {output.counts} \
          -s 2 --primary \
          {input.bams}
        """

rule deseq2:
    input:
        counts="results/featurecounts/gene_counts.txt",
        metadata=config["sample_metadata"]
    output:
        results="results/deseq2/differential_expression.csv",
        norm_counts="results/deseq2/normalised_counts.csv",
        pca="results/deseq2/pca_plot.pdf",
        ma="results/deseq2/ma_plot.pdf",
        volcano="results/deseq2/volcano_plot.pdf"
    conda: "envs/deseq2.yaml"
    script:
        "scripts/run_deseq2.R"

rule multiqc:
    input:
        expand("results/trimmed/{sample}_fastp.json", sample=SAMPLES),
        expand("results/star/{sample}/{sample}.Log.final.out",
               sample=SAMPLES),
        "results/featurecounts/gene_counts.txt.summary"
    output:
        "results/multiqc/multiqc_report.html"
    conda: "envs/qc.yaml"
    shell:
        "multiqc results/ -o results/multiqc/ --force"

See Also