Salmon
======

Overview
--------

Salmon is a fast, bias-aware RNA-seq quantification tool that estimates
transcript-level abundances. In mapping-based mode it maps reads directly to a
transcriptome index using selective alignment (earlier releases used
quasi-mapping). In alignment-based mode it instead takes a BAM file of reads
already aligned to the transcriptome. Optional correction models account for
sequence-specific bias, fragment GC content bias, and positional bias. Its
speed and accuracy make it a standard choice in RNA-seq pipelines, and its
output integrates directly with tximport for gene-level summarisation and
downstream differential expression analysis with DESeq2 or edgeR.

Installation
------------

.. code-block:: bash

   mamba install -c bioconda salmon

Basic Usage
-----------

**Build a transcriptome index**

.. code-block:: bash

   salmon index -t transcriptome.fa -i salmon_index -p 8

**Quantify paired-end reads with selective alignment**

.. code-block:: bash

   salmon quant -i salmon_index -l A \
     -1 sample_R1.fastq.gz -2 sample_R2.fastq.gz \
     -p 8 --validateMappings \
     -o salmon_output/

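The ``-l A`` flag asks Salmon to infer the library type (read orientation and
strandedness) from the data. In Salmon 1.0 and later, selective alignment is
the default, so ``--validateMappings`` is only needed with older releases.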
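
For more than one sample, the same command can be wrapped in a loop. The
sketch below is a hypothetical helper, not part of Salmon: the function name
``quant_samples``, the ``<sample>_R1.fastq.gz`` / ``<sample>_R2.fastq.gz``
naming convention, and the ``SALMON`` override variable are all assumptions
made for illustration.

.. code-block:: bash

   # Hypothetical batch helper: quantify each named sample against one index.
   # Assumes reads are named <sample>_R1.fastq.gz and <sample>_R2.fastq.gz.
   # Arguments: <index dir> <sample name>...
   quant_samples() {
       local index="$1"; shift
       local sample
       for sample in "$@"; do
           # SALMON is an assumed override for the executable (default: salmon).
           "${SALMON:-salmon}" quant -i "$index" -l A \
               -1 "${sample}_R1.fastq.gz" -2 "${sample}_R2.fastq.gz" \
               -p 8 --validateMappings \
               -o "salmon_output/${sample}"
       done
   }

Calling ``quant_samples salmon_index sampleA sampleB`` would write each result
to its own directory under ``salmon_output/``, which keeps one ``quant.sf``
per sample for later import with tximport.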

Key Parameters
--------------

.. list-table::
   :header-rows: 1
   :widths: 25 75

   * - Flag / option
     - Description
   * - ``index -t``
     - Path to the transcriptome FASTA for building the index.
   * - ``index -i``
     - Output path for the Salmon index directory.
   * - ``quant -i``
     - Path to the pre-built Salmon index.
   * - ``-l``
     - Library type (``A`` for automatic detection, or explicit types such as
       ``ISR``, ``ISF``, ``IU``).
   * - ``-1`` / ``-2``
     - Paired-end read files: mate 1 (``-1``) and mate 2 (``-2``).
   * - ``-r``
     - Single-end read file.
   * - ``-p``
     - Number of threads to use.
   * - ``--validateMappings``
     - Enable selective alignment for improved mapping accuracy (the default
       from Salmon 1.0 onward, where the flag is redundant).
   * - ``-o``
     - Output directory for quantification results.
   * - ``--gcBias``
     - Enable GC bias correction (recommended for most datasets).
   * - ``--seqBias``
     - Enable sequence-specific bias correction.
   * - ``--numBootstraps``
     - Number of bootstrap samples for variance estimation.
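
As a sketch of how several of these options combine, the hypothetical helper
below runs a paired-end quantification with an explicit library type, both
bias corrections, and bootstrapping. The function name, the ``ISR`` library
type, the bootstrap count of 30, and the ``SALMON`` override variable are
illustrative assumptions; appropriate values depend on the library
preparation and the downstream analysis.

.. code-block:: bash

   # Hypothetical helper: quantify one paired-end sample with bias correction
   # and bootstrap resampling enabled.
   # Arguments: <index dir> <mate 1 FASTQ> <mate 2 FASTQ> <output dir>
   quant_with_bias() {
       local index="$1" r1="$2" r2="$3" outdir="$4"
       # SALMON is an assumed override for the executable (default: salmon).
       # ISR and 30 bootstraps are example values, not recommendations.
       "${SALMON:-salmon}" quant -i "$index" -l ISR \
           -1 "$r1" -2 "$r2" \
           -p 8 \
           --gcBias --seqBias \
           --numBootstraps 30 \
           -o "$outdir"
   }

A call might look like ``quant_with_bias salmon_index sample_R1.fastq.gz
sample_R2.fastq.gz salmon_output/``. Setting the library type explicitly
skips automatic detection, so it should match how the library was prepared.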

Expected Output
---------------

Salmon writes the following files to the output directory:

* ``quant.sf`` -- the primary output file: a tab-delimited table with one row
  per transcript and the columns ``Name``, ``Length``, ``EffectiveLength``,
  ``TPM`` (transcripts per million), and ``NumReads`` (estimated read count).
* ``quant.genes.sf`` -- gene-level quantification (written only when a
  transcript-to-gene map is provided with ``-g``).
* ``aux_info/`` -- directory containing auxiliary information including the
  observed library type, fragment length distribution, bias correction
  parameters, and, when ``--dumpEq`` is set, the equivalence class file.
* ``cmd_info.json`` -- a JSON file recording the exact command and parameters
  used for the run.
* ``logs/`` -- directory containing log files with mapping rate and run
  statistics.
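
A quick way to sanity-check a run is to look at the mapping rate and the most
abundant transcripts. The helpers below are a minimal sketch: the function
names and the default threshold of 1 TPM are arbitrary choices, and the log
path ``logs/salmon_quant.log`` and its ``Mapping rate`` line are assumptions
about the log layout that may differ between Salmon versions.

.. code-block:: bash

   # Print transcripts whose TPM is at or above a threshold, highest first.
   # quant.sf columns: Name, Length, EffectiveLength, TPM, NumReads.
   # Arguments: <path to quant.sf> [minimum TPM, default 1]
   expressed_transcripts() {
       local quant_sf="$1" min_tpm="${2:-1}"
       # NR > 1 skips the header row; "+ 0" forces numeric comparison.
       awk -F'\t' -v min="$min_tpm" \
           'NR > 1 && $4 + 0 >= min + 0 { print $1 "\t" $4 }' "$quant_sf" \
           | sort -t "$(printf '\t')" -k2,2nr
   }

   # Print the mapping-rate line(s) from the run log.
   # Assumes the log is <outdir>/logs/salmon_quant.log and that it contains
   # a line with the text "Mapping rate".
   mapping_rate() {
       local outdir="$1"
       grep 'Mapping rate' "${outdir}/logs/salmon_quant.log"
   }

For example, ``expressed_transcripts salmon_output/quant.sf 10 | head`` would
list the ten most abundant transcripts with at least 10 TPM, and
``mapping_rate salmon_output`` would print the logged mapping rate.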

See Also
--------

* :doc:`kallisto` -- pseudoalignment-based quantification tool with bootstrap
  support
* :doc:`featurecounts` -- alignment-based gene-level counting from BAM files
* :doc:`/tools/differential-expression/deseq2` -- differential expression
  analysis using Salmon counts (via tximport)
* :doc:`/tools/differential-expression/edger` -- alternative differential
  expression framework compatible with Salmon output