kallisto

Overview

kallisto is an ultrafast RNA-seq quantification tool that uses pseudoalignment to estimate transcript-level abundances without performing traditional read alignment. Instead of computing base-level alignments of reads to a reference, kallisto looks up the k-mers of each read in an index built from a de Bruijn graph of the transcriptome and determines the set of transcripts the read is compatible with. Abundances are then estimated from these compatibility sets with an expectation-maximization (EM) algorithm. This approach is orders of magnitude faster than alignment-based quantification while maintaining comparable accuracy. kallisto also supports bootstrap resampling to estimate the technical variance of its abundance estimates, which downstream tools such as sleuth use for differential expression analysis.
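The compatibility step can be illustrated with a toy sketch. It is not kallisto's implementation: kallisto indexes k-mers (k = 31 by default) through a de Bruijn graph and considers both strands, whereas this sketch maps every forward-strand k-mer directly to a set of transcript IDs with a plain dictionary, uses a tiny k, and simply ignores k-mers missing from the index. The function names and sequences are hypothetical. A read's compatibility set is the intersection of the transcript sets of its k-mers; the EM step that turns these sets into abundances is not shown.

```python
from typing import Dict, List, Set


def kmers(seq: str, k: int) -> List[str]:
    """Return all overlapping k-mers of seq (forward strand only)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_kmer_index(transcripts: Dict[str, str], k: int) -> Dict[str, Set[str]]:
    """Map each k-mer to the set of transcript IDs that contain it."""
    index: Dict[str, Set[str]] = {}
    for tid, seq in transcripts.items():
        for km in kmers(seq, k):
            index.setdefault(km, set()).add(tid)
    return index


def pseudoalign(read: str, index: Dict[str, Set[str]], k: int) -> Set[str]:
    """Intersect the transcript sets of the read's k-mers.

    k-mers absent from the index are skipped (a simplification of this
    sketch); an empty result means the read is not pseudoaligned.
    """
    compatible = None
    for km in kmers(read, k):
        hits = index.get(km)
        if hits is None:
            continue
        compatible = set(hits) if compatible is None else compatible & hits
    return compatible if compatible is not None else set()


# Toy usage with made-up sequences and k = 4.
transcripts = {"tA": "ACGTACGGTC", "tB": "ACGTACGCCA"}
idx = build_kmer_index(transcripts, k=4)
print(sorted(pseudoalign("CGTACG", idx, k=4)))  # ['tA', 'tB'] (only shared k-mers)
print(sorted(pseudoalign("TACGGT", idx, k=4)))  # ['tA']
```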

Installation

mamba install -c conda-forge -c bioconda kallisto

Basic Usage

Build an index from a transcriptome FASTA

kallisto index -i transcripts.idx transcriptome.fa

Quantify paired-end reads with bootstraps

kallisto quant -i transcripts.idx -o output/ \
  -b 100 -t 8 \
  sample_R1.fastq.gz sample_R2.fastq.gz

Fragment length cannot be estimated from single-end reads, so in single-end mode you must supply the estimated mean fragment length (-l) and its standard deviation (-s):

kallisto quant -i transcripts.idx -o output/ \
  --single -l 200 -s 30 -t 8 \
  sample.fastq.gz
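When -b is set, as in the paired-end example, each bootstrap round re-estimates abundances from a resampled set of reads, so every transcript gets a distribution of estimated counts; the spread of that distribution is the technical variance described in the Overview. The sketch below uses a hypothetical helper, `bootstrap_summary`, and assumes the replicates are already loaded as one dict of transcript ID to estimated count per round (for example after exporting them from abundance.h5 with `kallisto h5dump`); loading is not shown.

```python
import statistics
from typing import Dict, List, Tuple


def bootstrap_summary(
    replicates: List[Dict[str, float]],
) -> Dict[str, Tuple[float, float]]:
    """Per-transcript (mean, sample variance) of est_counts across bootstrap rounds.

    replicates: one {target_id: est_counts} dict per bootstrap round;
    every round is assumed to contain the same target IDs.
    """
    if len(replicates) < 2:
        raise ValueError("need at least two bootstrap replicates for a variance")
    summary: Dict[str, Tuple[float, float]] = {}
    for target_id in replicates[0]:
        values = [rep[target_id] for rep in replicates]
        summary[target_id] = (statistics.mean(values), statistics.variance(values))
    return summary


# Toy input: three bootstrap rounds for two transcripts (made-up numbers).
reps = [{"t1": 10.0, "t2": 0.0}, {"t1": 12.0, "t2": 2.0}, {"t1": 14.0, "t2": 1.0}]
print(bootstrap_summary(reps))  # {'t1': (12.0, 4.0), 't2': (1.0, 1.0)}
```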

Key Parameters

Flag / option	Description
`index -i`	Output path for the kallisto index file.
`quant -i`	Path to the pre-built kallisto index.
`-o`	Output directory for quantification results.
`-b`	Number of bootstrap samples for estimating technical variance.
`-t`	Number of threads to use.
`--single`	Enable single-end read mode (requires `-l` and `-s`).
`-l`	Estimated average fragment length (for single-end reads).
`-s`	Estimated standard deviation of fragment length (for single-end reads).
`--rf-stranded`	Strand-specific reads where the first read is reverse-complementary to the transcript (e.g. dUTP-based libraries).
`--fr-stranded`	Strand-specific reads where the first read is in the same orientation as the transcript.
`--plaintext`	Write output in plain text instead of HDF5 format.
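The constraints in this table (for example, that `--single` needs both `-l` and `-s`) can be made explicit in a small Python sketch that assembles a `kallisto quant` argument list from the documented flags. `build_quant_cmd` is a hypothetical helper, not part of kallisto, and it only builds the list; it does not run anything. Requiring an even number of FASTQ files in paired-end mode is this sketch's own check, based on reads being supplied as R1/R2 pairs.

```python
from typing import List, Optional


def build_quant_cmd(index: str, outdir: str, reads: List[str],
                    bootstraps: int = 0, threads: int = 1,
                    single: bool = False,
                    frag_len: Optional[float] = None,
                    frag_sd: Optional[float] = None,
                    strand: Optional[str] = None,
                    plaintext: bool = False) -> List[str]:
    """Assemble (but do not run) a kallisto quant command line.

    Options are placed before the FASTQ files, as in the examples above.
    strand: None (unstranded), "rf" (--rf-stranded) or "fr" (--fr-stranded).
    """
    if not reads:
        raise ValueError("at least one FASTQ file is required")
    cmd = ["kallisto", "quant", "-i", index, "-o", outdir]
    if bootstraps > 0:
        cmd += ["-b", str(bootstraps)]
    if single:
        # Fragment length cannot be inferred from single-end data.
        if frag_len is None or frag_sd is None:
            raise ValueError("--single requires -l (mean) and -s (standard deviation)")
        cmd += ["--single", "-l", f"{frag_len:g}", "-s", f"{frag_sd:g}"]
    elif len(reads) % 2 != 0:
        raise ValueError("paired-end mode expects R1/R2 files in pairs")
    cmd += ["-t", str(threads)]
    if strand == "rf":
        cmd.append("--rf-stranded")
    elif strand == "fr":
        cmd.append("--fr-stranded")
    elif strand is not None:
        raise ValueError("strand must be None, 'rf' or 'fr'")
    if plaintext:
        cmd.append("--plaintext")
    return cmd + list(reads)


print(" ".join(build_quant_cmd("transcripts.idx", "output/", ["sample.fastq.gz"],
                               threads=8, single=True, frag_len=200, frag_sd=30)))
# kallisto quant -i transcripts.idx -o output/ --single -l 200 -s 30 -t 8 sample.fastq.gz
```

The returned list can be passed unchanged to `subprocess.run(cmd, check=True)`; keeping construction separate from execution makes the flag logic testable without kallisto installed.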

Expected Output

kallisto writes the following files to the output directory:

abundance.tsv – a tab-delimited file with one row per transcript and the columns target_id, length, eff_length (effective length), est_counts (estimated counts), and tpm (transcripts per million).
abundance.h5 – an HDF5 file containing the abundance estimates and, when -b is used, the bootstrap replicates; it is read by sleuth and other downstream tools and is not written when --plaintext is set.
run_info.json – a JSON file recording the kallisto version, index used, number of processed reads, percentage of pseudoaligned reads, and run parameters.
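The tpm column follows from est_counts and eff_length: each transcript's count c_i is divided by its effective length l_i, and these rates are rescaled to sum to one million, TPM_i = 10^6 · (c_i / l_i) / Σ_j (c_j / l_j). The sketch below parses abundance.tsv with Python's csv module and recomputes TPM that way as a quick consistency check; agreement with the file's tpm column should hold up to rounding of the printed values. `parse_abundance` and `recompute_tpm` are hypothetical helpers, and the column names are the ones listed above.

```python
import csv
from typing import Any, Dict, Iterable, List


def parse_abundance(lines: Iterable[str]) -> List[Dict[str, Any]]:
    """Parse abundance.tsv content (an open file or any iterable of lines)."""
    rows = []
    for rec in csv.DictReader(lines, delimiter="\t"):
        rows.append({
            "target_id": rec["target_id"],
            "length": int(rec["length"]),
            "eff_length": float(rec["eff_length"]),
            "est_counts": float(rec["est_counts"]),
            "tpm": float(rec["tpm"]),
        })
    return rows


def recompute_tpm(rows: List[Dict[str, Any]]) -> Dict[str, float]:
    """TPM_i = 1e6 * (est_counts_i / eff_length_i) / sum_j (est_counts_j / eff_length_j)."""
    rates = {
        r["target_id"]: (r["est_counts"] / r["eff_length"] if r["eff_length"] > 0 else 0.0)
        for r in rows
    }
    total = sum(rates.values())
    if total == 0:
        return {tid: 0.0 for tid in rates}
    return {tid: 1e6 * rate / total for tid, rate in rates.items()}


# Usage (assuming the output/ directory from the examples above):
# with open("output/abundance.tsv", newline="") as fh:
#     rows = parse_abundance(fh)
# recomputed = recompute_tpm(rows)
# worst = max(abs(recomputed[r["target_id"]] - r["tpm"]) for r in rows)
```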