kallisto
Overview
kallisto is a fast RNA-seq quantification tool that uses pseudoalignment to estimate transcript-level abundances without traditional read alignment. Instead of mapping reads to a reference genome, it uses a de Bruijn graph index built from the transcriptome to determine the set of transcripts each read is compatible with. Because it skips base-level alignment, quantification is typically orders of magnitude faster than alignment-based methods while maintaining comparable accuracy. kallisto can also run bootstrap resampling to estimate the technical variance of its abundance estimates, which downstream tools such as sleuth use for differential expression analysis.
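To make the idea of "compatibility" concrete, here is a toy Python sketch of pseudoalignment. It is not kallisto's implementation: it uses a plain k-mer dictionary instead of a de Bruijn graph, ignores reverse complements, and treats any k-mer missing from the index as "no match". The transcript names and sequences are made up for illustration.

```python
from collections import defaultdict


def build_kmer_index(transcripts, k):
    """Map each k-mer to the set of transcript IDs that contain it."""
    index = defaultdict(set)
    for tid, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(tid)
    return index


def pseudoalign(read, index, k):
    """Return the transcripts that contain every k-mer of the read."""
    compatible = None
    for i in range(len(read) - k + 1):
        hits = index.get(read[i:i + k], set())
        # Intersect the per-k-mer transcript sets along the read.
        compatible = set(hits) if compatible is None else compatible & hits
        if not compatible:
            return set()
    # A read shorter than k has no k-mers and is compatible with nothing.
    return compatible or set()


# Hypothetical toy transcriptome: tx1 and tx2 share a prefix, tx3 is unrelated.
transcripts = {
    "tx1": "ACGTACGTTAGC",
    "tx2": "ACGTACGTTGGA",
    "tx3": "TTTTCCCCGGGG",
}
index = build_kmer_index(transcripts, k=5)

print(sorted(pseudoalign("ACGTACGT", index, k=5)))  # read from the shared prefix
print(sorted(pseudoalign("CGTTAGC", index, k=5)))   # read unique to one transcript
print(sorted(pseudoalign("AAAAAAAA", index, k=5)))  # read matching nothing
```

The result for each read is a set of transcripts rather than a position, which is why no base-level alignment is needed; reads that are compatible with several transcripts are resolved later during abundance estimation.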
Installation
mamba install -c conda-forge -c bioconda kallisto
Basic Usage
Build an index from a transcriptome FASTA
kallisto index -i transcripts.idx transcriptome.fa
Quantify paired-end reads with bootstraps
kallisto quant -i transcripts.idx -o output/ \
-b 100 -t 8 \
sample_R1.fastq.gz sample_R2.fastq.gz
For single-end reads, provide the estimated fragment length and standard deviation:
kallisto quant -i transcripts.idx -o output/ \
--single -l 200 -s 30 -t 8 \
sample.fastq.gz
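When quantifying many samples, it can help to generate these commands programmatically. The following Python sketch builds the argument list for the two invocations shown above using only the flags that appear in this document. The helper name `kallisto_quant_cmd` and its parameters are hypothetical, and the function only constructs the command; it does not run kallisto.

```python
import shlex


def kallisto_quant_cmd(index, out_dir, reads, *, single=False,
                       fragment_length=None, fragment_sd=None,
                       threads=1, bootstraps=0):
    """Build the argument list for `kallisto quant` without executing it."""
    cmd = ["kallisto", "quant", "-i", index, "-o", out_dir, "-t", str(threads)]
    if bootstraps > 0:
        cmd += ["-b", str(bootstraps)]
    if single:
        # Single-end mode needs the fragment length distribution from the user.
        if fragment_length is None or fragment_sd is None:
            raise ValueError("single-end mode requires fragment_length and fragment_sd")
        cmd += ["--single", "-l", str(fragment_length), "-s", str(fragment_sd)]
    elif len(reads) % 2 != 0:
        raise ValueError("paired-end mode requires an even number of FASTQ files")
    return cmd + list(reads)


paired = kallisto_quant_cmd(
    "transcripts.idx", "output/",
    ["sample_R1.fastq.gz", "sample_R2.fastq.gz"],
    threads=8, bootstraps=100,
)
single_end = kallisto_quant_cmd(
    "transcripts.idx", "output/", ["sample.fastq.gz"],
    single=True, fragment_length=200, fragment_sd=30, threads=8,
)
print(shlex.join(paired))
print(shlex.join(single_end))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once kallisto is installed; keeping command construction separate from execution makes the logic easy to check without the binary present.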
Key Parameters
| Flag / option | Description |
|---|---|
| -i (kallisto index) | Output path for the kallisto index file. |
| -i (kallisto quant) | Path to the pre-built kallisto index. |
| -o | Output directory for quantification results. |
| -b | Number of bootstrap samples for estimating technical variance. |
| -t | Number of threads to use. |
| --single | Enable single-end read mode (requires -l and -s). |
| -l | Estimated average fragment length (for single-end reads). |
| -s | Estimated standard deviation of fragment length (for single-end reads). |
| --rf-stranded | Reads are from a reverse-stranded library (e.g. dUTP method). |
| --fr-stranded | Reads are from a forward-stranded library. |
| --plaintext | Write output in plain text instead of HDF5 format. |
Expected Output
kallisto writes the following files to the output directory:
abundance.tsv – a tab-delimited file with columns for target ID, transcript length, effective length, estimated counts (est_counts), and transcripts per million (TPM).
abundance.h5 – an HDF5 file containing the abundance estimates and bootstrap replicates (when -b is used), readable by sleuth and other downstream tools.
run_info.json – a JSON file recording the kallisto version, index used, number of processed reads, percentage of pseudoaligned reads, and run parameters.
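As a rough illustration of how the columns in abundance.tsv relate to each other, the Python sketch below parses the file and recomputes TPM from the estimated counts and effective lengths using the standard definition, TPM_i = (c_i / l_i) / Σ_j (c_j / l_j) × 10^6. It assumes the header names target_id, length, eff_length, est_counts, and tpm; the two example rows are made up, and the helper names are hypothetical.

```python
import csv
import io
import math


def read_abundance(handle):
    """Parse an abundance.tsv-style table into a list of typed records."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        {
            "target_id": row["target_id"],
            "length": int(row["length"]),
            "eff_length": float(row["eff_length"]),
            "est_counts": float(row["est_counts"]),
            "tpm": float(row["tpm"]),
        }
        for row in reader
    ]


def recompute_tpm(rows):
    """Recompute TPM from est_counts and eff_length."""
    # Counts per effective base, then rescale so the values sum to one million.
    rates = [r["est_counts"] / r["eff_length"] for r in rows]
    total = sum(rates)
    return [rate / total * 1e6 if total > 0 else 0.0 for rate in rates]


# Made-up example data; a real file would be opened with open("output/abundance.tsv").
example = (
    "target_id\tlength\teff_length\test_counts\ttpm\n"
    "txA\t1200\t1000\t100\t250000\n"
    "txB\t1700\t1500\t450\t750000\n"
)
rows = read_abundance(io.StringIO(example))
tpm = recompute_tpm(rows)
for row, value in zip(rows, tpm):
    # Compare the recomputed value with the tpm column in the table.
    print(row["target_id"], math.isclose(value, row["tpm"], rel_tol=1e-6))
```

Because TPM is normalized by effective length and rescaled to a fixed total, it is comparable across transcripts within a sample, whereas est_counts is the quantity usually passed on to count-based tools.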
See Also
Salmon – quasi-mapping-based quantification tool with built-in GC and sequence bias correction
featureCounts – alignment-based gene-level counting from BAM files
DESeq2 – differential expression analysis using count data from kallisto (via tximport)