minimap2

Overview

minimap2 is a versatile sequence alignment tool for mapping long noisy reads (Oxford Nanopore, PacBio), short reads, and spliced reads (RNA-seq) to a reference genome. It provides presets tuned for different data types (map-ont, map-hifi, map-pb, sr, splice). Unlike its predecessor minimap, which reports only approximate mappings, minimap2 also produces base-level alignments. It is the de facto standard aligner for long-read sequencing data.

Installation

mamba install -c conda-forge -c bioconda minimap2 samtools
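
A quick way to confirm the installation is to check that the tools are on PATH. The snippet below is a minimal sketch; have_tool is a hypothetical helper name, and samtools is checked as well because the Basic Usage pipeline below calls it.

```shell
# have_tool is a hypothetical helper: succeeds if the named command is on PATH.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

for tool in minimap2 samtools; do
  if have_tool "$tool"; then
    echo "$tool: found"
  else
    echo "$tool: not found on PATH" >&2
  fi
done
```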

Basic Usage

Download a reference, index it, align nanopore reads (assumed here to be in reads.fastq.gz), and produce a sorted, indexed BAM file.

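# Download the reference (E. coli K-12 MG1655, assembly ASM584v2) and decompress it to reference.fna
wget -q https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/005/845/GCA_000005845.2_ASM584v2/GCA_000005845.2_ASM584v2_genomic.fna.gz
gunzip -c GCA_000005845.2_ASM584v2_genomic.fna.gz > reference.fna
# Build a FASTA index (.fai) for downstream tools; minimap2 builds its own index at run time
samtools faidx reference.fna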

# Align nanopore reads
minimap2 -ax map-ont --MD \
  -R '@RG\tID:sample1\tSM:sample1\tPL:ONT' \
  reference.fna reads.fastq.gz \
  | samtools sort -@ 4 -o aligned.sorted.bam -

samtools index aligned.sorted.bam
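
When the same reference is mapped against repeatedly, the minimizer index can be built once with -d and passed in place of the FASTA file. The following is a sketch under stated assumptions: index_path and the resulting file name reference.map-ont.mmi are hypothetical, the thread counts are arbitrary, and the file names follow the example above. The commands run only if minimap2 and both input files are present.

```shell
# index_path is a hypothetical helper: reference.fna + map-ont -> reference.map-ont.mmi
index_path() {
  printf '%s.%s.mmi\n' "${1%.*}" "$2"
}

REF=reference.fna
READS=reads.fastq.gz
PRESET=map-ont
IDX=$(index_path "$REF" "$PRESET")

if command -v minimap2 >/dev/null 2>&1 && [ -s "$REF" ] && [ -s "$READS" ]; then
  # Build the index once. k-mer length and window size are fixed at this
  # point, so the same preset is passed again when mapping.
  [ -s "$IDX" ] || minimap2 -x "$PRESET" -d "$IDX" "$REF"

  # Map against the saved index instead of the FASTA file.
  minimap2 -t 8 -ax "$PRESET" --MD "$IDX" "$READS" \
    | samtools sort -@ 4 -o aligned.sorted.bam -
  samtools index aligned.sorted.bam
else
  echo "minimap2, $REF or $READS not found; nothing to do" >&2
fi
```

For a genome as small as the one in this example, indexing takes very little time, so a saved index mainly pays off with large references.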

Key Parameters

Flag / option     Description

-a                Output SAM instead of the default PAF format (required when piping to samtools).
-x                Preset for the data type. Common values:
                    • map-ont – Oxford Nanopore reads
                    • map-hifi – PacBio HiFi reads
                    • map-pb – PacBio CLR reads
                    • sr – short (Illumina) reads
                    • splice – long-read RNA-seq
--MD              Include the MD tag, which encodes the reference bases at mismatches and deletions (needed by some variant callers).
-R                Read group header line (e.g., '@RG\tID:sample1\tSM:sample1\tPL:ONT').
-t                Number of alignment threads (default 3).
-d                Write the minimizer index to a file so it can be reused; most useful for large genomes.
-k                Minimizer k-mer length (preset-dependent).
-w                Minimizer window size (preset-dependent).
--secondary=no    Do not output secondary alignments.
-Y                Use soft clipping for supplementary alignments instead of the default hard clipping.
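
As an illustration of how these options combine, the sketch below picks a preset from a short data-type label and runs minimap2 with secondary alignments suppressed and soft clipping enabled. The function preset_for, its labels (ont, hifi, clr, illumina, rna), the output name aligned.primary.sam, and the thread count are assumptions made for this example and are not part of minimap2.

```shell
# preset_for is a hypothetical helper mapping a data-type label to a -x preset.
preset_for() {
  case "$1" in
    ont)      echo map-ont ;;
    hifi)     echo map-hifi ;;
    clr)      echo map-pb ;;
    illumina) echo sr ;;
    rna)      echo splice ;;
    *)        echo "unknown data type: $1" >&2; return 1 ;;
  esac
}

PRESET=$(preset_for ont)

# Run only when minimap2 and the example inputs are available.
if command -v minimap2 >/dev/null 2>&1 && [ -s reference.fna ] && [ -s reads.fastq.gz ]; then
  # -t 8: eight threads; --secondary=no: drop secondary alignments;
  # -Y: soft-clip supplementary alignments so full read sequences are kept.
  minimap2 -t 8 -ax "$PRESET" --secondary=no -Y \
    reference.fna reads.fastq.gz > aligned.primary.sam
fi
```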

Expected Output

The pipeline above produces:

  • aligned.sorted.bam – a coordinate-sorted BAM file with all aligned and unaligned reads.

  • aligned.sorted.bam.bai – the BAM index for fast random access.

Verify alignment statistics with:

samtools flagstat aligned.sorted.bam

This reports the total number of alignment records, how many of them are mapped, the counts of primary, secondary, and supplementary alignments, and properly paired reads (relevant only for paired-end data).
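
For scripted checks, the mapped percentage can be pulled out of the flagstat text. The sketch below assumes the usual flagstat line layout ("N + M in total (...)" and "N + M mapped (...)") and counts QC-passed records only; mapped_fraction is a hypothetical helper name. Because flagstat counts alignment records, secondary and supplementary alignments are included in both numbers.

```shell
# mapped_fraction is a hypothetical helper: reads `samtools flagstat` text on
# stdin and prints QC-passed mapped records as a percentage of the total.
mapped_fraction() {
  awk '
    / in total [(]/                  { total = $1 }
    /^[0-9]+ [+] [0-9]+ mapped [(]/  { mapped = $1 }
    END {
      if (total > 0) printf "%.2f\n", 100 * mapped / total
      else print "NA"
    }'
}

# Run only when samtools and the BAM file from the pipeline above exist.
if command -v samtools >/dev/null 2>&1 && [ -s aligned.sorted.bam ]; then
  samtools flagstat aligned.sorted.bam | mapped_fraction
fi
```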

See Also

  • BWA-MEM2 – optimised short-read aligner for Illumina data

  • NanoPlot – visualise read quality and length distributions for long-read data

  • Chopper – quality and length filtering before alignment

  • FASTQ – reference for the FASTQ file format

  • SAM / BAM / CRAM – reference for the SAM/BAM/CRAM alignment format