Bismark

Overview

Bismark is a dedicated aligner and methylation caller for bisulfite-treated sequencing (BS-seq) data. Bisulfite treatment converts unmethylated cytosines to uracils (read as thymines after PCR), while methylated cytosines remain unchanged. Because converted reads no longer match the reference directly, Bismark converts both the reads and the reference genome in silico (C-to-T and G-to-A) and aligns them using either Bowtie2 or HISAT2 as the underlying aligner. The methylation state of each cytosine is then inferred by comparing the aligned read with the original genomic sequence. After alignment, the methylation extractor reports these calls for every covered cytosine in CpG, CHG, and CHH contexts, producing per-base methylation output for downstream analysis.
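
The in-silico conversion can be illustrated with a short shell sketch. Bismark prepares two versions of the reference, one with every C replaced by T and one with every G replaced by A. The sequence and variable names below are made up for illustration, and this is not how Bismark itself is implemented.

```shell
# Toy reference fragment (hypothetical sequence, for illustration only).
seq="ACGTTCGACCA"

# C->T conversion: models full bisulfite conversion of the forward strand.
c2t=$(printf '%s' "$seq" | tr 'C' 'T')

# G->A conversion: models C->T conversion on the reverse strand,
# written in forward-strand coordinates.
g2a=$(printf '%s' "$seq" | tr 'G' 'A')

printf 'original: %s\nC->T:     %s\nG->A:     %s\n' "$seq" "$c2t" "$g2a"
```

Aligning converted reads to converted references removes the mismatches caused by bisulfite treatment. The methylation call is then recovered by looking up the unconverted base at each aligned position.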

Installation

mamba install -c bioconda bismark

Bismark requires Bowtie2 (installed as a dependency) and Samtools.

Basic Usage

Step 1 – Prepare the genome

Build the bisulfite-converted genome index (one-time step per reference). The directory must contain the reference FASTA file(s).

bismark_genome_preparation /ref/
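
Genome preparation writes a Bisulfite_Genome folder with CT_conversion and GA_conversion subfolders inside the reference directory. A small guard such as the one below can skip the step when those folders already exist. The function name genome_is_prepared is hypothetical, and the check only looks for the folders, not for a complete index.

```shell
# Hypothetical helper: succeed if the reference directory already holds
# the two converted-genome folders created by bismark_genome_preparation.
genome_is_prepared() {
  ref_dir=$1
  [ -d "$ref_dir/Bisulfite_Genome/CT_conversion" ] &&
    [ -d "$ref_dir/Bisulfite_Genome/GA_conversion" ]
}

# Usage sketch (commented out so the snippet runs without Bismark installed):
# if ! genome_is_prepared /ref; then
#   bismark_genome_preparation /ref/
# fi
```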

Step 2 – Align bisulfite-treated reads

bismark --genome /ref/ \
  -1 trimmed_R1.fastq.gz -2 trimmed_R2.fastq.gz \
  --parallel 4 -o bismark_output/
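
For paired-end Bowtie2 runs, Bismark names the BAM file after the Read 1 file, replacing the FASTQ extension with _bismark_bt2_pe.bam. The sketch below derives that name so that later steps can refer to it. expected_pe_bam is a hypothetical helper. It assumes default naming (no --prefix or --basename) and a .fastq or .fq extension, optionally gzipped.

```shell
# Hypothetical helper: predict the paired-end BAM name from the Read 1 file.
expected_pe_bam() {
  r1=$(basename "$1")
  r1=${r1%.gz}      # drop optional gzip suffix
  r1=${r1%.fastq}   # drop .fastq ...
  r1=${r1%.fq}      # ... or .fq
  printf '%s_bismark_bt2_pe.bam\n' "$r1"
}

bam=$(expected_pe_bam trimmed_R1.fastq.gz)
echo "bismark_output/$bam"
```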

Step 3 – Remove duplicates

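Here sample.bam stands for the BAM file produced in Step 2. deduplicate_bismark writes to the current directory unless --output_dir is given. Skip this step for RRBS and other enrichment-based libraries, where reads legitimately share start positions.

deduplicate_bismark --bam --output_dir bismark_output/ bismark_output/sample.bam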

Step 4 – Extract methylation calls

bismark_methylation_extractor --paired-end --comprehensive \
  --CX --cytosine_report --genome_folder /ref/ \
  -o methylation/ bismark_output/sample.deduplicated.bam
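
Once extraction finishes, low-coverage positions are often filtered out before downstream analysis. The sketch below applies a minimum-depth filter to records in Bismark coverage format (chromosome, start, end, methylation percentage, methylated count, unmethylated count). The records and the threshold of 10 are made-up illustrations, not recommendations.

```shell
# Toy coverage records (made-up values) in Bismark coverage format:
# chrom, start, end, % methylation, count methylated, count unmethylated.
cov_example=$(printf 'chr1\t100\t100\t80\t8\t2\nchr1\t250\t250\t50\t1\t1\nchr2\t40\t40\t0\t0\t12\n')

# Keep positions covered by at least min_depth reads (methylated + unmethylated).
min_depth=10
filtered=$(printf '%s\n' "$cov_example" |
  awk -F'\t' -v min="$min_depth" '($5 + $6) >= min')

printf '%s\n' "$filtered"
```

For a real run, the same awk filter can read from `gzip -dc` applied to the .bismark.cov.gz file in the extractor's output directory.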

Key Parameters

Alignment (bismark):

  • --genome – Path to the directory containing the Bismark-prepared reference genome.

  • -1 / -2 – Paired-end input FASTQ files (Read 1 and Read 2).

  • --parallel – Number of Bismark instances to run concurrently. Each instance starts its own aligner processes, so CPU and memory use grow roughly linearly with this value.

  • -o – Output directory for results (accepted by both bismark and bismark_methylation_extractor).

  • --non_directional – Align to all four possible bisulfite strands (OT, CTOT, OB, CTOB). Needed for non-directional libraries; PBAT libraries use --pbat instead.

Methylation extraction (bismark_methylation_extractor):

  • --paired-end – Input data is paired-end.

  • --comprehensive – Merge the strand-specific methylation output into one file per cytosine context (CpG, CHG, CHH).

  • --CX – Include all cytosine contexts (CpG, CHG, CHH), not just CpG, in the bedGraph, coverage, and cytosine report output.

  • --cytosine_report – Generate a genome-wide cytosine methylation report with counts for every cytosine position, covered or not. Reports CpG context only unless --CX is also set.

  • --genome_folder – Path to the reference genome folder (needed to generate the cytosine report).

Expected Output

Alignment step:

  • sample_bismark_bt2_pe.bam – BAM file with aligned reads and methylation call tags (XM tag).

  • sample_bismark_bt2_PE_report.txt – alignment summary with mapping efficiency and cytosine methylation percentages.
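
The XM tag holds one character per read base: z/Z for an unmethylated/methylated cytosine in CpG context, x/X for CHG, h/H for CHH, u/U for unknown context, and a dot for any non-cytosine position. As a minimal sketch, the calls in a single string can be tallied as shown below. The XM string is made up, and count_char is a hypothetical helper.

```shell
# Made-up methylation call string, as it would appear after "XM:Z:" in a BAM record.
xm="..Z..z....h..x.Z.H"

# Hypothetical helper: count occurrences of one character in a string.
count_char() { printf '%s' "$1" | tr -cd "$2" | wc -c | tr -d ' '; }

meth_cpg=$(count_char "$xm" 'Z')     # methylated CpG calls
unmeth_cpg=$(count_char "$xm" 'z')   # unmethylated CpG calls

echo "CpG calls: $meth_cpg methylated, $unmeth_cpg unmethylated"
```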

Deduplication step:

  • sample.deduplicated.bam – BAM file with PCR duplicates removed.

  • Deduplication report with the number and percentage of duplicates removed.

Methylation extraction step:

  • CpG_context_sample.deduplicated.txt – per-read methylation calls for CpG sites; matching CHG_context_ and CHH_context_ files are written for the other two contexts.

  • sample.deduplicated.CX_report.txt – genome-wide cytosine report with columns for chromosome, position, strand, methylated count, unmethylated count, context, and trinucleotide context.

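  • sample.deduplicated.bedGraph.gz – bedGraph file of methylation percentages per covered cytosine (all contexts here because --CX is set; CpG only by default).

  • sample.deduplicated.bismark.cov.gz – coverage file with methylation percentage and methylated/unmethylated read counts per covered cytosine (same context rule as the bedGraph).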
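
The cytosine report columns make it straightforward to compute a pooled methylation level per context. The sketch below sums methylated and total counts by context (column 6) with awk. The four records are made-up values in the column order described above, and the variable names are illustrative only.

```shell
# Toy cytosine report records (made-up values):
# chrom, position, strand, count methylated, count unmethylated, context, trinucleotide.
cx_example=$(printf 'chr1\t10\t+\t8\t2\tCG\tCGA\nchr1\t11\t-\t6\t4\tCG\tCGT\nchr1\t20\t+\t0\t10\tCHH\tCAT\nchr1\t31\t+\t1\t9\tCHG\tCAG\n')

# Pool counts per context and print the percentage of methylated calls.
summary=$(printf '%s\n' "$cx_example" |
  awk -F'\t' '
    { meth[$6] += $4; total[$6] += $4 + $5 }
    END {
      n = split("CG CHG CHH", order, " ")
      for (i = 1; i <= n; i++) {
        c = order[i]
        if (total[c] > 0)
          printf "%s\t%.1f\n", c, 100 * meth[c] / total[c]
      }
    }')

printf '%s\n' "$summary"
```

Pooling counts weights each position by its coverage. Averaging per-position percentages instead would give every cytosine equal weight, so the two summaries can differ.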

See Also

  • MACS2 – peak calling for ChIP-seq and ATAC-seq epigenomic assays

  • fastp – adapter trimming and quality filtering of bisulfite-seq reads before alignment

  • Picard – additional duplicate marking and BAM processing utilities