Picard

Overview

Picard is a set of Java command-line tools maintained by the Broad Institute for manipulating high-throughput sequencing data. It is best known for marking PCR duplicates and collecting a wide range of alignment quality metrics. Picard integrates tightly with GATK-based variant-calling pipelines and produces detailed metrics files that can be visualised with tools such as MultiQC.

Installation

mamba install -c conda-forge -c bioconda picard
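
The Bioconda package typically places a picard wrapper script on the PATH, whereas the commands on this page call java -jar picard.jar directly. The sketch below is an illustration, not part of Picard: it prints whichever invocation is available. The function name resolve_picard and the PICARD_JAR environment variable are assumptions introduced here.

```shell
# Print the command prefix to use for Picard.
# resolve_picard and PICARD_JAR are hypothetical names used only in this sketch.
resolve_picard() {
  if command -v picard >/dev/null 2>&1; then
    # Wrapper script found on the PATH (for example from the conda package)
    echo "picard"
  else
    # Fall back to an explicit jar; defaults to picard.jar in the current directory
    echo "java -jar ${PICARD_JAR:-picard.jar}"
  fi
}
```

With this in place, $(resolve_picard) MarkDuplicates I=sample.sorted.bam ... runs the same tool in either setup. The unquoted expansion relies on word splitting, so it assumes the jar path contains no spaces.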

Basic Usage

Mark PCR duplicates

java -jar picard.jar MarkDuplicates \
  I=sample.sorted.bam \
  O=sample.dedup.bam \
  M=sample.dup_metrics.txt \
  REMOVE_DUPLICATES=false

Collect alignment summary metrics

java -jar picard.jar CollectAlignmentSummaryMetrics \
  R=reference.fa I=sample.dedup.bam O=alignment_metrics.txt

Collect insert size distribution

java -jar picard.jar CollectInsertSizeMetrics \
  I=sample.dedup.bam O=insert_metrics.txt H=insert_hist.pdf

Key Parameters

Flag / option            Description
I                        Input BAM or SAM file.
O                        Output BAM or metrics file.
M                        Duplication metrics output file (used with MarkDuplicates).
R                        Reference FASTA file (required by several collectors).
REMOVE_DUPLICATES        If true, discard duplicate reads from the output BAM instead of only flagging them (default: false).
H                        Histogram PDF output file (used with CollectInsertSizeMetrics).
VALIDATION_STRINGENCY    How strictly to validate the input: STRICT (default), LENIENT, or SILENT. LENIENT reports validation problems as warnings and continues; SILENT continues without reporting them.
CREATE_INDEX             If true, create a BAM index file alongside the output BAM (default: false).
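
The options above can be combined on one command line. As a minimal sketch, the hypothetical helper below (markdup_command is not part of Picard) prints, without running it, a MarkDuplicates invocation that also sets VALIDATION_STRINGENCY and CREATE_INDEX, using the same KEY=value syntax as the Basic Usage examples. The output file naming is an assumption.

```shell
# Print (not run) a MarkDuplicates command for one sorted BAM.
# markdup_command and the <prefix>.dedup.bam naming are illustrative assumptions.
markdup_command() {
  local in_bam="$1"
  local prefix="${in_bam%.sorted.bam}"   # strip the suffix used in the examples
  # Each argument is printed followed by a single space
  printf '%s ' java -jar picard.jar MarkDuplicates \
    "I=${in_bam}" \
    "O=${prefix}.dedup.bam" \
    "M=${prefix}.dup_metrics.txt" \
    REMOVE_DUPLICATES=false \
    VALIDATION_STRINGENCY=LENIENT
  # Last argument ends the line
  printf '%s\n' CREATE_INDEX=true
}
```

Calling markdup_command sample.sorted.bam prints a single line that can be inspected and then run by hand.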

Expected Output

  • MarkDuplicates – a BAM file in which duplicate reads either have the duplicate FLAG bit (0x400) set or, with REMOVE_DUPLICATES=true, are omitted, plus a tab-delimited metrics file reporting the number and fraction of duplicates per library.

  • CollectAlignmentSummaryMetrics – a text file with one row of statistics per read category (paired, unpaired, first-of-pair, second-of-pair), including total reads, aligned reads, mismatch rate, and related alignment statistics.

  • CollectInsertSizeMetrics – a text metrics file and an accompanying PDF histogram showing the distribution of insert sizes for paired-end libraries.
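
Picard metrics files are plain text: comment lines beginning with #, then a tab-separated header row and data rows, optionally followed by a histogram section. As a hedged sketch, the awk-based helper below pulls one named column out of the metrics table. metrics_column is a hypothetical function name, and the column name PERCENT_DUPLICATION in the usage note is taken from typical MarkDuplicates output and should be checked against your own file.

```shell
# Print "<first column><TAB><requested column>" for each data row
# of a Picard metrics file. metrics_column is a hypothetical helper.
metrics_column() {
  local file="$1" column="$2"
  awk -F '\t' -v col="$column" '
    /^#/ { if (seen_header) exit; next }              # skip comments; stop at a later section
    /^[[:space:]]*$/ { if (seen_header) exit; next }  # a blank line ends the metrics table
    !seen_header {                                    # first remaining line is the header row
      for (i = 1; i <= NF; i++) if ($i == col) idx = i
      if (!idx) { print "column not found: " col > "/dev/stderr"; exit 1 }
      seen_header = 1
      next
    }
    { print $1 "\t" $idx }                            # data row: label plus requested value
  ' "$file"
}
```

For example, metrics_column sample.dup_metrics.txt PERCENT_DUPLICATION prints one line per library, with the library name in the first field. The function returns a non-zero status if the column is absent.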

See Also

  • SAMtools – lightweight alternative for basic BAM operations such as sorting, indexing, and simple filtering

  • Sambamba – faster multi-threaded duplicate marking

  • deepTools – generate coverage tracks and quality heatmaps from processed BAM files

  • SAM / BAM / CRAM – reference for the SAM/BAM/CRAM file formats