Sambamba
Overview
Sambamba is a high-performance tool for processing BAM and CRAM files, designed to complement SAMtools with substantially better multi-threading support. Its most common uses are duplicate marking, sorting, and filtering. Because it distributes work across multiple CPU cores, Sambamba can reduce wall-clock time for routine BAM processing steps, which makes it particularly attractive for large whole-genome sequencing datasets.
Installation
mamba install -c bioconda sambamba
Basic Usage
Mark PCR / optical duplicates
sambamba markdup -t 8 sample.sorted.bam sample.dedup.bam
Sort a BAM file by coordinate
sambamba sort -t 8 -o sample.sorted.bam sample.bam
Filter reads using an expression
sambamba view -t 8 -f bam -F "mapping_quality >= 30 and not duplicate" \
sample.bam > filtered.bam
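The three commands above are often run back to back. The following is a minimal sketch of how they might be chained for one sample; `run`, `process_sample`, the `DRY_RUN` variable, and the output file names are hypothetical helpers introduced for illustration, not part of Sambamba.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
# Note: the printed form drops shell quoting, so it is for inspection only.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical wrapper: sort, mark duplicates, then filter a single sample.
# $1 = sample prefix (expects <prefix>.bam), $2 = thread count (default 8).
process_sample() {
  local sample="$1"
  local threads="${2:-8}"

  # 1. Coordinate-sort the input.
  run sambamba sort -t "$threads" -o "${sample}.sorted.bam" "${sample}.bam"

  # 2. Flag duplicates in the sorted file.
  run sambamba markdup -t "$threads" "${sample}.sorted.bam" "${sample}.dedup.bam"

  # 3. Keep well-mapped reads that were not flagged as duplicates.
  run sambamba view -t "$threads" -f bam \
    -F "mapping_quality >= 30 and not duplicate" \
    -o "${sample}.filtered.bam" "${sample}.dedup.bam"
}
```

Setting `DRY_RUN=1` before calling `process_sample sample 8` prints the three commands without running them, which can be a convenient check before committing to a long job.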
Key Parameters
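| Flag / option | Description |
|---|---|
| `-t`, `--nthreads` | Number of threads to use for processing. |
| `-o`, `--output-filename` | Write output to FILE. |
| `-f`, `--format` | Output format: `sam`, `bam`, or `json`. |
| `-F`, `--filter` | A filter expression for selecting reads (e.g., `"mapping_quality >= 30 and not duplicate"`). |
| `--tmpdir` | Directory for temporary files during sorting and duplicate marking (defaults to the system temporary directory). |
| `--overflow-list-size` | Size of the overflow list that holds reads evicted from the hash table during duplicate marking; increase it for very large files. |
| `-l`, `--compression-level` | Compression level for BAM output (0–9). |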
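As a hedged illustration of combining these options, the sketch below sorts a file using a dedicated scratch directory and an explicit compression level. The function name `sort_with_scratch`, the `SAMBAMBA` override variable, and the default level of 6 are assumptions made for this example, and it assumes `sambamba sort` accepts `--tmpdir=DIR` and `-l LEVEL`.

```shell
# Hypothetical wrapper around `sambamba sort`.
# $1 = input BAM, $2 = output BAM, $3 = compression level (assumed default: 6).
# SAMBAMBA is a hypothetical override for the executable name (e.g. for dry runs).
sort_with_scratch() {
  local in_bam="$1"
  local out_bam="$2"
  local level="${3:-6}"
  local scratch
  local status

  # Create a private scratch directory so intermediate chunks do not
  # collide with other jobs.
  scratch=$(mktemp -d) || return 1

  "${SAMBAMBA:-sambamba}" sort -t 8 -l "$level" --tmpdir="$scratch" \
    -o "$out_bam" "$in_bam"
  status=$?

  # Always remove the scratch directory, even if sorting failed.
  rm -rf "$scratch"
  return "$status"
}
```

Pointing the scratch directory at fast local storage is one common reason to set the temporary directory explicitly; the wrapper returns the exit status of the sort so callers can still detect failures.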
Expected Output
sambamba markdup – a BAM file with the duplicate FLAG bit (0x400) set on identified PCR or optical duplicates. Duplicates are retained by default and can be filtered later.
sambamba sort – a coordinate-sorted (or name-sorted with -n) BAM file.
sambamba view – filtered reads written to a BAM, SAM, or JSON file depending on the -f option.
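One way to confirm that duplicate marking behaved as expected is to count the reads carrying the duplicate flag. The sketch below assumes `sambamba view -c` prints the number of matching records; `dup_percent` and `report_duplicates` are hypothetical helper names introduced here.

```shell
# Hypothetical helper: percentage of reads flagged as duplicates.
# $1 = number of duplicate-flagged reads, $2 = total number of reads.
dup_percent() {
  awk -v d="$1" -v t="$2" \
    'BEGIN { if (t > 0) printf "%.2f\n", 100 * d / t; else print "NA" }'
}

# Hypothetical helper: summarise duplicates in a markdup output file.
# Requires sambamba on PATH. $1 = BAM file produced by `sambamba markdup`.
report_duplicates() {
  local bam="$1"
  local dups
  local total

  # Count reads with the duplicate FLAG bit (0x400) set.
  dups=$(sambamba view -c -F "duplicate" "$bam") || return 1
  # Count all reads in the file.
  total=$(sambamba view -c "$bam") || return 1

  echo "duplicates: ${dups} / ${total} ($(dup_percent "$dups" "$total")%)"
}
```

For example, `report_duplicates sample.dedup.bam` would print a single summary line for that file.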
See Also
SAMtools – the reference toolkit for SAM/BAM manipulation
Picard – alternative duplicate marking with detailed duplicate metrics output
SAM / BAM / CRAM – reference for the SAM/BAM/CRAM file formats