Chopper
Overview
Chopper is a fast, Rust-based filtering and trimming tool for long-read sequencing data. It reads from standard input and writes to standard output, making it easy to integrate into Unix pipelines. Chopper filters reads by minimum quality score and minimum/maximum length, and can optionally trim a fixed number of bases from read heads or tails. It is commonly used after basecalling to remove low-quality or very short nanopore reads before alignment.
Installation
mamba install -c bioconda chopper
Basic Usage
Filter nanopore reads from a BAM file, keeping only reads with a minimum average quality of 10 and a minimum length of 1000 bp.
# Quality and length filtering for nanopore reads
samtools fastq reads.bam | \
chopper --quality 10 --minlength 1000 | \
gzip > filtered_reads.fastq.gz
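# Check before/after. samtools fastq skips secondary and supplementary
# alignments by default, so exclude them (-F 0x900) from the "before" count.
echo "Before: $(samtools view -c -F 0x900 reads.bam) reads"
echo "After: $(zcat filtered_reads.fastq.gz | awk 'NR%4==1' | wc -l) reads"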
Key Parameters
| Flag / option | Description |
|---|---|
| --quality | Minimum average Phred quality score for a read to pass (default 0). |
| --minlength | Discard reads shorter than this value (default 1). |
| --maxlength | Discard reads longer than this value (default unlimited). |
| --headcrop | Trim this many bases from the start of each read. |
| --tailcrop | Trim this many bases from the end of each read. |
| --threads | Number of threads for compression/decompression. |
| --contam | Path to a FASTA file of contaminant sequences to filter against. |
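To make the head/tail trimming options concrete, the following is a minimal sketch of what fixed-length cropping does to a single FASTQ record: the same number of characters is removed from both the sequence line and the quality line. It is an illustration only, not Chopper's implementation. The helper name crop_fastq is hypothetical, the input is assumed to be 4-line FASTQ, and dropping reads that are too short to crop is an assumption of this sketch.

```shell
# Illustration only: mimic fixed head/tail cropping on 4-line FASTQ records.
# crop_fastq is a hypothetical helper, not part of chopper.
crop_fastq() {
  head_n="$1"   # bases to remove from the start of each read
  tail_n="$2"   # bases to remove from the end of each read
  awk -v h="$head_n" -v t="$tail_n" '
    NR % 4 == 1 { name = $0 }   # header line
    NR % 4 == 2 { seq = $0 }    # sequence line
    NR % 4 == 0 {               # quality line: emit the cropped record
      keep = length(seq) - h - t
      if (keep > 0) {           # assumption: skip reads with nothing left
        print name
        print substr(seq, h + 1, keep)
        print "+"
        print substr($0, h + 1, keep)
      }
    }'
}

# A 10-base read cropped by 2 bases at the start and 3 at the end
printf '@r1\nACGTACGTAC\n+\nIIIIHHHHGG\n' | crop_fastq 2 3
# prints:
# @r1
# GTACG
# +
# IIHHH
```

In a real pipeline the cropping would be done by Chopper itself through the head and tail trimming options listed in the table above.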
Expected Output
Chopper writes filtered FASTQ records to standard output. In the example
above, the output is piped through gzip to produce a compressed FASTQ
file (filtered_reads.fastq.gz).
There is no separate report file. Use the before/after read-count check shown above or run NanoPlot on the filtered reads to visualise the effect of filtering.
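Because no report is produced, a small summary helper can stand in for a quick check when NanoPlot is not at hand. The sketch below reports read count, total bases, and mean read length for FASTQ on standard input. The name fastq_stats is hypothetical, and the helper assumes 4-line FASTQ records (as written by samtools fastq), so it would miscount multi-line FASTQ.

```shell
# fastq_stats is a hypothetical helper, not part of chopper.
# Assumes 4-line FASTQ records; the sequence is every 2nd line of 4.
fastq_stats() {
  awk 'NR % 4 == 2 { n++; bases += length($0) }
       END {
         mean = (n > 0) ? bases / n : 0
         printf "reads=%d bases=%d mean_length=%.1f\n", n, bases, mean
       }'
}

# Tiny self-contained demonstration with two reads (4 and 6 bases)
printf '@a\nACGT\n+\nIIII\n@b\nACGTAC\n+\nIIIIII\n' | fastq_stats
# prints: reads=2 bases=10 mean_length=5.0

# Usage on the file from the example above:
# gzip -dc filtered_reads.fastq.gz | fastq_stats
```

Running the same helper on the unfiltered reads (for example, on the output of samtools fastq reads.bam) gives a directly comparable "before" line.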