Canu
Overview
Canu is a de novo long-read assembler derived from the Celera Assembler. It provides an integrated pipeline that corrects, trims, and assembles PacBio or Oxford Nanopore reads in a single workflow. Canu is designed to handle noisy long reads and uses adaptive k-mer weighting and three-stage processing (correction, trimming, assembly) to produce high-quality contigs. It works well for bacterial genomes and moderately sized eukaryotic genomes, and it can operate on compute clusters via built-in grid engine support.
Installation
mamba install -c bioconda canu
Basic Usage
Assemble a bacterial genome from Nanopore reads, specifying an output prefix, an output directory, the estimated genome size, and a thread limit.
canu -p ecoli -d canu_output/ \
  genomeSize=4.6m \
  -nanopore reads.fastq.gz \
  maxThreads=8
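The three stages named in the overview (correction, trimming, assembly) can also be run one at a time, which is handy when you want to re-run assembly with different settings without repeating correction. The following is a minimal sketch, not an official recipe. It assumes the Canu 2.x stage flags (-correct, -trim, -assemble) and read qualifiers (-corrected, -trimmed), and that the intermediate reads are written as PREFIX.correctedReads.fasta.gz and PREFIX.trimmedReads.fasta.gz inside the output directory. Older releases spell these options differently, so check `canu -help` for your version. The function name run_canu_stages and the CANU variable are hypothetical helpers introduced for this example.

```shell
# Hypothetical helper: run Canu stage by stage.
# CANU may be overridden (for example CANU=echo) to preview the commands.
CANU="${CANU:-canu}"

# run_canu_stages PREFIX OUTDIR GENOME_SIZE READS
run_canu_stages() {
    local prefix="$1" outdir="$2" genome_size="$3" reads="$4"

    # Stage 1: correct the raw reads.
    $CANU -correct -p "$prefix" -d "$outdir" genomeSize="$genome_size" \
        -nanopore "$reads" || return 1

    # Stage 2: trim the corrected reads (file name is an assumption, see above).
    $CANU -trim -p "$prefix" -d "$outdir" genomeSize="$genome_size" \
        -corrected -nanopore "$outdir/$prefix.correctedReads.fasta.gz" || return 1

    # Stage 3: assemble the corrected, trimmed reads.
    $CANU -assemble -p "$prefix" -d "$outdir" genomeSize="$genome_size" \
        -trimmed -corrected -nanopore "$outdir/$prefix.trimmedReads.fasta.gz"
}
```

To preview the three commands without running anything, call it with CANU set to echo, for example: CANU=echo run_canu_stages ecoli canu_output 4.6m reads.fastq.gz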
Key Parameters
| Flag / option | Description |
|---|---|
| -p | Prefix for output file names (e.g. ecoli). |
| -d | Directory where all output files will be written. |
| genomeSize | Estimated genome size (e.g. 4.6m). |
| -nanopore | Input Oxford Nanopore reads. |
| -pacbio-hifi | Input PacBio HiFi / CCS reads. |
| maxThreads | Maximum number of CPU threads to use. |
| correctedErrorRate | Expected error rate after correction (lower values produce more stringent overlap filtering). |
| minReadLength | Minimum read length to use in the assembly (default 1000). |
| stopOnLowCoverage | Minimum coverage threshold below which Canu will stop and warn. |
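Because Canu stops when coverage falls below its minimum threshold, it can save time to estimate coverage before launching a run. The sketch below divides the total number of sequenced bases by the genome size. It is only a rough pre-check under stated assumptions: the input is FASTQ with exactly four lines per record (no wrapped sequences), the genome size is given in base pairs (4600000 rather than 4.6m), and the function name estimate_coverage is a hypothetical helper, not part of Canu.

```shell
# estimate_coverage READS GENOME_SIZE_BP
# Prints total sequenced bases divided by genome size, to one decimal place.
estimate_coverage() {
    local reads="$1" genome_bp="$2" reader="cat"

    # Decompress on the fly when the file name ends in .gz.
    case "$reads" in
        *.gz) reader="gzip -cd" ;;
    esac

    # In a four-line FASTQ record the sequence is the second line.
    $reader "$reads" | awk -v g="$genome_bp" '
        NR % 4 == 2 { bases += length($0) }
        END { printf "%.1f\n", bases / g }'
}
```

For the E. coli example this would be called as: estimate_coverage reads.fastq.gz 4600000. Compare the printed value with the coverage threshold you intend to use.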
Expected Output
Canu writes output to the specified directory using the prefix from -p:
- ecoli.contigs.fasta – the final assembled contigs in FASTA format.
- ecoli.report – a summary report with assembly statistics including read correction rates, overlap counts, and contig metrics.
- ecoli.unassembled.fasta – reads that could not be placed into contigs.
- ecoli.contigs.gfa – the assembly graph in GFA format.
- ecoli.contigs.layout.tigInfo – detailed information about each contig including length and coverage.
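A quick sanity check on the contigs file is to count the contigs and compute the total length and N50. The sketch below does this with awk and sort alone. It is an independent illustration, not the method Canu uses for its own report, and the function name contig_stats is a hypothetical helper. N50 is taken here as the length of the contig at which the running sum, counted from the longest contig down, first reaches half of the total assembly length.

```shell
# contig_stats FASTA
# Prints three tab-separated values: contig count, total bases, N50.
contig_stats() {
    awk '
        /^>/ { if (len) print len; len = 0; next }  # header line: emit previous length
        { len += length($0) }                        # sequence may span several lines
        END { if (len) print len }
    ' "$1" | sort -rn | awk '
        { lens[NR] = $1; total += $1 }
        END {
            # Walk contigs longest first until half of the total is covered.
            for (i = 1; i <= NR; i++) {
                run += lens[i]
                if (run * 2 >= total) { n50 = lens[i]; break }
            }
            printf "%d\t%d\t%d\n", NR, total, n50
        }'
}
```

With the output directory from the usage example, this would be called as: contig_stats canu_output/ecoli.contigs.fasta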