Unicycler
Overview
Unicycler is a hybrid assembler designed for bacterial genomes. It combines short Illumina reads with long reads from Oxford Nanopore or PacBio to produce complete, circularised chromosomes and plasmids. It first builds a short-read assembly graph with SPAdes, then uses the long reads to resolve repeats and bridge gaps in that graph. It can also run in short-read-only mode, where it acts as an optimised SPAdes pipeline with additional graph-cleaning heuristics. Completed circular replicons are flagged in the output, which makes it straightforward to see which contigs are fully resolved.
Installation
mamba install -c bioconda unicycler
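After installing, it can help to confirm that the executables are on the `PATH` before starting a long assembly. The sketch below uses a hypothetical helper, `check_tools`, which is not part of Unicycler. It assumes the executables of interest are `unicycler` and `spades.py` (SPAdes builds the short-read graph, as noted in the overview).

```shell
# check_tools: hypothetical helper (not part of Unicycler).
# Prints each named program that is missing from PATH to stderr and
# returns non-zero if at least one is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Report the installed version only when both tools were found.
if check_tools unicycler spades.py; then
  unicycler --version
fi
```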
Basic Usage
Perform a hybrid assembly using paired-end short reads and long reads.
# Hybrid assembly
unicycler -1 short_R1.fastq.gz -2 short_R2.fastq.gz \
-l long_reads.fastq.gz \
-o unicycler_output/ -t 8
Assemble a bacterial genome from short reads only.
# Short-read only
unicycler -1 short_R1.fastq.gz -2 short_R2.fastq.gz \
-o unicycler_output/ -t 8
Key Parameters
| Flag / option | Description |
|---|---|
| `-1`, `-2` | Input paired-end short-read FASTQ files (read 1 and read 2). |
| `-l` | Input long reads (Nanopore or PacBio) for hybrid assembly. |
| `-o` | Output directory for all assembly results. |
| `-t` | Number of CPU threads to use. |
| `--mode` | Assembly stringency: `conservative`, `normal` (default), or `bold`. |
| `--min_fasta_length` | Minimum contig length to include in the final output (default 100). |
| `--linear_seqs` | Expected number of linear sequences (default 0, which assumes circular chromosomes and plasmids). |
| `--no_rotate` | Do not rotate completed circular sequences to start at a standard gene (e.g. dnaA). |
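The options above can be combined in a single run. The following is a minimal sketch, not a recommended configuration: `run_unicycler` is a hypothetical wrapper, the file names are the placeholders from Basic Usage, and the `DRY_RUN` variable is an assumption introduced here so the command can be previewed without launching an assembly. The mode check relies on Unicycler's `--mode` option accepting `conservative`, `normal`, and `bold`.

```shell
# run_unicycler: hypothetical wrapper (not part of Unicycler).
# Usage: run_unicycler MODE MIN_LEN OUTDIR
# Set DRY_RUN=1 to print the command instead of running it.
run_unicycler() {
  mode=$1; min_len=$2; outdir=$3

  # Reject anything other than the three --mode values.
  case "$mode" in
    conservative|normal|bold) ;;
    *) echo "unknown mode: $mode" >&2; return 2 ;;
  esac

  # Rebuild the positional parameters as the full command line.
  set -- unicycler \
    -1 short_R1.fastq.gz -2 short_R2.fastq.gz \
    -l long_reads.fastq.gz \
    -o "$outdir" -t 8 \
    --mode "$mode" \
    --min_fasta_length "$min_len" \
    --no_rotate

  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Preview a conservative run that drops contigs shorter than 1000 bp
# and leaves circular sequences unrotated.
DRY_RUN=1 run_unicycler conservative 1000 unicycler_conservative/
```

Validating the mode before calling the assembler is a design choice for the sketch only; Unicycler performs its own argument checking.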
Expected Output
Unicycler writes output to the specified directory:
- assembly.fasta – the final assembly containing both complete and incomplete replicons in FASTA format.
- assembly.gfa – the final assembly graph in GFA format, viewable in Bandage.
- unicycler.log – a detailed log file recording every step of the assembly pipeline, including SPAdes k-mer iterations, graph cleaning, and long-read bridging.
Each contig header in the FASTA output includes length, depth (coverage), and circularity status, allowing rapid identification of complete replicons.
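The header fields described above can be used to list complete replicons from the command line. The sketch below assumes headers of the form `>1 length=5000 depth=1.00x circular=true`, with `circular=true` present only on circular contigs; check the headers in your own `assembly.fasta` before relying on it. `list_circular` is a hypothetical helper, not part of Unicycler.

```shell
# list_circular: hypothetical helper (not part of Unicycler).
# Usage: list_circular assembly.fasta
# Assumes headers look like ">1 length=5000 depth=1.00x circular=true".
# Prints "name<TAB>length<TAB>depth" for each contig marked circular.
list_circular() {
  awk '
    /^>/ && /circular=true/ {
      name = substr($1, 2); len = ""; depth = ""
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^length=/) len = substr($i, 8)
        if ($i ~ /^depth=/)  depth = substr($i, 7)
      }
      printf "%s\t%s\t%s\n", name, len, depth
    }
  ' "$1"
}

# Run against the default output location if an assembly is present.
if [ -f unicycler_output/assembly.fasta ]; then
  list_circular unicycler_output/assembly.fasta
fi
```

Contigs that do not appear in this list are either linear or incomplete and are worth inspecting in the assembly graph.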