Unicycler

Overview

Unicycler is a hybrid assembler designed for bacterial genomes. It combines short Illumina reads with long Oxford Nanopore or PacBio reads to produce complete, circularised chromosomes and plasmids. It first builds a short-read assembly graph with SPAdes, then uses the long reads to resolve repeats and bridge gaps in that graph. It can also run in short-read-only mode, where it acts as an optimised SPAdes pipeline with additional graph-cleaning heuristics. Completed circular replicons are flagged in the output, making it straightforward to see which contigs are fully resolved.

Installation

mamba install -c conda-forge -c bioconda unicycler

Basic Usage

Perform a hybrid assembly using paired-end short reads and long reads.

# Hybrid assembly
unicycler -1 short_R1.fastq.gz -2 short_R2.fastq.gz \
  -l long_reads.fastq.gz \
  -o unicycler_output/ -t 8

Assemble a bacterial genome from short reads only.

# Short-read only
unicycler -1 short_R1.fastq.gz -2 short_R2.fastq.gz \
  -o unicycler_output/ -t 8

Key Parameters

Flag / option       Description
-1 / -2             Input paired-end short-read FASTQ files (read 1 and read 2).
-l                  Input long reads (Nanopore or PacBio) for hybrid assembly.
-o                  Output directory for all assembly results.
-t                  Number of CPU threads to use.
--mode              Assembly stringency: conservative, normal (default), or bold. Bold mode resolves more repeats but may introduce errors.
--min_fasta_length  Minimum contig length in bp to include in the final output (default 100).
--linear_seqs       Expected number of linear sequences (default 0, assumes circular chromosomes and plasmids).
--no_rotate         Do not rotate completed circular sequences to start at a standard gene (e.g. dnaA).
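
The effect of --mode is easiest to judge by assembling the same reads under each setting and comparing the resulting graphs. The sketch below loops over the three modes using only the flags listed above. The function name compare_modes, the UNICYCLER override variable, and the unicycler_<mode>/ output directory names are illustrative assumptions, not part of Unicycler.

```shell
# Hypothetical helper: run one hybrid assembly per --mode value.
# Arguments: <short_R1> <short_R2> <long_reads> [threads, default 8]
# UNICYCLER may be set to another executable (for example "echo")
# to preview the commands instead of running the assembler.
compare_modes() {
  r1="$1"
  r2="$2"
  long_reads="$3"
  threads="${4:-8}"
  for mode in conservative normal bold; do
    # Each mode gets its own output directory so results are not overwritten.
    "${UNICYCLER:-unicycler}" -1 "$r1" -2 "$r2" -l "$long_reads" \
      --mode "$mode" -o "unicycler_${mode}/" -t "$threads"
  done
}
```

Calling it as UNICYCLER=echo compare_modes short_R1.fastq.gz short_R2.fastq.gz long_reads.fastq.gz prints the three commands without starting an assembly. Each real run is a full assembly, so this triples the compute time.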

Expected Output

Unicycler writes output to the specified directory:

  • assembly.fasta – the final assembly containing both complete and incomplete replicons in FASTA format.

  • assembly.gfa – the final assembly graph in GFA format, viewable in Bandage.

  • unicycler.log – a detailed log file recording every step of the assembly pipeline including SPAdes k-mer iterations, graph cleaning, and long-read bridging.
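
Before passing results to downstream tools, it can help to confirm that the three files listed above were actually written. This is a minimal sketch; the function name check_unicycler_outputs is hypothetical, and it only tests that each file exists and is non-empty, not that the assembly is good.

```shell
# Hypothetical helper: verify the expected Unicycler output files.
# Argument: <output_directory>
# Returns 0 if all files exist and are non-empty, 1 otherwise.
check_unicycler_outputs() {
  outdir="${1%/}"   # drop a trailing slash, if any
  status=0
  for f in assembly.fasta assembly.gfa unicycler.log; do
    if [ ! -s "$outdir/$f" ]; then
      echo "missing or empty: $outdir/$f" >&2
      status=1
    fi
  done
  return "$status"
}
```

For the commands in Basic Usage this would be called as check_unicycler_outputs unicycler_output/.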

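Each contig header in the FASTA output records the contig's length and read depth (normalised to the median depth, so the chromosome is close to 1x). Complete circular sequences are additionally marked as circular, allowing rapid identification of complete replicons.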
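
The header fields can be used to list complete replicons directly from the command line. The sketch below assumes headers of the form ">1 length=4641652 depth=1.00x circular=true", with the circular field absent on incomplete contigs. That layout is an assumption, so check it against your own assembly.fasta first. The function name list_circular_contigs is hypothetical.

```shell
# Hypothetical helper: print "<contig name><TAB><length>" for every
# FASTA record whose header contains circular=true.
# Argument: <assembly.fasta>
list_circular_contigs() {
  awk '/^>/ && /circular=true/ {
         name = substr($1, 2)            # strip the leading ">"
         len = ""
         for (i = 2; i <= NF; i++)
           if ($i ~ /^length=/) len = substr($i, 8)   # text after "length="
         print name "\t" len
       }' "$1"
}
```

Running list_circular_contigs unicycler_output/assembly.fasta then gives one line per circularised chromosome or plasmid.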

See Also

  • SPAdes – the underlying short-read assembler used by Unicycler

  • Flye – long-read-only assembler as an alternative for Nanopore or PacBio data

  • BUSCO – assess gene-level completeness of the assembly