BRAKER
Overview
BRAKER is a gene prediction pipeline for eukaryotic genomes that combines GeneMark-ES/ET/EP with AUGUSTUS to produce gene structure annotations. It can use RNA-seq alignments, protein homology evidence, or both to train and refine gene models. BRAKER automates the training of species-specific AUGUSTUS parameters, so it can be applied to newly sequenced organisms that have no existing gene models. It is widely used for eukaryotic genome annotation and covers intron-exon boundary prediction, alternative splicing hints, and, when RNA-seq evidence is available, UTR annotation.
Installation
mamba install -c bioconda braker3
Basic Usage
Run BRAKER with RNA-seq evidence provided as a BAM alignment file.
# BRAKER with RNA-seq evidence
braker.pl --genome=genome.fasta \
    --bam=rnaseq_aligned.bam \
    --species=myspecies \
    --softmasking \
    --cores 8
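BRAKER can also run with protein evidence, or with both evidence types. The sketch below shows one way to assemble the command line from whichever evidence files are actually present. `build_braker_cmd` is a hypothetical helper, not part of BRAKER, and `--prot_seq` is assumed here to be the protein-evidence option; confirm it with `braker.pl --help` for your installed version. The function only prints the command, so it can be inspected before anything is run.

```shell
# Hypothetical helper (not part of BRAKER): print a braker.pl command line
# that includes only the evidence files that exist on disk.
# Assumes paths contain no whitespace.
build_braker_cmd() {
    genome=$1; species=$2; bam=$3; proteins=$4
    cmd="braker.pl --genome=$genome --species=$species --softmasking"
    # Add RNA-seq evidence only if a BAM file was given and exists.
    if [ -n "$bam" ] && [ -f "$bam" ]; then
        cmd="$cmd --bam=$bam"
    fi
    # Add protein evidence only if a FASTA file was given and exists.
    # --prot_seq is assumed to be the protein-evidence flag.
    if [ -n "$proteins" ] && [ -f "$proteins" ]; then
        cmd="$cmd --prot_seq=$proteins"
    fi
    printf '%s\n' "$cmd"
}

# Example: print the command for review instead of running it.
build_braker_cmd genome.fasta myspecies rnaseq_aligned.bam proteins.fasta
```

Printing rather than executing keeps the sketch safe to try; once the output looks right, the printed line can be pasted into a terminal or a job script.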
Key Parameters
| Flag / option | Description |
|---|---|
| `--genome` | Input genome assembly in FASTA format. |
| `--bam` | RNA-seq reads aligned to the genome in BAM format (for evidence-based training). |
| `--prot_seq` | Protein sequences in FASTA format for homology-based gene finding. |
| `--species` | Species name used for AUGUSTUS parameter training and storage. |
| `--softmasking` | Indicate that the genome has been soft-masked (lowercase letters for repeats). BRAKER will respect the masking during gene prediction. |
| `--cores` | Number of CPU threads to use. |
| `--gff3` | Output gene predictions in GFF3 format (default is GTF). |
| `--UTR=on` | Enable UTR prediction (requires RNA-seq evidence). |
| `--fungus` | Use fungal-specific parameters for GeneMark and AUGUSTUS. |
| `--AUGUSTUS_ab_initio` | Also produce ab initio predictions (without evidence support) in addition to the evidence-based predictions. |
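Because `--softmasking` only makes sense when repeats in the genome are actually written in lowercase, it can help to check the FASTA file first. The following is a minimal sketch, not a BRAKER feature: `masked_fraction` is a hypothetical helper that reports the share of lowercase `a`, `c`, `g`, `t`, and `n` characters among all sequence characters.

```shell
# Hypothetical helper (not part of BRAKER): print the fraction of
# lowercase (soft-masked) bases in a FASTA file, to four decimals.
masked_fraction() {
    awk '
        /^>/ { next }                       # skip FASTA header lines
        {
            total += length($0)             # count all sequence characters
            masked += gsub(/[acgtn]/, "")   # gsub returns the number of matches
        }
        END {
            if (total == 0) { print "0.0000"; exit }
            printf "%.4f\n", masked / total
        }
    ' "$1"
}

# Example usage:
#   masked_fraction genome.fasta
```

A value of `0.0000` suggests the assembly has not been soft-masked, in which case repeat masking would need to be done before relying on the flag.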
Expected Output
BRAKER writes output to the braker/ directory (or the directory specified
with --workingdir):
- `braker.gtf`: the final gene predictions in GTF format, containing gene, transcript, exon, CDS, and optionally UTR features.
- `braker.gff3`: gene predictions in GFF3 format (when `--gff3` is used).
- `braker.aa`: predicted protein sequences in FASTA format.
- `braker.codingseq`: coding nucleotide sequences in FASTA format.
- `hintsfile.gff`: the compiled hints file used by AUGUSTUS, derived from RNA-seq and/or protein evidence.
- `augustus.hints.gtf`: AUGUSTUS predictions using the trained parameters and external hints.
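A quick sanity check after a run is to count the predicted genes, transcripts, and proteins. The sketch below assumes the file names listed above and the standard GTF layout, in which the third tab-separated column holds the feature type. `summarize_braker` is a hypothetical helper, not something BRAKER ships.

```shell
# Hypothetical helper (not part of BRAKER): summarize a BRAKER output
# directory by counting gene and transcript features and protein records.
summarize_braker() {
    dir=${1:-braker}
    # Column 3 of a GTF line is the feature type.
    genes=$(awk -F'\t' '$3 == "gene" { n++ } END { print n + 0 }' "$dir/braker.gtf")
    transcripts=$(awk -F'\t' '$3 == "transcript" { n++ } END { print n + 0 }' "$dir/braker.gtf")
    # Each FASTA record in braker.aa starts with ">".
    proteins=$(awk '/^>/ { n++ } END { print n + 0 }' "$dir/braker.aa")
    printf 'genes=%s transcripts=%s proteins=%s\n' "$genes" "$transcripts" "$proteins"
}

# Example usage:
#   summarize_braker braker
```

If the transcript and protein counts disagree, that is usually worth investigating before using the annotation downstream.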