Nextflow
========

Overview
--------

Nextflow is a reactive workflow framework built on the dataflow programming
model. In its current DSL2 syntax, pipelines are composed of modular processes
connected through channels. Each process runs in its own isolated environment,
with built-in support for Docker, Singularity, Conda, and cloud executors
(AWS Batch, Google Life Sciences, Azure Batch). Nextflow handles job
scheduling, fault tolerance with automatic retries, and resumption of
incomplete runs via its caching mechanism. The nf-core community maintains a
large collection of peer-reviewed, production-ready pipelines for common
bioinformatics analyses.

Pipeline Steps
--------------

A typical Nextflow QC pipeline follows these steps:

1. Create a channel of paired-end read files using ``fromFilePairs``.
2. Run FastQC on each sample to generate quality reports.
3. Trim adapters and low-quality bases with fastp.
4. Collect all QC outputs and aggregate them into a MultiQC report.

QC Pipeline
-----------

The pipeline below implements the complete quality-control workflow in
Nextflow DSL2. It reads paired FASTQ files from a glob pattern, runs FastQC
and fastp in parallel per sample, then aggregates all reports with MultiQC.

.. code-block:: groovy

    #!/usr/bin/env nextflow
    nextflow.enable.dsl = 2

    params.reads    = "data/*_{R1,R2}.fastq.gz"
    params.outdir   = "results"
    params.min_qual = 20
    params.min_len  = 50

    // Per-sample quality reports on the raw reads
    process FASTQC {
        tag "${sample_id}"
        publishDir "${params.outdir}/fastqc", mode: 'copy'
        conda 'bioconda::fastqc=0.12.1'
        cpus 2

        input:
        tuple val(sample_id), path(reads)

        output:
        path("*.html"), emit: html
        path("*.zip"),  emit: zip

        script:
        """
        fastqc -t ${task.cpus} ${reads}
        """
    }

    // Adapter and quality trimming of each read pair
    process FASTP {
        tag "${sample_id}"
        publishDir "${params.outdir}/fastp", mode: 'copy'
        conda 'bioconda::fastp=0.23.4'
        cpus 4

        input:
        tuple val(sample_id), path(reads)

        output:
        tuple val(sample_id), path("*_trimmed.fastq.gz"), emit: trimmed
        path("*.html"), emit: html
        path("*.json"), emit: json

        script:
        def (r1, r2) = reads
        """
        fastp -i ${r1} -I ${r2} \
            -o ${sample_id}_R1_trimmed.fastq.gz \
            -O ${sample_id}_R2_trimmed.fastq.gz \
            -h ${sample_id}_fastp.html \
            -j ${sample_id}_fastp.json \
            --qualified_quality_phred ${params.min_qual} \
            --length_required ${params.min_len} \
            --detect_adapter_for_pe \
            --thread ${task.cpus}
        """
    }

    // Single aggregated report over all FastQC and fastp outputs
    process MULTIQC {
        publishDir "${params.outdir}/multiqc", mode: 'copy'
        conda 'bioconda::multiqc=1.21'

        input:
        path('*')

        output:
        path("multiqc_report.html")

        script:
        """
        multiqc . -o . --force
        """
    }

    workflow {
        read_pairs_ch = Channel
            .fromFilePairs(params.reads, checkIfExists: true)

        FASTQC(read_pairs_ch)
        FASTP(read_pairs_ch)

        // Merge both report streams and emit them as one list for MultiQC
        ch_multiqc = FASTQC.out.zip
            .mix(FASTP.out.json)
            .collect()

        MULTIQC(ch_multiqc)
    }

Configuration
-------------

Nextflow uses a ``nextflow.config`` file to define execution profiles. The
configuration below sets up local, SLURM, Docker, and Singularity profiles:

.. code-block:: groovy

    profiles {
        standard {
            process.executor = 'local'
            process.cpus     = 4
            process.memory   = '8 GB'
        }
        slurm {
            process.executor = 'slurm'
            process.queue    = 'standard'
            process.cpus     = 4
            process.memory   = '8 GB'
            process.time     = '2h'
        }
        docker {
            docker.enabled = true
        }
        singularity {
            singularity.enabled    = true
            singularity.autoMounts = true
        }
    }

    process {
        errorStrategy = 'retry'
        maxRetries    = 2
    }

Running the Pipeline
--------------------

.. code-block:: bash

    # Run with the default (standard) profile on a custom set of reads
    nextflow run main.nf --reads "data/*_{R1,R2}.fastq.gz"
    # Submit jobs through SLURM
    nextflow run main.nf -profile slurm
    # Use the docker profile
    nextflow run main.nf -profile docker
    # Resume an interrupted run, reusing cached task results
    nextflow run main.nf -resume
    # Run a pinned release of the nf-core RNA-seq pipeline
    nextflow run nf-core/rnaseq -r 3.14.0 \
        --input samplesheet.csv --outdir results --genome GRCh38 -profile singularity

See Also
--------

* :doc:`snakemake` -- Snakemake workflow manager
* :doc:`wgs-variant-calling` -- WGS variant calling pipeline
* :doc:`rnaseq-differential-expression` -- RNA-seq differential expression pipeline
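
Scaling Resources on Retry
--------------------------

The ``errorStrategy = 'retry'`` setting in the configuration above re-runs a
failed task with the same resources. As a hedged sketch that is not part of
the original configuration, the fragment below uses dynamic directives so that
each retry of ``FASTP`` requests more memory and time. The base values (8 GB,
2 h) are illustrative assumptions. A ``withName`` selector is used because
selectors take precedence over generic ``process.*`` settings such as those in
the profiles.

.. code-block:: groovy

    process {
        withName: 'FASTP' {
            // task.attempt is 1 on the first try, 2 on the first retry, and so on
            memory = { 8.GB * task.attempt }   // assumed base value
            time   = { 2.h  * task.attempt }   // assumed base value
        }
    }

With ``maxRetries = 2``, a task would request up to 24 GB and 6 h on its third
and final attempt.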