featureCounts
=============

Overview
--------

featureCounts is a fast and efficient read counting program from the Subread package that assigns aligned reads (or read pairs) to genomic features such as genes, exons, or promoters using a gene annotation file in GTF or SAF format. It supports multi-threaded execution and can process multiple BAM files in a single run, producing a unified count matrix suitable for downstream differential expression analysis. featureCounts handles both single-end and paired-end data, supports strand-specific counting, and provides summary statistics on assignment success rates.

Installation
------------

.. code-block:: bash

   mamba install -c bioconda subread

Basic Usage
-----------

Count read pairs for each gene across multiple BAM files.

.. code-block:: bash

   featureCounts -T 8 -p --countReadPairs \
     -a genes.gtf \
     -o counts.txt \
     sample1.bam sample2.bam sample3.bam

For single-end data, omit the ``-p`` and ``--countReadPairs`` flags:

.. code-block:: bash

   featureCounts -T 8 \
     -a genes.gtf \
     -o counts.txt \
     sample1.bam sample2.bam

Key Parameters
--------------

.. list-table::
   :header-rows: 1
   :widths: 25 75

   * - Flag / option
     - Description
   * - ``-a``
     - Path to the gene annotation file in GTF or SAF format.
   * - ``-o``
     - Output file path for the count matrix.
   * - ``-T``
     - Number of threads to use.
   * - ``-p``
     - Indicate that input data is paired-end.
   * - ``--countReadPairs``
     - Count fragments (read pairs) rather than individual reads.
   * - ``-s``
     - Strand-specificity: ``0`` for unstranded, ``1`` for forward stranded, ``2`` for reverse stranded.
   * - ``-t``
     - Feature type to count (default ``exon``); must match the third column of the GTF.
   * - ``-g``
     - Attribute used for grouping features into meta-features (default ``gene_id``).
   * - ``-Q``
     - Minimum mapping quality threshold for counting a read.
   * - ``--primary``
     - Count only primary alignments (ignore secondary and supplementary).
   * - ``-B``
     - Require both ends of a read pair to be aligned for counting.

Expected Output
---------------

* ``counts.txt`` -- a tab-delimited count matrix with gene metadata columns (Geneid, Chr, Start, End, Strand, Length) followed by one count column per input BAM file. Each row represents a gene (or meta-feature) and each count value is the number of reads or fragments assigned to that gene.
* ``counts.txt.summary`` -- a summary table showing the number of reads assigned, unassigned (ambiguous, multi-mapping, no features, unmapped), and other categories for each input file.

See Also
--------

* :doc:`htseq` -- alternative read counting tool with per-read overlap resolution modes
* :doc:`salmon` -- alignment-free quantification at the transcript level
* :doc:`kallisto` -- pseudoalignment-based transcript quantification
* :doc:`/tools/differential-expression/deseq2` -- differential expression analysis using featureCounts output
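
Example: Extracting a Plain Count Matrix
----------------------------------------

Some downstream tools expect only a gene identifier followed by one count column per sample, without the metadata columns listed under Expected Output. The following is a minimal sketch of that reduction, not part of featureCounts itself. The function name ``counts_to_matrix`` and the output file name are hypothetical, and the sketch assumes that ``counts.txt`` begins with a ``#`` comment line recording the command and that the six metadata columns occupy fields 1 to 6.

.. code-block:: bash

   # Hypothetical helper: reduce featureCounts output to Geneid + count columns.
   # Assumption: comment lines start with '#', and fields 1-6 are
   # Geneid, Chr, Start, End, Strand, Length.
   counts_to_matrix() {
       # Drop comment lines, then keep field 1 (Geneid) and fields 7 onward (counts).
       grep -v '^#' "$1" | cut -f1,7-
   }

   # Usage (file names are illustrative):
   # counts_to_matrix counts.txt > counts.matrix.tsv

The header row is kept, so the count columns are still labelled by the BAM file names passed to featureCounts.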