HTSeq
Overview
HTSeq is a Python framework for working with high-throughput sequencing data. It includes htseq-count, a widely used tool for counting reads mapped to genomic features. Given a sorted BAM file and a GTF annotation, htseq-count assigns each read (or read pair) to a gene based on its overlap with annotated exons. It provides several overlap resolution modes to handle reads that span feature boundaries or overlap more than one gene. htseq-count writes a simple two-column table of raw counts per gene, which can be used as input for differential expression tools such as DESeq2 and edgeR.
Installation
mamba install -c bioconda htseq
Basic Usage
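Count reads per gene from a coordinate-sorted BAM file. The -s reverse setting in this example assumes a reverse-stranded library; use -s no for unstranded data or -s yes for forward-stranded data.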
htseq-count -f bam -r pos -s reverse \
-t exon -i gene_id \
sample.sorted.bam genes.gtf > counts.txt
For multiple samples, run htseq-count separately on each BAM file and merge the results into a count matrix.
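The merge step can be sketched in plain Python. This is a minimal illustration, not part of HTSeq: the function names `read_counts` and `merge_counts` are hypothetical, each input is assumed to be a two-column tab-delimited htseq-count output file, and the sample name is assumed to be the file name without its extension.

```python
import csv
from pathlib import Path


def read_counts(path):
    """Read one two-column htseq-count output file into {feature_id: count}."""
    counts = {}
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if not row:
                continue  # skip blank lines
            counts[row[0]] = int(row[1])
    return counts


def merge_counts(paths):
    """Merge per-sample count files into (sample_names, rows).

    Each row is [feature_id, count_in_sample_1, count_in_sample_2, ...].
    A feature missing from a sample is filled with 0 (an assumption of
    this sketch; files produced with the same GTF list the same features).
    """
    samples = [Path(p).stem for p in paths]
    per_sample = [read_counts(p) for p in paths]
    # Keep feature IDs in first-seen order across all files
    feature_ids = list(dict.fromkeys(fid for c in per_sample for fid in c))
    rows = [[fid] + [c.get(fid, 0) for c in per_sample] for fid in feature_ids]
    return samples, rows


def write_matrix(paths, out_path):
    """Write the merged matrix as a tab-delimited file with a header row."""
    samples, rows = merge_counts(paths)
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["feature_id"] + samples)
        writer.writerows(rows)
```

The special counter rows (names starting with `__`) are carried through unchanged in this sketch; they would normally be dropped before the matrix is passed to a differential expression tool.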
Key Parameters
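| Flag / option | Description |
|---|---|
| `-f`, `--format` | Input format: `sam` or `bam`. |
| `-r`, `--order` | Sort order of the input file: `name` or `pos`. |
| `-s`, `--stranded` | Strand-specificity: `yes`, `no`, or `reverse`. |
| `-t`, `--type` | Feature type to use from the GTF (default `exon`). |
| `-i`, `--idattr` | GTF attribute to use as the feature ID (default `gene_id`). |
| `-m`, `--mode` | Overlap resolution mode: `union`, `intersection-strict`, or `intersection-nonempty`. |
| `--nonunique` | How to handle reads mapping to multiple features: `none`, `all`, `fraction`, or `random`. |
| `-a`, `--minaqual` | Minimum alignment quality threshold (default 10). |
| `--additional-attr` | Include additional GTF attributes in the output (e.g. gene_name). |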
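The three overlap resolution modes can be illustrated with a small toy model. The sketch below is not HTSeq's own code: `assign_read` is a hypothetical helper, and it assumes the read has already been reduced to one set of overlapping feature IDs per aligned position. It follows the usual description of the modes: `union` combines all per-position sets, `intersection-strict` intersects all of them, and `intersection-nonempty` intersects only the non-empty ones. A read is counted only if exactly one feature remains.

```python
def assign_read(position_sets, mode="union"):
    """Assign a read to a feature from its per-position feature sets.

    position_sets: list of sets, one per aligned position of the read,
                   each holding the IDs of features covering that position.
    Returns a feature ID, "__no_feature", or "__ambiguous".
    """
    if mode == "union":
        # Every feature touched anywhere along the read
        candidates = set().union(*position_sets)
    elif mode == "intersection-strict":
        # Only features covering every position of the read
        candidates = set.intersection(*position_sets) if position_sets else set()
    elif mode == "intersection-nonempty":
        # Like strict, but positions with no feature are ignored
        nonempty = [s for s in position_sets if s]
        candidates = set.intersection(*nonempty) if nonempty else set()
    else:
        raise ValueError(f"unknown mode: {mode}")

    if not candidates:
        return "__no_feature"
    if len(candidates) > 1:
        return "__ambiguous"
    return next(iter(candidates))
```

For example, a read that lies in gene A but overhangs into unannotated sequence, `[{"A"}, {"A"}, set()]`, is assigned to A under `union` and `intersection-nonempty` but falls into `__no_feature` under `intersection-strict`. A read fully inside A that also partly overlaps B, `[{"A"}, {"A", "B"}]`, is ambiguous under `union` and assigned to A under both intersection modes.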
Expected Output
Standard output (redirected to counts.txt) – a two-column tab-delimited file with the gene identifier in the first column and the raw read count in the second column. The last five lines contain special counters:
__no_feature – reads not overlapping any feature.
__ambiguous – reads overlapping multiple features.
__too_low_aQual – reads below the alignment quality threshold.
__not_aligned – unmapped reads.
__alignment_not_unique – reads with multiple alignments.
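A short Python sketch shows how the gene rows might be separated from the special counters when reading this file. The helper names `split_counts` and `assigned_fraction` are hypothetical. The sketch assumes the two-column layout described above and that special counters are exactly the rows whose name starts with `__`.

```python
def split_counts(lines):
    """Split htseq-count output lines into (gene_counts, special_counters)."""
    genes, special = {}, {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        name, value = fields[0], int(fields[1])
        # Special counters are identified by their double-underscore prefix
        if name.startswith("__"):
            special[name] = value
        else:
            genes[name] = value
    return genes, special


def assigned_fraction(genes, special):
    """Rough share of counted reads that were assigned to a gene."""
    assigned = sum(genes.values())
    total = assigned + sum(special.values())
    return assigned / total if total else 0.0
```

Because `split_counts` accepts any iterable of lines, an open file handle for counts.txt can be passed to it directly. The resulting fraction is only a quick sanity check: a very low value often points to a wrong strandedness setting or a mismatch between the annotation and the reference used for alignment.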
See Also
featureCounts – faster multi-threaded alternative for read counting with built-in multi-BAM support
Salmon – alignment-free transcript-level quantification
kallisto – pseudoalignment-based transcript quantification
DESeq2 – differential expression analysis using htseq-count output
edgeR – alternative differential expression framework