Computing Environment
Overview
Most bioinformatics work runs on Linux, either on a local workstation or a shared High-Performance Computing (HPC) cluster. This page covers the essential Linux commands you will use daily when processing sequencing data and shows how to submit jobs with the SLURM workload manager.
Installation
No additional installation is needed – the commands below use standard Linux utilities together with bioinformatics tools installed through Conda (see Installation).
Basic Usage
Essential Linux one-liners
# Count reads in a FASTQ file (4 lines per read)
wc -l sample.fastq | awk '{print $1/4}'
# Extract chromosome 1 alignments from a coordinate-sorted, indexed BAM file
samtools view -b input.bam chr1 > chr1.bam
# Sort a BED file and merge overlapping intervals
sort -k1,1 -k2,2n peaks.bed | bedtools merge -i stdin > merged.bed
# View the first few records of a gzipped VCF
zcat variants.vcf.gz | grep -v "^#" | head -20
# Monitor disk usage of a project directory
du -sh /data/project/*
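Sequencing data is usually stored gzip-compressed, so the read-count one-liner needs a decompression step in practice. The following is a minimal sketch, assuming strict 4-line FASTQ records; `count_reads` is a hypothetical helper name, not part of any tool.

```shell
# count_reads: hypothetical helper; prints the number of reads in a FASTQ file.
# Assumes every record spans exactly 4 lines (no wrapped sequence lines).
count_reads() {
    case "$1" in
        # Compressed input: stream-decompress, then count lines
        *.gz) gzip -dc "$1" | awk 'END {print NR / 4}' ;;
        # Plain-text input: count lines directly
        *)    awk 'END {print NR / 4}' "$1" ;;
    esac
}

# Usage (file name is illustrative):
#   count_reads sample_R1.fastq.gz
```

Using `gzip -dc` rather than `zcat` keeps the helper portable to systems where `zcat` expects a `.Z` suffix.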
Submitting a SLURM job
Write a SLURM batch script to run a BWA-MEM2 alignment on a cluster.
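#!/bin/bash
#SBATCH --job-name=bwa_align
#SBATCH --output=logs/bwa_%j.out
#SBATCH --error=logs/bwa_%j.err
#SBATCH --cpus-per-task=16
#SBATCH --mem=32G
#SBATCH --time=06:00:00
#SBATCH --partition=standard

# Activate the Conda environment that provides bwa-mem2 and samtools
source activate wgs_pipeline

# Abort on the first error, including a failure anywhere in the pipeline
set -eo pipefail

# Align paired-end reads and sort by coordinate.
# 12 alignment threads + 4 sorting threads match the 16 cores requested above.
bwa-mem2 mem -t 12 \
    /ref/GRCh38.fa \
    sample_R1.fastq.gz sample_R2.fastq.gz \
    | samtools sort -@ 4 -o sample.sorted.bam -

# Index the sorted BAM
samtools index sample.sorted.bam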
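Save this as align.sh, create the logs directory (SLURM does not create it, and the job fails if it is missing), and submit the script:
mkdir -p logs
sbatch align.sh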
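By default `sbatch` prints a sentence such as "Submitted batch job 12345". When a follow-up step needs the job ID, the `--parsable` option prints only the ID (optionally followed by `;cluster`). The sketch below wraps that in a hypothetical helper, `submit_and_report`, which is an illustration rather than part of SLURM.

```shell
# submit_and_report: hypothetical wrapper; submits a batch script and
# prints only the numeric job ID.
submit_and_report() {
    # --parsable makes sbatch print "jobid" or "jobid;cluster"
    out=$(sbatch --parsable "$1") || return 1
    # Drop the optional ";cluster" suffix
    printf '%s\n' "${out%%;*}"
}

# Usage:
#   jobid=$(submit_and_report align.sh)
#   squeue -j "$jobid"
```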
Key Parameters
SLURM directives
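| Directive | Description |
|---|---|
| `--job-name` | Human-readable name shown by `squeue`. |
| `--cpus-per-task` | Number of CPU cores allocated to the job. |
| `--mem` | Total memory for the job (e.g. `32G`). |
| `--time` | Maximum wall-clock time (`HH:MM:SS`). |
| `--partition` | Cluster partition / queue to submit to. |
| `--output` / `--error` | Paths for stdout / stderr logs (`%j` = job ID). |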
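Hard-coded thread counts in a job script can drift away from the `--cpus-per-task` request. Inside a job, SLURM exports `SLURM_CPUS_PER_TASK` when that directive is set, so the counts can be derived from it. The sketch below is one possible approach: the helper name `split_threads`, the roughly 3:1 split between alignment and sorting, and the fallback of 4 cores are all illustrative assumptions.

```shell
# split_threads: hypothetical helper; prints "<align_threads> <sort_threads>".
# Takes the core count as $1, else SLURM_CPUS_PER_TASK, else 4 (assumed default).
split_threads() {
    total="${1:-${SLURM_CPUS_PER_TASK:-4}}"
    # About a quarter of the cores go to sorting, but never fewer than 1
    sort_t=$(( total / 4 ))
    [ "$sort_t" -ge 1 ] || sort_t=1
    # The remainder goes to alignment, again never fewer than 1
    align_t=$(( total - sort_t ))
    [ "$align_t" -ge 1 ] || align_t=1
    echo "$align_t $sort_t"
}

# Usage inside a batch script:
#   set -- $(split_threads)
#   bwa-mem2 mem -t "$1" ... | samtools sort -@ "$2" ...
```

With a single-core allocation both values are clamped to 1, so the two tools share that core.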
Useful SLURM commands
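| Command | Description |
|---|---|
| `sbatch align.sh` | Submit a batch job. |
| `squeue -u $USER` | List your running and pending jobs. |
| `scancel <jobid>` | Cancel a job. |
| `sacct -j <jobid> --format=JobID,Elapsed,MaxRSS` | Check elapsed time and peak memory after a job finishes. |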
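Peak memory is reported by `sacct` in its `MaxRSS` field, typically as a number with a unit suffix such as `1048576K`. To compare that against a `--mem` request given in gigabytes, the value can be converted. The helper below is a sketch: `rss_to_gib` is a hypothetical name, and it assumes K, M, and G suffixes are powers of 1024 and that a value without a suffix is in bytes.

```shell
# rss_to_gib: hypothetical helper; converts a MaxRSS value to GiB (2 decimals).
# Assumes suffixes K/M/G are powers of 1024; no suffix is treated as bytes.
rss_to_gib() {
    echo "$1" | awk '{
        unit  = substr($1, length($1))          # last character
        value = substr($1, 1, length($1) - 1)   # everything before it
        if      (unit == "K") gib = value / 1024 / 1024
        else if (unit == "M") gib = value / 1024
        else if (unit == "G") gib = value
        else                  gib = $1 / 1024 / 1024 / 1024
        printf "%.2f\n", gib
    }'
}

# Usage (job ID is a placeholder):
#   sacct -j <jobid> --format=MaxRSS --noheader
#   rss_to_gib 1048576K
```

If the converted peak is far below the requested memory, lowering `--mem` usually shortens queue time.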
Expected Output
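After the job submitted with sbatch align.sh finishes successfully, you will find:
logs/bwa_<jobid>.out – stdout from the job; largely empty here, because the BWA-MEM2 output is piped into samtools sort.
logs/bwa_<jobid>.err – stderr, including BWA-MEM2 progress and timing information.
sample.sorted.bam – coordinate-sorted BAM file.
sample.sorted.bam.bai – BAM index.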
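A quick way to confirm that the job produced its outputs is to check that both files exist and are non-empty. The sketch below does only that; `check_outputs` is a hypothetical helper, and the file names are built from the `sample` prefix used in the script above.

```shell
# check_outputs: hypothetical helper; returns non-zero if the sorted BAM or
# its index is missing or empty. $1 is the sample prefix, e.g. "sample".
check_outputs() {
    status=0
    for f in "$1.sorted.bam" "$1.sorted.bam.bai"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            status=1
        fi
    done
    return "$status"
}

# Usage:
#   check_outputs sample && echo "outputs present"
```

A non-empty file can still be truncated; `samtools quickcheck sample.sorted.bam` additionally verifies that the BAM has a valid header and end-of-file marker.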
See Also
Installation – install Miniforge and create Conda environments
Test Datasets – download test data to run on your cluster
Singularity / Apptainer – run containers on HPC systems where Docker is not available