Alignment (BWA/Bowtie2) and What the Numbers Mean

Alignment places reads onto a reference genome or transcriptome. Treat each alignment as a probabilistic statement about where a read originated, especially in repeats and low-complexity regions, where several placements can be equally plausible.

Key concepts

Primary vs secondary alignments for multi-mapping reads
Soft clipping indicates partial matches (adapters, SVs, errors)
MAPQ reflects placement ambiguity, not base-call quality
Proper pair depends on orientation and insert size expectations
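
Most of these concepts are recorded in the FLAG column (column 2) of each SAM record. The sketch below decodes the relevant bits; the bit values come from the SAM specification, while the function name decode_flag and the output labels are made up for illustration. It is a minimal POSIX shell example, not a replacement for a full parser.

```shell
# decode_flag: print the FLAG bits relevant to the concepts above.
# The body runs in a subshell ( ... ) so its variables stay local.
decode_flag() (
  flag=$1
  out=""
  [ $(( flag & 0x1 ))   -ne 0 ] && out="$out paired"
  [ $(( flag & 0x2 ))   -ne 0 ] && out="$out proper_pair"
  [ $(( flag & 0x4 ))   -ne 0 ] && out="$out unmapped"
  [ $(( flag & 0x10 ))  -ne 0 ] && out="$out reverse"
  [ $(( flag & 0x100 )) -ne 0 ] && out="$out secondary"
  [ $(( flag & 0x400 )) -ne 0 ] && out="$out duplicate"
  [ $(( flag & 0x800 )) -ne 0 ] && out="$out supplementary"
  # Primary here means: mapped, and neither secondary nor supplementary.
  [ $(( flag & 0x904 )) -eq 0 ] && out="$out primary"
  echo "${out# }"
)
```

For example, decode_flag 99 prints "paired proper_pair primary". samtools ships an equivalent built-in (samtools flags 99), which is the better choice in practice.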

Typical commands

# Index reference (BWA)
bwa index reference.fa

# Align paired-end reads
bwa mem -t 8 reference.fa trimmed_R1.fastq.gz trimmed_R2.fastq.gz \
  | samtools sort -@ 4 -o sample.bam

# Quick stats
samtools flagstat sample.bam
samtools stats sample.bam | grep ^SN | cut -f 2-   # summary numbers only

Sanity checks

Metric	Interpretation
% mapped	A low mapping rate may indicate contamination, the wrong reference, or low-quality reads
% duplicates	A high duplicate rate suggests low library complexity or over-amplification
Insert size	Unexpected distribution can signal library prep issues
Coverage uniformity	Bias suggests GC/capture effects or mapping problems
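
The first two rows of the table can be read from samtools flagstat output. The sketch below assumes the default text format of flagstat (lines such as "950 + 0 mapped (95.00% : N/A)"); the function name flagstat_summary is hypothetical. The duplicate count is only meaningful if duplicates were marked beforehand (for example with samtools markdup), which the commands above do not do.

```shell
# flagstat_summary: read `samtools flagstat` text on stdin and print
# percent mapped and percent duplicates, based on QC-passed counts.
flagstat_summary() {
  awk '
    / in total [(]/             { total  = $1 }
    / [+] [0-9]+ mapped [(]/    { mapped = $1 }   # skips the "primary mapped" line
    / [+] [0-9]+ duplicates$/   { dup    = $1 }   # skips the "primary duplicates" line
    END {
      if (total == 0) { print "no reads"; exit 1 }
      printf "mapped_pct=%.2f duplicate_pct=%.2f\n", 100 * mapped / total, 100 * dup / total
    }'
}
```

Usage: samtools flagstat sample.bam | flagstat_summary. The totals include secondary and supplementary records, so these percentages can differ slightly from per-read rates.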

Healthy

High mapping rate, insert sizes in the expected range, and even coverage.

Investigate

Low-MAPQ reads pile up in repeats. Filter carefully: a hard MAPQ cutoff also removes legitimate coverage from repetitive regions.
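
Before choosing a cutoff, it helps to know how much data it would remove. The sketch below counts mapped records whose MAPQ (column 5 of SAM) falls below a threshold. The function name low_mapq_fraction and the default of 20 are illustrative assumptions, not recommendations. MAPQ is nominally -10 log10 of the probability that the placement is wrong, so 20 corresponds to roughly a 1% error, but aligners scale it differently (BWA-MEM tops out at 60, Bowtie2 at 42).

```shell
# low_mapq_fraction: read SAM text on stdin and report how many mapped
# records have MAPQ below the given threshold (default 20, illustrative).
low_mapq_fraction() {
  awk -F '\t' -v q="${1:-20}" '
    /^@/                  { next }   # skip header lines if present
    int($2 / 4) % 2 == 1  { next }   # skip unmapped records (FLAG bit 0x4)
                          { n++; if ($5 < q) low++ }
    END { if (n) printf "%d/%d (%.1f%%) below MAPQ %d\n", low, n, 100 * low / n, q }'
}
```

Usage: samtools view sample.bam | low_mapq_fraction 20. If the fraction is acceptable, samtools view -b -q 20 -o sample.q20.bam sample.bam applies the same cutoff.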

Figure: Mapping summary (example)

Figure: Insert size distribution (example)

A note on reference choice

Always align to the correct genome build and annotation version. For RNA-seq, align to the genome with a splice-aware aligner (STAR/HISAT2), or use pseudoalignment (Salmon/kallisto) against a transcriptome. Mixing builds invalidates coordinates and downstream interpretation.
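
One quick way to catch a mixed-build mistake is to compare the contig names and lengths in a BAM header with the FASTA index of the reference you intend to use. The sketch below assumes a saved header (from samtools view -H) and a .fai file (from samtools faidx); the function name check_build is hypothetical. Matching names and lengths do not prove the builds are identical, but a mismatch such as "chr1" versus "1" is a reliable warning sign.

```shell
# check_build: compare @SQ lines in a SAM header file ($1) against a
# FASTA index file ($2). Exits non-zero if any contig is missing or differs.
check_build() {
  awk -F '\t' '
    NR == FNR { len[$1] = $2; next }        # first file: .fai (name, length, ...)
    $1 == "@SQ" {
      name = ""; ln = ""
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^SN:/) name = substr($i, 4)
        if ($i ~ /^LN:/) ln   = substr($i, 4)
      }
      if (!(name in len))       { print "missing in reference: " name; bad = 1 }
      else if (len[name] != ln) { print "length mismatch: " name; bad = 1 }
    }
    END { if (!bad) print "header matches reference index"; exit bad + 0 }
  ' "$2" "$1"
}
```

Usage: samtools view -H sample.bam > header.sam, then samtools faidx reference.fa, then check_build header.sam reference.fa.fai.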