Bioinformatics Tutorial

Glossary

Short definitions of common bioinformatics terms. (A good habit: always map a term to a file type and a QC check.)

| Term | Meaning |
| --- | --- |
| Phred score | Log-scaled estimate of base-calling error probability: $Q=-10\log_{10}(p)$. |
| MAPQ | Mapping quality: Phred-scaled confidence that a read is placed at the correct genomic locus. |
| CIGAR | Compact encoding of alignment operations (matches, insertions, deletions, clipping). |
| Duplicate | Reads likely originating from the same original molecule (PCR/optical); can bias variant calling. |
| Depth (coverage) | Number of reads overlapping a genomic position; affects sensitivity and confidence. |
| Variant (SNP/indel) | Difference from reference: single-nucleotide polymorphism or insertion/deletion. |
| Ti/Tv | Transition/transversion ratio; plausibility metric for variant sets. |
| TPM | Transcripts per million; length-normalized expression estimate (useful for within-sample comparisons). |
| Counts | Raw read counts per feature; used for differential expression modeling. |
| FDR | False discovery rate; expected fraction of false positives among declared discoveries. |
| Alpha diversity | Within-sample diversity; often summarized by Shannon/Simpson indices. |
| Beta diversity | Between-sample differences; distances like Bray–Curtis, UniFrac. |
| Compositional data | Data that represent parts of a whole (sum constraint); requires special care in statistics. |
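
Several of the definitions above reduce to short formulas. The sketch below shows three of them in plain Python: converting between Phred scores and error probabilities, computing TPM from counts and feature lengths, and computing a Shannon index. The function names are hypothetical and chosen for illustration, and the Shannon index here assumes the natural logarithm (some tools use log base 2 instead).

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Invert Q = -10 * log10(p) to recover the error probability p."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Apply Q = -10 * log10(p); p must lie in (0, 1]."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return -10 * math.log10(p)


def tpm(counts, lengths_bp):
    """Divide counts by feature length, then scale so the values sum to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    total = sum(rates)
    if total == 0:
        return [0.0 for _ in rates]
    return [r / total * 1e6 for r in rates]


def shannon_index(abundances):
    """Shannon index H = -sum(p_i * ln(p_i)) over nonzero proportions."""
    total = sum(abundances)
    props = [a / total for a in abundances if a > 0]
    return -sum(p * math.log(p) for p in props)
```

For example, `phred_to_error_prob(30)` corresponds to an error probability of about 1 in 1,000, and two features with equal length and counts of 10 and 20 receive one third and two thirds of the million, respectively.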

Practical “translate it to a check” examples

  • Low MAPQ → inspect repeat regions, try stricter mapping/filtering.
  • High duplicates → check library complexity; consider UMIs or deduplication.
  • Unexpected GC peak → run contamination screen; validate sample sheet.
  • Batch effect → include batch covariates; rerun with balanced design if possible.
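
The mappings above can be sketched as a small rule function that turns per-sample QC metrics into follow-up suggestions. This is only an illustration: the metric keys (`mean_mapq`, `duplicate_fraction`, `unexpected_gc_peak`, `batch_effect_suspected`) and the default thresholds are assumptions made for this example, not values from any particular tool, and should be tuned per assay.

```python
def suggest_checks(metrics, mapq_min=20.0, dup_max=0.30):
    """Return follow-up suggestions for a dict of per-sample QC metrics.

    The keys and thresholds are hypothetical; missing keys are treated
    as "no problem observed".
    """
    suggestions = []
    if metrics.get("mean_mapq", float("inf")) < mapq_min:
        suggestions.append(
            "Low MAPQ: inspect repeat regions; try stricter mapping/filtering."
        )
    if metrics.get("duplicate_fraction", 0.0) > dup_max:
        suggestions.append(
            "High duplicates: check library complexity; consider UMIs or deduplication."
        )
    if metrics.get("unexpected_gc_peak", False):
        suggestions.append(
            "Unexpected GC peak: run contamination screen; validate sample sheet."
        )
    if metrics.get("batch_effect_suspected", False):
        suggestions.append(
            "Batch effect: include batch covariates; rerun with balanced design if possible."
        )
    return suggestions


# Example: a sample with low mapping quality and a high duplicate rate.
for line in suggest_checks({"mean_mapq": 12.0, "duplicate_fraction": 0.5}):
    print(line)
```

Keeping the thresholds as keyword arguments makes it easy to adjust them per assay without touching the rules themselves.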