Bioinformatics Tutorial
Practical • Command-line first

Bioinformatics, from raw reads to results

A practical tutorial site covering common NGS workflows: QC, alignment, variant calling, RNA-seq, metagenomics, and phylogenetics. Every lesson focuses on what the files mean, what the tools do, and how to sanity-check the outputs.

Mental model

Most pipelines repeat the same pattern:

  1. Validate inputs (format, metadata)
  2. QC and trim (clean the data)
  3. Map or assemble
  4. Call features (variants/genes)
  5. Interpret with statistics
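
Step 1 is the easiest to automate. The sketch below checks the structure of a FASTQ stream, assuming four-line records (no line-wrapped sequences). The helper name check_fastq is hypothetical, not part of any tool.

```shell
# check_fastq: hypothetical helper; reads FASTQ text on stdin.
# Assumes 4-line records (sequences are not wrapped across lines).
check_fastq() {
  awk '
    NR % 4 == 1 && substr($0, 1, 1) != "@" { print "bad header at line " NR; bad = 1; exit }
    NR % 4 == 3 && substr($0, 1, 1) != "+" { print "bad separator at line " NR; bad = 1; exit }
    NR % 4 == 2 { seqlen = length($0) }
    NR % 4 == 0 && length($0) != seqlen { print "length mismatch at line " NR; bad = 1; exit }
    END {
      if (bad) exit 1
      if (NR % 4 != 0) { print "truncated record: " NR " lines"; exit 1 }
      print "ok: " (NR / 4) " reads"
    }
  '
}

# Tiny in-line record so the example runs without any input file.
printf '@r1\nACGT\n+\nIIII\n' | check_fastq   # prints: ok: 1 reads
```

On real data, pipe the decompressed file in, for example: gzip -dc sample_R1.fastq.gz | check_fastq
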
Key file types
Type      Use
FASTQ     raw reads + quality
BAM/CRAM  aligned reads
VCF       variants
GTF       gene models
What you’ll learn
  • How Phred scores relate to error rates
  • How mapping quality differs from base quality
  • Why duplicates happen and when to care
  • How to read a VCF like a pro
  • How to avoid common statistical traps
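
As a preview of the first item: a Phred score Q encodes an error probability P through Q = -10 * log10(P), so P = 10^(-Q/10). Mapping quality uses the same log scale, but it describes the chance that the whole read is placed wrongly rather than the chance that one base is miscalled. A minimal sketch of the conversion, with a hypothetical helper name:

```shell
# phred_to_perror: hypothetical helper; prints the error probability for a Phred score Q.
phred_to_perror() {
  awk -v q="$1" 'BEGIN { printf "%g\n", 10 ^ (-q / 10) }'
}

# Each step of 10 in Q is a tenfold drop in error probability.
for q in 10 20 30 40; do
  printf 'Q%s -> %s\n' "$q" "$(phred_to_perror "$q")"
done
```
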
Read length distribution (example)

Trimmed reads often show a peak near the planned read length with a tail of shorter reads.
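
You can tabulate the same distribution on the command line. This is a rough sketch assuming four-line FASTQ records; read_length_hist is a hypothetical helper name.

```shell
# read_length_hist: hypothetical helper; FASTQ text on stdin,
# "length<TAB>count" on stdout, sorted by length.
# Assumes 4-line records, so every second line of a record is the sequence.
read_length_hist() {
  awk 'NR % 4 == 2 { n[length($0)]++ }
       END { for (len in n) print len "\t" n[len] }' | sort -n
}

# Tiny in-line example: two reads of length 4, one of length 2.
printf '@a\nACGT\n+\nIIII\n@b\nAC\n+\nII\n@c\nACGT\n+\nIIII\n' | read_length_hist
```

On real data: gzip -dc sample_R1.fastq.gz | read_length_hist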

GC content distribution (example)

Strong deviations from the expected GC profile can indicate contamination or biased libraries.
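
A per-read GC histogram can be sketched the same way. The helper name gc_hist is hypothetical, and the binning (integer percent, rounded down, with N bases counted in the read length) is an assumption made for illustration; dedicated QC tools may bin differently.

```shell
# gc_hist: hypothetical helper; FASTQ text on stdin,
# "GC percent<TAB>read count" on stdout, sorted by GC percent.
# Assumes 4-line records. N bases count toward read length but not toward GC.
gc_hist() {
  awk 'NR % 4 == 2 && length($0) > 0 {
         s = toupper($0)
         gc = gsub(/[GC]/, "", s)          # gsub returns the number of G/C characters removed
         bin = int(100 * gc / length($0))  # integer percent, rounded down
         n[bin]++
       }
       END { for (b in n) print b "\t" n[b] }' | sort -n
}

# Tiny in-line example: reads with 100%, 0% and 50% GC.
printf '@a\nGGCC\n+\nIIII\n@b\nATAT\n+\nIIII\n@c\nACGT\n+\nIIII\n' | gc_hist
```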

Where time goes in a typical DNA pipeline

Alignment is often the dominant cost. Profiling helps you choose the best speed/accuracy tradeoff.
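
A coarse way to see where the time goes is to wrap each pipeline step in a timer. The sketch below uses a hypothetical helper, timed, and assumes your date command supports +%s (true for GNU and BSD date). Resolution is whole seconds, which is enough to spot the dominant step.

```shell
# timed: hypothetical helper; runs a command, then reports its label and
# wall-clock seconds on stderr. Returns the command's own exit status.
timed() {
  label=$1
  shift
  start=$(date +%s)
  status=0
  "$@" || status=$?
  end=$(date +%s)
  printf '%s\t%ss\n' "$label" "$((end - start))" >&2
  return "$status"
}

# Stand-in for a real pipeline step.
timed "sleep-demo" sleep 1
```

Replace the sleep call with a real step (your aligner or variant caller command) and compare the reported times across steps.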

Quick start: “hello world” commands

These are safe sanity checks that do not modify files.

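# Inspect FASTQ header patterns (first two records, 4 lines each)
# gzip -dc behaves like zcat and also works where zcat expects .Z files (e.g. macOS)
gzip -dc sample_R1.fastq.gz | head -n 8

# Count reads (FASTQ has 4 lines/read, assuming sequences are not line-wrapped)
echo $(( $(gzip -dc sample_R1.fastq.gz | wc -l) / 4 ))

# Check reference index presence (index file extensions vary by aligner)
ls -1 reference.*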
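
For paired-end data, one more read-only check is that both mate files hold the same number of reads. The sketch below assumes gzipped, four-line FASTQ files; the helper names count_reads and same_read_count are hypothetical, and the demo writes two throwaway files to a temporary directory so it runs without real data.

```shell
# count_reads: hypothetical helper; prints the number of 4-line records in a gzipped FASTQ.
count_reads() {
  gzip -dc "$1" | awk 'END { print int(NR / 4) }'
}

# same_read_count: hypothetical helper; succeeds only if both files hold the same number of reads.
same_read_count() {
  [ "$(count_reads "$1")" -eq "$(count_reads "$2")" ]
}

# Demo on two throwaway one-read files.
tmp=$(mktemp -d)
printf '@r1/1\nACGT\n+\nIIII\n' | gzip -c > "$tmp/R1.fastq.gz"
printf '@r1/2\nTTGA\n+\nIIII\n' | gzip -c > "$tmp/R2.fastq.gz"
if same_read_count "$tmp/R1.fastq.gz" "$tmp/R2.fastq.gz"; then
  echo "pair counts match"
fi
rm -r "$tmp"
```

On real data, pass your own mate files, for example sample_R1.fastq.gz and its R2 counterpart.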