Practical • Command-line first

Bioinformatics, from raw reads to results

A modern, English-language tutorial site covering common NGS (next-generation sequencing) workflows: QC, alignment, variant calling, RNA-seq, metagenomics, and phylogenetics. Every lesson focuses on what the files mean, what the tools do, and how to sanity-check outputs.

14+ interactive charts

Mental model

Most pipelines repeat the same pattern:

Validate inputs (format, metadata)
QC and trim (clean the data)
Map or assemble
Call features (variants/genes)
Interpret with statistics
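
The first step, validating inputs, can be sketched with standard tools. This is a minimal sketch, assuming unwrapped four-line FASTQ records; the function name `fastq_check` and the demo records are hypothetical, not part of any toolkit.

```shell
# fastq_check: read an uncompressed FASTQ stream on stdin and report
# structural problems. Assumes unwrapped four-line records.
fastq_check() {
  awk '
    NR % 4 == 1 && substr($0, 1, 1) != "@" { printf "line %d: header does not start with @\n", NR; bad = 1 }
    NR % 4 == 2 { seqlen = length($0) }
    NR % 4 == 3 && substr($0, 1, 1) != "+" { printf "line %d: separator does not start with +\n", NR; bad = 1 }
    NR % 4 == 0 && length($0) != seqlen { printf "line %d: quality length differs from sequence length\n", NR; bad = 1 }
    END {
      if (NR % 4 != 0) { printf "truncated: %d lines is not a multiple of 4\n", NR; bad = 1 }
      if (!bad) printf "OK: %d reads\n", NR / 4
      exit (bad ? 1 : 0)
    }
  '
}

# Demo on two hypothetical records.
# With real data: gzip -dc sample_R1.fastq.gz | fastq_check
printf '@r1\nACGT\n+\nIIII\n@r2\nACGTAC\n+\nIIIIII\n' | fastq_check   # prints "OK: 2 reads"
```

The function exits non-zero when any check fails, so it can gate the later steps of a pipeline script.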

Key file types

Type	Use
`FASTQ`	raw reads with per-base quality scores
`BAM/CRAM`	aligned reads (CRAM is the reference-compressed form)
`VCF`	variant calls
`GTF`	gene annotations (exons, transcripts, genes)

What you’ll learn

How Phred scores relate to error rates
How mapping quality differs from base quality
Why duplicates happen and when to care
How to read a VCF like a pro
How to avoid common statistical traps
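
As a preview of the first item: a Phred score Q encodes an error probability P through Q = -10 log10(P), so P = 10^(-Q/10). The sketch below works that arithmetic out in the shell. The function names `phred_to_error` and `phred33_score` are hypothetical, and the decoder assumes the Phred+33 quality encoding.

```shell
# phred_to_error: print the error probability for a Phred score Q, P = 10^(-Q/10).
phred_to_error() {
  awk -v q="$1" 'BEGIN { printf "%g\n", 10 ^ (-q / 10) }'
}

# phred33_score: decode one quality character, assuming Phred+33 encoding.
# printf with a leading quote prints the numeric code of the character.
phred33_score() {
  echo $(( $(printf '%d' "'$1") - 33 ))
}

for q in 10 20 30 40; do
  printf 'Q%s\t%s\n' "$q" "$(phred_to_error "$q")"
done

phred33_score I   # "I" is ASCII 73, so this prints 40
```

Each step of 10 in Q is a tenfold drop in error probability: Q20 is 1 error in 100 bases, Q30 is 1 in 1,000.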

Read length distribution (example)

Trimmed reads often show a peak near the planned read length with a tail of shorter reads.
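
A distribution like this can be tabulated straight from the FASTQ. This is a minimal sketch, assuming unwrapped four-line records; the function name `read_length_hist` and the demo reads are hypothetical.

```shell
# read_length_hist: tabulate sequence lengths from a FASTQ stream on stdin.
# Assumes unwrapped four-line records, so sequences sit on lines 2, 6, 10, ...
# Output: length<TAB>count, sorted by length.
read_length_hist() {
  awk 'NR % 4 == 2 { count[length($0)]++ }
       END { for (len in count) printf "%d\t%d\n", len, count[len] }' |
    sort -k1,1n
}

# Demo on three hypothetical reads (one of length 4, two of length 6).
# With real data: gzip -dc sample_R1.fastq.gz | read_length_hist
printf '@r1\nACGT\n+\nIIII\n@r2\nACGTAC\n+\nIIIIII\n@r3\nACGTAC\n+\nIIIIII\n' | read_length_hist
```

On trimmed data, the last row (the longest length) should usually hold most of the reads.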

GC content distribution (example)

Strong deviations from the expected GC profile can indicate contamination or biased libraries.
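
Per-read GC content is simple to compute by hand, which helps when you want to double-check a QC report. This is a minimal sketch, assuming unwrapped four-line records; the function name `gc_per_read` is hypothetical, and counting N bases toward read length but not toward GC is a choice made here, not a standard.

```shell
# gc_per_read: print the GC percentage of each read in a FASTQ stream on stdin.
# Assumes unwrapped four-line records.
gc_per_read() {
  awk 'NR % 4 == 2 {
         seq = $0
         gc = gsub(/[GCgc]/, "", seq)   # gsub returns the number of replacements
         if (length($0) > 0) printf "%.1f\n", 100 * gc / length($0)
       }'
}

# Demo on three hypothetical reads: prints 100.0, 50.0, 0.0 (one per line).
# With real data: gzip -dc sample_R1.fastq.gz | gc_per_read
printf '@r1\nGGCC\n+\nIIII\n@r2\nATGC\n+\nIIII\n@r3\nATAT\n+\nIIII\n' | gc_per_read
```

Piping the output through `sort -n | uniq -c` gives a rough histogram to compare against the expected profile for your organism.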

Where time goes in a typical DNA pipeline

Alignment is often the dominant cost. Profiling each step shows where the time actually goes, so you can tune the step that matters and pick a sensible speed/accuracy tradeoff.
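
A rough per-step timing is often enough to start. This is a minimal sketch of a wrapper that logs wall-clock seconds per step; the name `timed` is hypothetical, `sleep 1` stands in for a real pipeline command, and `date +%s` is assumed to be available (it is on GNU and BSD systems).

```shell
# timed: run a command and log its wall-clock time to stderr
# as "label<TAB>seconds". Resolution is one second.
timed() {
  local label start end status
  label=$1
  shift
  start=$(date +%s)
  "$@"
  status=$?
  end=$(date +%s)
  printf '%s\t%ss\n' "$label" "$((end - start))" >&2
  return "$status"
}

# Stand-in step; replace "sleep 1" with your own alignment command.
timed align_demo sleep 1
```

Because the timing line goes to stderr and the wrapper returns the command's exit status, standard output can still be redirected or piped as usual.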

Quick start: “hello world” commands

These are safe sanity checks that do not modify files.

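# Inspect FASTQ header patterns (8 lines = 2 complete records)
# On macOS, replace zcat with gzip -dc
zcat sample_R1.fastq.gz | head -n 8

# Count reads (FASTQ has 4 lines/read, assuming unwrapped records)
echo $(( $(zcat sample_R1.fastq.gz | wc -l) / 4 ))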

# Check reference index presence (varies by aligner)
ls -1 reference.*