Metagenomics: Who is there, and what can they do?
Metagenomics profiles microbial communities by sequencing DNA extracted from mixed samples. Results are usually compositional (relative rather than absolute abundances) and are sensitive to contamination, batch effects, and the choice of reference database.
Two common approaches
| Approach | Input | Output |
|---|---|---|
| Marker gene (16S/ITS) | PCR amplicons of a marker gene | Taxonomic composition (limited resolution, often genus level) |
| Shotgun metagenomics | Total DNA | Taxonomic and functional profiles, plus metagenome-assembled genomes (MAGs) |
Key terms
- Alpha diversity: within-sample diversity (e.g., Shannon)
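- Beta diversity: between-sample differences, measured with a distance or dissimilarity metric (e.g., Bray-Curtis, UniFrac)
- Compositional data: proportions sum to 1, so the parts are not independent; naive statistics on raw proportions (e.g., correlations) can mislead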
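As an illustration of the beta diversity term above, here is a minimal sketch of Bray-Curtis dissimilarity between two samples. The function name `bray_curtis` and the toy count vectors are assumptions made for this example; real analyses would normally use an established package rather than hand-rolled code.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors.

    Both vectors must list the same taxa in the same order.
    Returns 0 for identical samples and 1 for samples sharing no taxa.
    """
    if len(a) != len(b):
        raise ValueError("samples must cover the same taxa")
    total = sum(a) + sum(b)
    if total == 0:
        raise ValueError("both samples are empty")
    return sum(abs(x - y) for x, y in zip(a, b)) / total


# Hypothetical counts for three taxa in three samples.
sample_a = [10, 20, 70]
sample_b = [10, 20, 70]   # same composition as sample_a
sample_c = [70, 20, 10]   # dominant taxon swapped

d_ab = bray_curtis(sample_a, sample_b)
d_ac = bray_curtis(sample_a, sample_c)
```

Identical samples give a dissimilarity of 0, and the value grows as the two profiles diverge.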
Practical pitfalls
- Reagent contamination (“kitome”) can dominate low-biomass samples.
- Different DNA extraction methods can change observed community structure.
- Database choice changes taxonomic calls; report database version.
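- A taxon's relative abundance can change even when its absolute abundance does not (because other taxa changed), and its absolute abundance can change while its relative abundance stays the same.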
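The last pitfall is easy to see with a small worked example. The sketch below uses made-up cell counts and hypothetical taxon names: taxon_A stays at 100 cells while taxon_B blooms, and then the whole community doubles.

```python
def relative_abundance(counts):
    """Convert absolute counts per taxon into proportions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no counts")
    return {taxon: n / total for taxon, n in counts.items()}


# Case 1: taxon_A is unchanged in absolute terms, but taxon_B blooms.
before = {"taxon_A": 100, "taxon_B": 100}
after_bloom = {"taxon_A": 100, "taxon_B": 900}

# Case 2: every taxon doubles, so absolute counts change but proportions do not.
after_doubling = {"taxon_A": 200, "taxon_B": 200}

rel_before = relative_abundance(before)
rel_bloom = relative_abundance(after_bloom)
rel_doubling = relative_abundance(after_doubling)

print(rel_before["taxon_A"], rel_bloom["taxon_A"], rel_doubling["taxon_A"])
```

In case 1 the relative abundance of taxon_A falls from 0.5 to 0.1 although its absolute count never moved. In case 2 it stays at 0.5 although its absolute count doubled.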
Recommended controls
- Negative extraction controls
- Mock community / spike-ins (when feasible)
- Randomization across batches
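One way to randomize across batches is to shuffle samples within each study group and then deal them out to batches in turn, so that no batch is made up of a single group. The sketch below is only an illustration of that idea; the function name `randomize_batches`, the group labels, and the fixed seed are assumptions for the example.

```python
import random
from collections import defaultdict


def randomize_batches(samples, n_batches, seed=0):
    """Assign samples to batches at random, balanced within each group.

    samples: dict mapping sample ID -> group label (e.g., "case"/"control").
    Returns a dict mapping sample ID -> batch index (0 .. n_batches - 1).
    """
    rng = random.Random(seed)  # fixed seed so the layout can be reproduced
    by_group = defaultdict(list)
    for sample_id, group in samples.items():
        by_group[group].append(sample_id)

    assignment = {}
    offset = 0  # carries over between groups so batch 0 is not always filled first
    for group in sorted(by_group):
        ids = sorted(by_group[group])
        rng.shuffle(ids)
        for i, sample_id in enumerate(ids):
            assignment[sample_id] = (i + offset) % n_batches
        offset += len(ids)
    return assignment


# Hypothetical study: 4 cases and 4 controls split over 2 extraction batches.
samples = {f"case_{i}": "case" for i in range(4)}
samples.update({f"control_{i}": "control" for i in range(4)})
layout = randomize_batches(samples, n_batches=2)
```

Recording the seed and the resulting layout alongside the sample metadata makes it possible to include batch as a covariate later.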
Relative abundance (stacked) — example
Stacked bar charts give a quick overview of composition but are not a statistical test. For inference, consider compositional methods such as the centered log-ratio (CLR) transform or ANCOM.
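As a sketch of what a CLR transform does, the snippet below takes the log of each count and subtracts the mean log across taxa in the same sample. The pseudocount of 0.5 is an assumption made here to handle zeros; how to treat zeros is a modelling choice and other strategies exist.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's counts.

    Each value becomes log(count) minus the mean log(count) of the sample,
    which equals the log of the count divided by the geometric mean.
    A pseudocount is added first because log(0) is undefined.
    """
    shifted = [c + pseudocount for c in counts]
    logs = [math.log(v) for v in shifted]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


# Hypothetical counts for three taxa, including a zero.
sample = [10, 0, 90]
transformed = clr(sample)

# Without a pseudocount, CLR does not depend on sequencing depth:
shallow = clr([1, 2, 4], pseudocount=0)
deep = clr([10, 20, 40], pseudocount=0)
```

CLR values sum to zero within a sample, and without a pseudocount the transform gives the same result for `[1, 2, 4]` and `[10, 20, 40]`, which is why it suits data where only ratios are informative.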
Alpha diversity (Shannon) — example
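The Shannon index is H = -Σ p_i ln(p_i), where p_i is the proportion of taxon i in the sample. A minimal sketch, using the natural log and made-up counts (the function name `shannon` is an assumption for this example):

```python
import math


def shannon(counts):
    """Shannon diversity index (natural log) from one sample's counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample has no counts")
    # Taxa with zero counts contribute nothing and are skipped.
    return sum(-(c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical samples with four taxa each.
even_sample = [25, 25, 25, 25]     # perfectly even community
skewed_sample = [97, 1, 1, 1]      # one dominant taxon
single_taxon = [100, 0, 0, 0]      # only one taxon observed

h_even = shannon(even_sample)
h_skewed = shannon(skewed_sample)
h_single = shannon(single_taxon)
```

For a perfectly even community of four taxa the index equals ln(4), it drops as one taxon comes to dominate, and it is 0 when only one taxon is observed.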
A minimal analysis outline
- Quality-control reads (trim adapters and low-quality bases; remove host reads if needed)
- Taxonomic profiling (e.g., Kraken2/Bracken, MetaPhlAn)
- Functional profiling (e.g., HUMAnN)
- Ordination (e.g., PCoA) and hypothesis tests (e.g., PERMANOVA) that account for relevant covariates such as batch
- Report sensitivity to database choice and filtering thresholds
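For the final step, one simple way to show sensitivity to filtering is to recompute a summary statistic across a range of thresholds. The sketch below counts observed taxa (richness) at several minimum relative abundance cutoffs; the profile, the taxon names, and the cutoff values are made up for illustration.

```python
def richness_at_threshold(counts, min_rel_abundance):
    """Number of taxa whose relative abundance is at or above the cutoff."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no counts")
    return sum(
        1 for n in counts.values()
        if n > 0 and n / total >= min_rel_abundance
    )


# Hypothetical taxonomic profile (read counts per taxon).
profile = {"taxon_A": 9000, "taxon_B": 900, "taxon_C": 90, "taxon_D": 10}

thresholds = (0.0, 0.005, 0.05, 0.5)
richness = [richness_at_threshold(profile, t) for t in thresholds]

for t, r in zip(thresholds, richness):
    print(f"min relative abundance {t}: {r} taxa")
```

Here richness falls from 4 to 1 as the cutoff rises, so a result that only holds at one threshold deserves a caveat in the report.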