Day 1: Welcome to the Course!¶
Date: September 1, 2025
Duration: 09:00-13:00 CAT
Focus: Course introduction, genomic surveillance overview, sequencing technologies
Overview¶
Welcome to the Microbial Genomics & Metagenomics Training Course! Day 1 introduces the course, gives essential background on genomic surveillance, compares sequencing technologies, and presents the key databases and tools used throughout the training.
Learning Objectives¶
By the end of Day 1, you will be able to:
- Understand the role of genomic surveillance in public health
- Recognize different sequencing technologies and their applications
- Navigate and use PubMLST database resources
- Perform basic command line operations
- Set up and configure analysis environments
Schedule¶
Time (CAT) | Topic | Links | Trainer |
---|---|---|---|
09:00 | Introductions | | All |
09:10 | Overview of clinical pathogens and genomic surveillance | Slides | Ephifania Geza |
09:40 | Overview of sequencing technologies and data types | | Sindiswa Lukhele |
10:00 | Setting up and exploring PubMLST | | Sindiswa Lukhele |
11:00 | Break | | |
11:30 | Introduction to command line interface | Practical | Arash Iranzadeh |
Key Topics¶
1. Course Introduction and Participant Introductions¶
- Welcome and course overview
- Trainer and participant introductions
- Course objectives and structure
- Training schedule and logistics
2. Clinical Pathogens and Genomic Surveillance¶
- Role of genomics in infectious disease surveillance
- Applications in outbreak investigation
- Antimicrobial resistance monitoring
- Integration with epidemiological data
3. Sequencing Technologies and Data Types¶
- Next-generation sequencing platforms
- Illumina, Oxford Nanopore, PacBio comparison
- Short-read vs long-read technologies
- Data quality considerations and file formats
4. PubMLST Database System¶
- Multi-locus sequence typing (MLST) concepts
- Database navigation and search functions
- Species-specific typing schemes
- Data submission and retrieval
5. Command Line Interface Basics¶
- Introduction to Unix/Linux command line
- Git Bash setup for Windows users
- Basic file operations and navigation
- Introduction to R statistical environment
Tools and Resources¶
Databases Explored¶
- PubMLST - Public databases for molecular typing
- Pathogen databases - Species-specific resources
- MLST schemes - Standardized typing protocols
Software Introduced¶
- Git Bash - Command line interface for Windows
- R/RStudio - Statistical computing environment
- Web browsers - For database navigation
- Terminal applications - Command line access
Hands-on Activities¶
Exercise 1: PubMLST Exploration (30 minutes)¶
Navigate the PubMLST website and explore available databases for different pathogens.
Exercise 2: Basic Command Line Operations (45 minutes)¶
Practice essential Unix commands and file system navigation.
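The kind of session this exercise covers might look like the sketch below. It is a minimal illustration, not the official practical: the directory name `cli_practice` and the file names are hypothetical, and the commands assume a Bash-compatible shell such as Git Bash or a Linux/macOS terminal.

```shell
# Hypothetical practice area; every name here is invented for illustration.
pwd                           # print the current working directory
mkdir -p cli_practice/raw     # create a directory together with its parent
cd cli_practice               # move into the new directory

# Write three lines into a small text file.
printf 'isolate_01\nisolate_02\nisolate_03\n' > raw/samples.txt

ls -l raw                     # list the directory contents in long format
cat raw/samples.txt           # print the whole file
wc -l < raw/samples.txt       # count the lines in the file

cp raw/samples.txt samples_backup.txt     # copy a file
mv samples_backup.txt samples_copy.txt    # rename (move) the copy
head -n 1 samples_copy.txt                # show only the first line
rm samples_copy.txt                       # delete the copy

cd ..                         # return to the starting directory
```

Running the commands one at a time and checking the result with `ls` after each step makes it easier to see what changed.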
Exercise 3: R Environment Setup (30 minutes)¶
Install and configure R/RStudio for data analysis.
Exercise 4: Database Search Practice (15 minutes)¶
Search for MLST data for specific bacterial isolates.
Key Concepts¶
Genomic Surveillance Applications¶
- Outbreak investigation: Tracking transmission patterns
- Antimicrobial resistance: Monitoring resistance emergence
- Epidemiological studies: Population structure analysis
- Public health response: Informing control measures
Sequencing Technology Comparison¶
Platform | Read Length | Accuracy | Throughput | Cost | Best For |
---|---|---|---|---|---|
Illumina | 150-300 bp | >99% | High | Low | Routine surveillance |
Oxford Nanopore | 1-100 kb | ~95% | Medium | Medium | Structural variants |
PacBio | 10-25 kb | >99% | Medium | High | Complete genomes |
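Read length, the first column in the comparison above, can be checked directly from a read file. The sketch below assumes the reads are stored as uncompressed FASTQ with exactly four lines per record; the file `toy_reads.fastq`, its contents, and the helper `fastq_summary` are invented for illustration.

```shell
# Two-read toy FASTQ file; sequences and quality strings are made up.
cat > toy_reads.fastq <<'EOF'
@read1
ACGTACGTAC
+
IIIIIIIIII
@read2
ACGTAC
+
IIIIII
EOF

# fastq_summary FILE
# Prints "<number of reads> <mean read length>" for an uncompressed FASTQ
# file that uses four lines per record (an assumption of this sketch).
fastq_summary() {
    awk 'NR % 4 == 2 { n++; total += length($0) }   # line 2 of each record is the sequence
         END { if (n) printf "%d %.1f\n", n, total / n; else print "0 0.0" }' "$1"
}

fastq_summary toy_reads.fastq
```

Real sequencing runs are usually gzip-compressed and much larger, so dedicated quality control tools are the better choice in practice; this sketch only shows where the numbers come from.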
MLST Fundamentals¶
- Housekeeping genes: Conserved sequences for typing
- Allelic profiles: Unique combinations define sequence types
- Population structure: Understanding strain relationships
- Standardization: Reproducible typing across laboratories
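The link between allelic profiles and sequence types can be sketched as a simple table lookup. Everything in the example is hypothetical: the locus names `geneA` to `geneG`, the allele numbers, the file `toy_profiles.txt`, and the helper `lookup_st` are invented, and real schemes on PubMLST define their own loci and profiles.

```shell
# Toy profile table: one row per sequence type (ST), one column per locus.
# Locus names and allele numbers are invented for illustration only.
cat > toy_profiles.txt <<'EOF'
ST geneA geneB geneC geneD geneE geneF geneG
1 1 1 1 1 1 1 1
2 1 2 1 1 3 1 1
3 4 2 1 5 3 1 2
EOF

# lookup_st PROFILE_TABLE ALLELE...
# Prints the ST whose alleles match the query exactly, or "novel" if none does.
lookup_st() {
    table="$1"
    shift
    query="$*"                       # alleles joined by single spaces
    awk -v q="$query" '
        NR == 1 { next }             # skip the header row
        {
            profile = $2
            for (i = 3; i <= NF; i++) profile = profile " " $i
            if (profile == q) { print $1; found = 1; exit }
        }
        END { if (!found) print "novel" }
    ' "$table"
}

lookup_st toy_profiles.txt 1 2 1 1 3 1 1    # matches the second row
lookup_st toy_profiles.txt 9 9 9 9 9 9 9    # matches no row
```

Changing a single allele number in the query changes the result, which is why two isolates share a sequence type only when they agree at every locus in the scheme.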
Resources¶
Essential Websites¶
- PubMLST - Public databases for molecular typing
- Pathogen Watch - Genomic surveillance platform
- NCBI SRA - Sequence Read Archive
Documentation¶
Training Materials¶
- Command line cheat sheets
- MLST database tutorials
- Sequencing technology overviews
Assessment Activities¶
Individual Tasks¶
- Navigate PubMLST interface successfully
- Execute basic command line operations
- Identify appropriate sequencing platforms for different applications
- Understand MLST typing principles
Group Discussion¶
- Share experiences with different pathogens
- Discuss genomic surveillance challenges in different settings
- Compare sequencing technology applications
- Explore database search strategies
Common Challenges¶
Command Line Anxiety¶
Many participants are new to command line interfaces. We provide:
- Patient, step-by-step instruction
- Plenty of practice time
- Peer support and collaboration
- Reference materials for later use
Technical Setup Issues¶
# Common setup checks in Git Bash on Windows
# Check that Git (and with it Git Bash) is properly installed
git --version
# Verify that R is installed and reachable from the command line
R --version
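When one of these commands fails, it helps to know whether the program is missing or simply not on the `PATH`. The sketch below wraps the shell built-in `command -v` in a small helper; the function name `check_tool` is hypothetical. On Windows, R is often installed without being added to the `PATH` that Git Bash sees, so "not found" does not necessarily mean R is missing.

```shell
# check_tool NAME
# Reports whether NAME can be found on the PATH; returns non-zero if not.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
    else
        echo "$1: not found on PATH"
        return 1
    fi
}

# Check the two programs used on Day 1; keep going even if one is missing.
for tool in git R; do
    check_tool "$tool" || true
done
```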
Database Navigation¶
- Start with simple searches
- Use guided examples
- Practice with known organisms
- Build confidence gradually
Looking Ahead¶
Day 2 Preview: Introduction to Command Line, HPC, & Quality Control, including:
- High Performance Computing (HPC) introduction
- Advanced command line operations
- Quality checking and control methods
- Species identification techniques
- Guest talk on M. tuberculosis and co-infection
Key Learning Outcome: Day 1 establishes the foundational knowledge of genomic surveillance principles, sequencing technologies, and essential database resources that underpin all subsequent training activities.