Day 1: Welcome to the Course!

Date: September 1, 2025
Duration: 09:00-13:00 CAT
Focus: Course introduction, genomic surveillance overview, sequencing technologies

Overview

Welcome to the Microbial Genomics & Metagenomics Training Course! Day 1 introduces the course, gives essential background on genomic surveillance, surveys sequencing technologies, and presents the key databases and tools used throughout the training.

Learning Objectives

By the end of Day 1, you will be able to:

  • Understand the role of genomic surveillance in public health
  • Recognize different sequencing technologies and their applications
  • Navigate and use PubMLST database resources
  • Perform basic command line operations
  • Set up and configure analysis environments

Schedule

| Time (CAT) | Topic | Links | Trainer |
|---|---|---|---|
| 09:00 | Introductions | | All |
| 09:10 | Overview of clinical pathogens and genomic surveillance | Slides | Ephifania Geza |
| 09:40 | Overview of sequencing technologies and data types | | Sindiswa Lukhele |
| 10:00 | Setting up and exploring PubMLST | | Sindiswa Lukhele |
| 11:00 | Break | | |
| 11:30 | Introduction to command line interface | Practical | Arash Iranzadeh |

Key Topics

1. Course Introduction and Participant Introductions

  • Welcome and course overview
  • Trainer and participant introductions
  • Course objectives and structure
  • Training schedule and logistics

2. Clinical Pathogens and Genomic Surveillance

  • Role of genomics in infectious disease surveillance
  • Applications in outbreak investigation
  • Antimicrobial resistance monitoring
  • Integration with epidemiological data

3. Sequencing Technologies and Data Types

  • Next-generation sequencing platforms
  • Illumina, Oxford Nanopore, PacBio comparison
  • Short-read vs long-read technologies
  • Data quality considerations and file formats
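
As a small illustration of the file formats topic, the sketch below builds a tiny FASTQ file and summarizes it. It assumes the common four-line FASTQ layout (header, sequence, `+` separator, quality string) with no line wrapping; the read names, bases, and quality characters are made up for the example.

```shell
# Build a tiny FASTQ file; read names, bases, and qualities are made up.
fastq="$(mktemp)"
printf '@read1\nACGTACGT\n+\nIIIIIIII\n'  > "$fastq"
printf '@read2\nACGT\n+\nIIII\n'         >> "$fastq"

# Each record spans 4 lines, so reads = total lines / 4.
total_lines="$(wc -l < "$fastq" | tr -d ' ')"
read_count=$(( total_lines / 4 ))

# The sequence is the 2nd line of every 4-line record.
mean_length="$(awk 'NR % 4 == 2 { sum += length($0); n++ } END { if (n) print sum / n }' "$fastq")"

echo "Reads: $read_count"
echo "Mean read length: $mean_length"
```

Counting lines this way is only a quick sanity check; dedicated quality control tools report far more detail, such as per-base quality.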

4. PubMLST Database System

  • Multi-locus sequence typing (MLST) concepts
  • Database navigation and search functions
  • Species-specific typing schemes
  • Data submission and retrieval

5. Command Line Interface Basics

  • Introduction to Unix/Linux command line
  • Git Bash setup for Windows users
  • Basic file operations and navigation
  • Introduction to R statistical environment

Tools and Resources

Databases Explored

  • PubMLST - Public databases for molecular typing
  • Pathogen databases - Species-specific resources
  • MLST schemes - Standardized typing protocols

Software Introduced

  • Git Bash - Command line interface for Windows
  • R/RStudio - Statistical computing environment
  • Web browsers - For database navigation
  • Terminal applications - Command line access

Hands-on Activities

Exercise 1: PubMLST Exploration (30 minutes)

Navigate the PubMLST website and explore available databases for different pathogens.

Exercise 2: Basic Command Line Operations (45 minutes)

Practice essential Unix commands and file system navigation.
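
The kind of commands practiced in this exercise might look like the sketch below. It is not the official practical; the directory and file names (`day1_practice`, `isolates.txt`) are hypothetical, and everything runs inside a temporary directory so no existing files are affected.

```shell
# Work inside a throwaway directory so nothing in your home folder is touched.
workdir="$(mktemp -d)"
cd "$workdir"

# Create a project folder with a sub-folder and move into it.
mkdir -p day1_practice/data
cd day1_practice

# Write a small three-line text file (isolate names are made up).
printf 'isolate_01\nisolate_02\nisolate_03\n' > data/isolates.txt

pwd                       # print the current directory
ls data                   # list the contents of data/
cat data/isolates.txt     # show the file contents
line_count="$(wc -l < data/isolates.txt | tr -d ' ')"   # count lines
echo "Lines: $line_count"

# Copy, rename, and then remove a file.
cp data/isolates.txt data/isolates_backup.txt
mv data/isolates_backup.txt data/isolates_copy.txt
rm data/isolates_copy.txt
remaining="$(ls data)"
echo "Remaining: $remaining"
```

The same commands should work in Git Bash on Windows and in a Linux or macOS terminal.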

Exercise 3: R Environment Setup (30 minutes)

Install and configure R/RStudio for data analysis.

Exercise 4: Database Search Practice (15 minutes)

Search for MLST data for specific bacterial isolates.

Key Concepts

Genomic Surveillance Applications

  • Outbreak investigation: Tracking transmission patterns
  • Antimicrobial resistance: Monitoring resistance emergence
  • Epidemiological studies: Population structure analysis
  • Public health response: Informing control measures

Sequencing Technology Comparison

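| Platform | Read Length | Accuracy | Throughput | Cost | Best For |
|---|---|---|---|---|---|
| Illumina | 150-300 bp | >99% | High | Low | Routine surveillance |
| Oxford Nanopore | 1-100 kb | ~95-99% (varies with chemistry and basecaller) | Medium | Medium | Structural variants |
| PacBio | 10-25 kb | >99% | Medium | High | Complete genomes |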

MLST Fundamentals

  • Housekeeping genes: Conserved sequences for typing
  • Allelic profiles: Unique combinations define sequence types
  • Population structure: Understanding strain relationships
  • Standardization: Reproducible typing across laboratories
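
The idea that an allelic profile defines a sequence type can be sketched as a simple table lookup. The example below is a toy illustration, not how PubMLST itself is implemented: the locus names (`locusA` to `locusC`), the allele numbers, the ST numbers, and the `lookup_st` helper are all hypothetical, and real schemes typically use around seven housekeeping loci.

```shell
# Toy profile table: tab-separated, one row per sequence type (ST).
# Locus names and all numbers are made up for illustration.
profiles="$(mktemp)"
printf 'ST\tlocusA\tlocusB\tlocusC\n'  > "$profiles"
printf '1\t1\t1\t1\n'                 >> "$profiles"
printf '2\t1\t2\t1\n'                 >> "$profiles"
printf '3\t4\t2\t7\n'                 >> "$profiles"

# lookup_st (hypothetical helper): print the ST whose allele numbers match
# the query exactly, or "novel" when no row matches.
lookup_st() {
  query="$(printf '%s\t%s\t%s' "$1" "$2" "$3")"
  awk -F'\t' -v q="$query" '
    NR > 1 {
      profile = $2 FS $3 FS $4
      if (profile == q) { print $1; found = 1; exit }
    }
    END { if (!found) print "novel" }
  ' "$profiles"
}

st_known="$(lookup_st 1 2 1)"
st_novel="$(lookup_st 9 9 9)"
echo "Profile 1-2-1 -> ST $st_known"
echo "Profile 9-9-9 -> $st_novel"
```

Because the match must be exact at every locus, a single differing allele gives a different ST, or no ST at all if the combination has not been assigned one.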

Resources

Essential Websites

Documentation

Training Materials

  • Command line cheat sheets
  • MLST database tutorials
  • Sequencing technology overviews

Assessment Activities

Individual Tasks

  • Navigate PubMLST interface successfully
  • Execute basic command line operations
  • Identify appropriate sequencing platforms for different applications
  • Understand MLST typing principles

Group Discussion

  • Share experiences with different pathogens
  • Discuss genomic surveillance challenges in different settings
  • Compare sequencing technology applications
  • Explore database search strategies

Common Challenges

Command Line Anxiety

Many participants are new to command line interfaces. We provide:

  • Patient, step-by-step instruction
  • Plenty of practice time
  • Peer support and collaboration
  • Reference materials for later use

Technical Setup Issues

# Quick checks for common setup issues (run in Git Bash on Windows)
# Confirm that Git, and therefore Git Bash, is installed
git --version

# Confirm that R is installed and available on the PATH
R --version

Database Navigation

  • Start with simple searches
  • Use guided examples
  • Practice with known organisms
  • Build confidence gradually

Looking Ahead

Day 2 Preview: Introduction to Command Line, HPC, & Quality Control, including:

  • High Performance Computing (HPC) introduction
  • Advanced command line operations
  • Quality checking and control methods
  • Species identification techniques
  • Guest talk on M. tuberculosis and co-infection


Key Learning Outcome: Day 1 establishes the foundational knowledge of genomic surveillance principles, sequencing technologies, and essential database resources that underpin all subsequent training activities.