Day 2: Introduction to the Command Line

Date: September 2, 2025
Duration: 09:00-13:00 CAT
Focus: Command line proficiency, M. tuberculosis genomics

Overview

Day 2 focuses on building strong command line skills essential for bioinformatics work. This day provides the computational foundation needed for all subsequent genomic analyses in the course.

Learning Objectives

By the end of Day 2, you will be able to:

  • Master essential Unix/Linux command line operations for bioinformatics workflows
  • Understand M. tuberculosis genomics and co-infection patterns

Schedule

Time (CAT) | Topic | Links | Trainer
09:00 | Introduction to command line interface | Practical | Arash Iranzadeh
11:30 | Break | |
12:00 | Guest talk: MtB and co-infection | Speaker Bio | Bethlehem Adnew

Key Topics

1. Command Line Interface Fundamentals

  • Unix/Linux file system navigation
  • Essential commands for bioinformatics (grep, awk, sed)
  • File manipulation and text processing
  • Pipes and command chaining
  • Working with compressed files (gzip, tar)
  • Shell scripting basics for automation
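
As a rough illustration of how pipes, compressed files, and a simple loop fit together, here is a minimal sketch. The file names (sample_A.fasta, sample_B.fasta, counts.tsv) and the sequences are made up for the example and are not part of the course data.

```shell
# Create a throwaway directory with two tiny, made-up FASTA files.
workdir=$(mktemp -d)
printf '>seq1\nACGT\n>seq2\nGGCC\n' > "$workdir/sample_A.fasta"
printf '>seq1\nTTAA\n' > "$workdir/sample_B.fasta"

# Compress them; gzip replaces each file with a .gz version.
gzip "$workdir"/*.fasta

# Loop over the compressed files and count header lines ('>' at line start)
# by streaming the decompressed text through a pipe, without writing
# decompressed copies to disk.
for f in "$workdir"/*.fasta.gz; do
    n=$(gzip -dc "$f" | grep -c '^>')
    printf '%s\t%s\n' "$(basename "$f" .fasta.gz)" "$n"
done > "$workdir/counts.tsv"

# Show the per-sample sequence counts.
cat "$workdir/counts.tsv"
```

The same loop pattern scales to many samples: only the glob pattern changes, which is the main reason to script a task instead of typing it once per file.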

2. M. tuberculosis and Co-infection

  • TB genomics and strain typing
  • Co-infection patterns and detection
  • Clinical implications
  • Molecular epidemiology approaches
  • Drug resistance mechanisms
  • Public health applications

Tools and Software

Command Line Tools

  • Bash shell - Command line interface and scripting
  • GNU coreutils - Essential Unix utilities (ls, cp, mv, cat, etc.)
  • Text processing - awk, sed, cut, sort, uniq
  • File compression - gzip, tar, zip
  • tmux/screen - Terminal session management
  • rsync/scp - File transfer and synchronization
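
The text-processing tools listed above are most useful in combination. The sketch below counts how often each value appears in one column of a tab-separated file. The table contents (sample IDs S1 to S6 and the lineage labels) are invented for illustration, and samples.tsv is a hypothetical file name.

```shell
# Build a small, made-up two-column table: sample ID and lineage label.
workdir=$(mktemp -d)
printf 'S1\tL4\nS2\tL2\nS3\tL4\nS4\tL1\nS5\tL4\nS6\tL2\n' > "$workdir/samples.tsv"

# cut extracts column 2, sort groups identical values together,
# uniq -c counts each group, the second sort ranks by count (descending),
# and awk rewrites each line as "label<TAB>count".
lineage_counts=$(cut -f2 "$workdir/samples.tsv" \
    | sort \
    | uniq -c \
    | sort -k1,1nr \
    | awk '{ print $2 "\t" $1 }')

printf '%s\n' "$lineage_counts"
```

Note that uniq only collapses adjacent duplicate lines, which is why the first sort is needed before it.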

Hands-on Exercises

Exercise 1: Command Line Fundamentals (90 minutes)

Master essential Unix commands for bioinformatics through practical exercises.

# Navigate file systems and manipulate files
cd ~/data
ls -la
mkdir -p analysis_output            # -p: no error if the directory already exists

# Process text files with Unix tools
grep -c "^>" sequences.fasta        # Count sequences (one header line per record)
head -n 20 sample.fastq             # View the first 5 reads (4 lines per read)

# Work with compressed files
gzip large_file.txt                 # Replaces the file with large_file.txt.gz
gunzip -c compressed.gz | head      # Preview contents without decompressing on disk

# Use pipes and redirection
sort data.txt | uniq > unique_values.txt

Key Concepts

Command Line Essentials

  • File system navigation: Understanding directory structure and paths
  • Text processing: Using grep, sed, awk for data manipulation
  • Pipes and redirection: Chaining commands for complex operations
  • Shell scripting: Automating repetitive tasks
  • Regular expressions: Pattern matching in bioinformatics data
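
To make the division of labour between grep, sed, and awk more concrete, here is a small sketch on a made-up FASTA file. The file name reads.fasta, the contig names, and the sequences are assumptions for illustration only.

```shell
# Write a tiny, made-up FASTA file to a throwaway directory.
workdir=$(mktemp -d)
cat > "$workdir/reads.fasta" <<'EOF'
>contig_1 length=8
ACGTACGT
>contig_2 length=4
GGCC
>contig_3 length=6
ATATAT
EOF

# grep with a regular expression: keep only header lines (starting with ">").
headers=$(grep '^>' "$workdir/reads.fasta")

# sed: strip the leading ">" and everything from the first space onward,
# leaving just the sequence IDs.
ids=$(printf '%s\n' "$headers" | sed -e 's/^>//' -e 's/ .*//')

# awk: add up the length of every non-header line.
total_bases=$(awk '!/^>/ { total += length($0) } END { print total }' "$workdir/reads.fasta")

printf '%s\n' "$ids"
echo "Total bases: $total_bases"
```

A useful rule of thumb: grep selects lines, sed edits them, and awk computes over fields or lines.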

Assessment Activities

Individual Tasks

  • Successfully connect to the Ilifu HPC system
  • Navigate Unix file system and manipulate files
  • Complete command line exercises for pathogen genomics data

Group Discussion

  • Share command line tips and tricks
  • Discuss HPC resource management strategies
  • Troubleshoot connection and job submission issues
  • Compare different approaches to batch processing

Common Challenges

Command Line Challenges

# Permission denied errors
chmod +x script.sh    # Make script executable
ls -la                # Check file permissions

# Path issues
echo "$PATH"                     # Check current PATH
export PATH="$PATH:/new/path"    # Add a directory to PATH (current session only)
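
The two fixes above can be tried end to end on a throwaway script. This is a minimal sketch: hello.sh and the temporary directory are hypothetical, and it assumes a typical default umask under which new files are created without execute permission.

```shell
# Create a small script in a throwaway directory.
workdir=$(mktemp -d)
printf '#!/bin/sh\necho "hello from script"\n' > "$workdir/hello.sh"

# A newly created file normally lacks the execute bit, which is what
# triggers "Permission denied" when you try to run it directly.
if [ -x "$workdir/hello.sh" ]; then before="executable"; else before="not executable"; fi

# Grant execute permission, then check again.
chmod +x "$workdir/hello.sh"
if [ -x "$workdir/hello.sh" ]; then after="executable"; else after="not executable"; fi

# Append the directory to PATH (for this session only) so the shell
# can find the script by name, and ask where it resolves to.
export PATH="$PATH:$workdir"
found=$(command -v hello.sh)

echo "Before chmod: $before"
echo "After chmod:  $after"
echo "Resolved to:  $found"
```

Changes made with export last only for the current shell session; to make them permanent they would need to go in a startup file such as ~/.bashrc.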

Command Line Resources

M. tuberculosis Resources

Guest Lecture: MtB and Co-infection

Speaker: Bethlehem Adnew

Key Topics Covered

  • M. tuberculosis genomics: Strain diversity and typing methods
  • Co-infection dynamics: TB-HIV and other respiratory pathogens
  • Diagnostic challenges: Molecular detection in complex samples
  • Treatment implications: Drug resistance in co-infected patients
  • Epidemiological insights: Transmission patterns and control strategies

Interactive Discussion Points

  • Current challenges in TB diagnosis
  • Role of genomics in outbreak investigation
  • Future directions in TB research
  • Integration of genomic and clinical data

Looking Ahead

Day 3 Preview:

  • Command line proficiency
  • HPC fundamentals
  • Quality checking and control with FastQC
  • Species identification using Kraken2


Key Learning Outcome: Mastery of command line operations and HPC infrastructure usage provides the essential computational foundation for all subsequent genomic analyses in the course.