Ph.D. Thesis Defense: Natasha Pavlovikj
Monday, March 28, 2022
1:00 PM (CDT)
Zoom: https://unl.zoom.us/j/99895223302
Meeting ID: 998 9522 3302
"Addressing Bioinformatics Bottlenecks for Scalable Microbial Population Genomics Analyses"
The proliferation of genomics data paves the way for population genomics. With population genomics analyses, researchers can understand genetic relationships in populations and their environments, track outbreaks, and develop treatments with high accuracy.
In this dissertation, we address the two main bottlenecks to performing efficient and accurate microbial population analyses: 1) the need for a scalable and efficient computational platform that utilizes powerful computational resources; and 2) strategic algorithm selection for the steps that make up population genomics analyses.
To address the need for a scalable and efficient computational platform that utilizes powerful computational resources, we developed ProkEvo: an automated, reproducible, and scalable multi-step bioinformatics pipeline for high-throughput bacterial population genomics analyses. ProkEvo utilizes well-established bioinformatics tools and performs hierarchical-based analyses that can reveal relationships among the species and predict ecological traits based on gene content. ProkEvo has been tested with datasets ranging from ~2,000 to ~20,000 genomes on two different computational platforms, with runtimes varying from 3 to 26 days.
To address strategic algorithm selection for the steps that make up population genomics analyses, we focused on two applications: 1) accuracy of tools for read mapping; and 2) real-time sequence typing of foodborne pathogens.
To investigate the accuracy of tools for mapping and alignment of nanopore reads, we built comprehensive benchmarks and performed a consistent comparative performance assessment of five widely used tools. We defined robust statistical metrics for evaluating the mapping, alignment, and computational accuracy of these tools. Finally, we provided recommendations on which mapping and alignment tools perform better and are more accurate with nanopore reads.
To explore real-time sequence typing of foodborne pathogens, we performed a systematic and comprehensive comparison between assembly-dependent and assembly-free methods for scalable bacterial MLST mapping. We demonstrated that the accuracy of these methods is affected by the species and that the algorithmic selection of k-mer length is likewise species-dependent. Finally, we incorporated both assembly-free and assembly-dependent methods in ProkEvo, providing a practical and viable platform for scalable, automated analyses of bacterial populations with direct applications in microbiology research, clinical diagnostics, and epidemiological surveillance.
Committee members:
Dr. Jitender S. Deogun (advisor)
Dr. Andrew K. Benson
Dr. Juan Cui
Dr. Hongfeng Yu
Dr. Etsuko Moriyama