M.S. defense this Friday

Graduate Defenses

M.S. Defense: Yingshan Li
Friday, Sept. 23
2 p.m.
Zoom

“Sequence-based Bioinformatics Approaches to Predict Virus-host Relationship in Archaea and Eukaryotes”

Until recently, viruses could be studied only individually, because conventional viral studies depend largely on lab culturing, which requires viruses to be highly lytic and able to form plaques. However, these highly lethal viruses represent only a small fraction of the virosphere. Viral metagenomics is independent of culturing and can investigate the viromes of virtually any environmental niche. While numerous viral genome sequences have been assembled from metagenomic studies in recent years, the natural hosts of most of these viral contigs have not been determined. Various computational approaches have been developed to predict the hosts of bacteriophages. Nevertheless, little progress has been made in host prediction for viruses that infect eukaryotes and archaea. In this study, by analyzing all documented viruses with known eukaryotic and archaeal hosts, we assessed the predictive power of four computational approaches to viral host prediction, each based on one of the following biological relationships among viruses and hosts: 1. homology between viruses and hosts, where direct genetic interactions between viruses and hosts are assumed to leave traces of historical infections; 2. co-evolution between viruses and hosts, where viral dependency on the host for replication is assumed to result in similar genomic characteristics such as nucleotide composition and codon bias; 3. phylogenetic distances between viruses, where phylogenetically close viruses are assumed to infect the same hosts; and 4. genetic similarities among viruses, where viruses with similar genetic compositions are assumed to share the same hosts. We showed that each approach produced better predictions than uninformed guesses, indicating that our current knowledge of virus-host interaction and co-evolution can be exploited to help predict natural hosts among eukaryotes and archaea for viral contigs.
Overall, the third and fourth approaches (virus-virus similarity, in both k-mer usage and homology) had the highest prediction accuracy, while the second approach (virus-host co-evolution) had the least predictive power. We also discuss the biological underpinnings of the differences in predictive power among these approaches. We anticipate a significant increase in predictive capacity as more training data and knowledge of virus-host relationships accumulate.

M.S. Committee:
Juan Cui, Advisor
Etsuko Moriyama
Jitender Deogun