Ph.D. Defense: Sairam Behera
Wednesday, November 18, 2020
3 p.m. via Zoom
Join Zoom Meeting:
https://go.unl.edu/vbo8
Meeting ID: 951 0525 5518
Passcode: 111820
“Application of Suffix Tree, Minwise Hashing and Streaming Algorithm for Bioinformatics Problems”
In this dissertation, we address several algorithmic problems in bioinformatics using three main approaches: (a) a streaming model for large genomics datasets, (b) suffix tree-based indexing, and (c) minwise hashing (minhash) and locality-sensitive hashing (LSH). Streaming models are useful for big data problems where a good approximation can be achieved with very limited space. In these models, the input data arrive one item at a time, and only the current input needs to be stored and processed. For the first problem, we developed a streaming approximation algorithm to estimate the frequency counts of k-mers, i.e., strings or sequences of length k, in genomic sequences. For the second problem, we used a suffix tree, a trie-based data structure, to develop an alignment-free, non-pairwise algorithm for conserved non-coding sequence (CNS) identification. We give two different algorithms to identify exactly matching CNSs as well as CNSs with some mismatches. The algorithms have been useful for research in comparative genomics and were used to identify CNSs in various grass species. We used minhash- and LSH-based techniques when the CNSs are larger, i.e., ≥ 100 bp. The minhash approach is used to estimate the Jaccard similarity. Our algorithm used minhash techniques to create signatures for the sequences and an LSH-based approach to cluster the sequences without pairwise operations. For the third problem, we further used the minhash and LSH techniques to address the challenges in isoform clustering. Isoforms are generated from different combinations of exons of the same gene by alternative splicing. As the isoform sequences share the same exon regions, our algorithm clustered these sequences based on their shared minhash signatures. Finally, we discuss an ensemble approach for the de novo transcriptome assembly problem.
We first performed a comprehensive performance analysis of different transcriptome assemblers using a simulated dataset. Our new ensemble approach also uses the minhash technique to identify potential transcripts from the combined list of contigs produced by different de novo transcriptome assemblers.
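As a rough illustration of the minhash idea mentioned in the abstract above, the sketch below estimates the Jaccard similarity between the k-mer sets of two sequences. It is a minimal, generic example and not the dissertation's implementation: the function names, the use of seeded BLAKE2b as the hash family, and the default of 128 hash functions are all assumptions made for illustration.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all length-k substrings (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _seeded_hash(item, seed):
    # Hypothetical hash family: BLAKE2b with the seed as salt, so each seed
    # value acts like a separate 64-bit hash function.
    digest = hashlib.blake2b(
        item.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(items, num_hashes=128):
    """Signature = the minimum hash value of the set under each hash function.

    The set must be non-empty (the sequence must be at least k long).
    """
    return [min(_seeded_hash(x, seed) for x in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree; approximates Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def exact_jaccard(set_a, set_b):
    """Exact Jaccard similarity, for comparison with the estimate."""
    return len(set_a & set_b) / len(set_a | set_b)
```

Sequences whose signatures agree in many positions are likely to share many k-mers. An LSH step would typically group sequences by bands of the signature, so that candidate clusters can be found without comparing every pair.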
M.S. Thesis Defense: Junzhe Cai
Thursday, November 19, 2020
2 p.m. via Zoom
Zoom link: https://unl.zoom.us/j/9371166780
“A Novel Spatiotemporal Prediction Method of Cumulative Covid-19 Cases”
Prediction methods are important for many applications. In particular, an accurate prediction of the total number of cases in a pandemic such as Covid-19 could help medical preparedness by ensuring a timely and sufficient supply of testing kits, hospital beds and medical personnel. This thesis experimentally compares the accuracy of ten prediction methods for the cumulative number of Covid-19 cases. These ten methods include three types of neural networks and extrapolation methods based on best-fit quadratic, best-fit cubic and Lagrange interpolation, as well as an extrapolation method from Revesz. We also consider the Kriging and inverse distance weighting spatial interpolation methods. In addition, we develop a novel spatiotemporal prediction method by combining the method from Revesz with inverse distance weighting. The experiments show that among these ten prediction methods, the spatiotemporal method has the smallest root mean square error and mean absolute error on cumulative Covid-19 data for counties in New York State between May and July 2020.
Supervisor: Dr. Peter Z. Revesz
Committee members: Dr. Leen-Kiat Soh, Dr. Ashok Samal
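For readers unfamiliar with two of the ingredients named in the abstract above, here is a minimal sketch of plain inverse distance weighting (IDW) and of the two error measures used for comparison. It does not reproduce the thesis's spatiotemporal method or the Revesz extrapolation. The function names, the planar (x, y) coordinates, and the default power of 2 are assumptions made for illustration.

```python
import math


def idw_estimate(known, target, power=2.0):
    """Estimate the value at target from known points by inverse distance weighting.

    known  : list of ((x, y), value) pairs, e.g. hypothetical county locations
             with their case counts
    target : (x, y) location to estimate
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in known:
        dist = math.hypot(x - target[0], y - target[1])
        if dist == 0.0:
            # The target coincides with a known point: return its value directly.
            return float(value)
        weight = 1.0 / dist ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
```

Nearby points dominate the estimate because each weight falls off as distance raised to the chosen power. A larger power makes the interpolation more local.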
M.S. Defense: Robert (Casey) Lafferty
Monday, November 23, 2020
12:30 p.m. via Zoom
Zoom link: https://unl.zoom.us/my/rclafferty
“Packet Delivery: An Investigation of Educational Video Games for Computer Science Education”
Abstract: The field of educational video games has grown rapidly since the 1970s, mostly producing video games to teach core education concepts such as mathematics, natural science, and English. Recently, various research groups have developed educational games to address elective topics such as finance and health. Educational video games often target grade school audiences and rarely target high school students, college students, or adults. Computer science topics are not a common theme among educational video games; the games that do address computer science topics teach computer fundamentals, such as typing or basic programming, to young audiences.
Packet Delivery, an educational video game for introductory computer science students, is an investigation into the use of the apprenticeship learning, constructivism, and scaffolding learning paradigms to teach the Domain Name System (DNS) lookup process. In Packet Delivery, the player’s primary task is delivering letters without addresses to recipients via a search mechanism that emulates the DNS lookup process. Through practice and in-game upgrades, the player’s goal is to learn the basics of DNS lookup and its optimizations. To analyze the comprehension and retention of students playing Packet Delivery, a study comprising three tests was given to participants over the course of a few weeks: a pretest gauging prior knowledge, a post-test gauging immediate comprehension, and a follow-up post-test gauging retention. The study provided a proof of concept that educational video games have a significant place in higher education and that apprenticeship learning, constructivism, and scaffolding are highly effective learning paradigms for use within educational video games.