Three M.S. defenses next week

Graduate Defenses

M.S. Thesis Defense: Anup Adhikari
Monday, November 29, 2021
11:30 AM (CST)
Meeting ID: 933 9650 6353

"Agent-based Modeling of the Spread of Social Unrest based on Infectious Disease Spread Model"

Social unrest activities are tools for people to show dissatisfaction, and people are often motivated by similar unrest activities in another region. This causes unrest activities to spread across space and time. In this thesis, we model that spread. The underlying novel methodology is to model regions as agents that transition from one state to another based on changes in their environment. The methodology involves (1) creating a region vector for each agent based on socio-demographic, cultural, economic, infrastructural, geographic, and environmental (SCEIGE) factors, (2) formulating a neighborhood distance function to identify the neighbors of each agent based on geospatial distance and SCEIGE proximity, (3) designing transition probability equations based on infectious disease spread models, and (4) building ground truth for evaluating the simulations. We implement two social unrest spread models based on two infectious disease models, SIR (Susceptible-Infected-Recovered) and SIS (Susceptible-Infected-Susceptible). We use contact networks, which infectious disease spread models often use to establish the contacts that lead to disease in an individual, to compute an individualized probability for each agent to transition from one state to another. In our case, the contact networks establish the contacts that lead to social unrest in an agent. The models are tested on three states of India, Tamil Nadu, Andhra Pradesh, and Himachal Pradesh, for 2016-2020 at a monthly scale. For the SCEIGE factors, we use labor wages, road density, gross domestic product, number of hospitals, and standardized precipitation index, sourced from national and international institutes and agencies. For ground truth, we use the ACLED dataset on political violence and protest.
Our findings are that (1) the transition probability equations are viable, (2) agent-based modeling of the spread of social unrest is feasible when each region is treated as an agent, which is the novelty of our approach, and (3) the SIS model performs comparatively better than the SIR model.
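
For readers unfamiliar with contact-network epidemic models, the following Python sketch illustrates the general idea of regions as agents. It is not the thesis's method: the abstract does not give the transition probability equations, so the form used here (a susceptible agent escapes "infection" independently from each neighbor in unrest) is a common textbook assumption, and the names beta, gamma, neighbors, and the edge weights are hypothetical.

```python
import random


def infection_probability(agent, states, neighbors, beta):
    """Assumed form: 1 - product of (1 - beta * weight) over neighbors in state 'I'."""
    p_escape = 1.0
    for other, weight in neighbors[agent].items():
        if states[other] == "I":
            p_escape *= 1.0 - beta * weight
    return 1.0 - p_escape


def step(states, neighbors, beta, gamma, model="SIS", rng=random):
    """Advance every agent one time step (e.g. one month) under SIS or SIR."""
    new_states = {}
    for agent, state in states.items():
        if state == "S":
            # Susceptible region: may move to unrest through its contacts.
            p = infection_probability(agent, states, neighbors, beta)
            new_states[agent] = "I" if rng.random() < p else "S"
        elif state == "I":
            # Region in unrest: leaves that state with probability gamma.
            if rng.random() < gamma:
                new_states[agent] = "S" if model == "SIS" else "R"
            else:
                new_states[agent] = "I"
        else:
            # Recovered regions (SIR only) stay recovered.
            new_states[agent] = "R"
    return new_states


# Hypothetical three-region contact network; weights stand in for proximity.
neighbors = {
    "A": {"B": 1.0, "C": 0.5},
    "B": {"A": 1.0},
    "C": {"A": 0.5},
}
states = {"A": "S", "B": "I", "C": "I"}

p_a = infection_probability("A", states, neighbors, beta=0.2)
next_states = step(states, neighbors, beta=0.2, gamma=0.1, model="SIS")
```

The only difference between the two models in this sketch is where an agent goes after leaving the unrest state: back to susceptible under SIS, or to a permanent recovered state under SIR.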

Committee Members:
Dr. Leen-Kiat Soh, Advisor
Dr. Ashok Samal, Co-Advisor
Dr. Qiuming Yao
Dr. Deepti Joshi

M.S. Project Defense: Lalita Kumawat
Tuesday, November 30, 2021
11:00 AM (CST)
Meeting ID: 978 8884 4890

"DDoS Attack detection using Machine Learning Techniques"

Distributed Denial of Service (DDoS) is a malicious attack that disrupts the network services of a targeted server. Compromised systems (bots) overwhelm the targeted server or system with a large volume of malicious traffic, making it slow or unavailable. DDoS attacks mainly occur at the Network layer (3), Transport layer (4), and Application layer (7) of the OSI (Open Systems Interconnection) model. Detecting DDoS attacks with high accuracy remains a challenging problem even after many years of research. The traditional threshold-based method is not very effective because of the increasing complexity of DDoS attacks and the difficulty of setting reasonable threshold values. Many synthetic datasets have been created to replicate DDoS traffic and enable further research on identifying DDoS attacks. In recent years, there has been a rise in machine learning based techniques that use large traffic datasets to distinguish normal from malicious traffic. In this project, we analyze the CICDDoS2019 dataset, which is the result of a project between the Canadian Institute for Cybersecurity (CIC) and the Communications Security Establishment (CSE). It is a labeled dataset that includes different types of reflection-based and exploitation-based DDoS attacks along with normal traffic. We apply several machine learning techniques to detect DDoS attacks. The project has three steps. The first is analyzing the dataset. The second is feature selection using Recursive Feature Elimination with Cross-Validation (RFECV), which keeps the most relevant features and removes the weakest ones. The third is detecting DDoS attacks with three classifiers (Decision Tree, Naive Bayes, and Random Forest) that take the selected features as input.
The Decision Tree classifier achieved the best F1 score, 86%, with 96% precision and 78% recall, while the Random Forest classifier achieved an F1 score of 63% with 99% precision. The Naive Bayes classifier performed the worst of the three in terms of precision.
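
As a quick check on the reported figures, the F1 score is by its standard definition the harmonic mean of precision and recall. The short Python sketch below applies that definition to the Decision Tree numbers quoted above. The function name is chosen here for illustration, and recall is not reported for the Random Forest classifier, so that case is left out.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (the standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Decision Tree figures from the abstract: 96% precision, 78% recall.
decision_tree_f1 = f1_score(0.96, 0.78)
print(f"{decision_tree_f1:.2f}")  # prints 0.86
```

Because F1 is a harmonic mean, it is pulled toward the smaller of the two values, which is how a classifier with very high precision can still have a much lower F1 score.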

Committee Members:
Dr. Byrav Ramamurthy (Advisor)
Dr. Lisong Xu
Dr. Nirnimesh Ghose

M.S. Thesis Defense: Rojina Deuja
Wednesday, December 1, 2021
1:00 PM (CST)
Meeting ID: 998 4120 7914

"Semantically Meaningful Sentence Embeddings"

Text embedding is an approach used in Natural Language Processing (NLP) to represent words, phrases, sentences, and documents. It is the process of obtaining numeric representations of text that are fed into machine learning models as vectors (arrays of numbers) for further processing. One of the biggest challenges for text embedding is representing longer segments of text in a way that captures both the meaning of the segment and the semantic relationships among its constituents. Such representations are known as semantically meaningful embeddings.

In this study, we seek to improve the quality of semantically meaningful embeddings generated for sentences. The current state-of-the-art models are mostly based on transformer networks that utilize attention mechanisms. Such networks use encoders that generate dense vectors to represent input sentences. While most of these models simply combine the dense vectors into fixed-size embeddings, there is no evidence that such heuristic pooling techniques work best for capturing semantic relationships. We argue that combining the vectors in this way incorporates a lot of unwanted information into the embeddings. To capture the true semantic relationships between words in a sentence and remove linguistic noise, we propose a modified version of the DeBERTa model with a novel pooling technique. The model uses a fully connected neural network (FCNN) to reduce the size of the encoder output while enriching the expressiveness of semantic information in the embeddings. Our experiments show that the proposed model achieves significant improvement over existing sentence embedding methods on two datasets: STS Benchmark (STS-B) and SICK-Relatedness (SICK-R). We also create a semantic search engine that encodes an input sentence and returns the top N sentences that are most similar to it.
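
To make the terms concrete, the Python sketch below shows the kind of heuristic pooling the abstract argues against (a plain average of per-token vectors) together with a top-N similarity search. It does not reproduce the proposed FCNN-based pooling, which the abstract does not specify. The function names, the toy two-dimensional vectors, and the use of cosine similarity as the ranking measure are all assumptions made for illustration.

```python
import math


def mean_pool(token_vectors):
    """Average per-token vectors into one fixed-size sentence embedding."""
    dim = len(token_vectors[0])
    count = len(token_vectors)
    return [sum(vec[i] for vec in token_vectors) / count for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_n(query_embedding, corpus_embeddings, n):
    """Return the keys of the n corpus entries most similar to the query."""
    ranked = sorted(
        corpus_embeddings.items(),
        key=lambda item: cosine_similarity(query_embedding, item[1]),
        reverse=True,
    )
    return [key for key, _ in ranked[:n]]


# Toy data: made-up token vectors and sentence embeddings, not model output.
query = mean_pool([[2.0, 0.0], [0.0, 0.0]])  # -> [1.0, 0.0]
corpus = {"a": [1.0, 0.1], "b": [0.0, 1.0], "c": [0.9, 0.5]}
best_matches = top_n(query, corpus, n=2)
```

In a real system the token vectors would come from a transformer encoder, and the pooling step is where a learned component such as the proposed one would replace the plain average.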

Committee Members:
Dr. Stephen Scott (Advisor)
Dr. Mohammad Rashedul Hasan (Co-Advisor)
Dr. Vinodchandran N. Variyam