Master's Thesis Defense: "Towards Building a Review Recommendation System That Trains Novices by Leveraging the Actions of Experts"
Shilpa Khanal
Committee Members: Dr. Leen-Kiat Soh (Advisor)
Dr. Peter Revesz and Dr. Ashok Samal
Friday, December 9, 2016, 9:00 a.m.
256C Avery Hall
Abstract:
Online reviews increase consumer visits, increase the time spent on the website, and create a sense of community among frequent shoppers. Because of the importance of online reviews, online retailers such as Amazon.com and eOpinions provide detailed guidelines for writing reviews. However, although these guidelines explain how to write reviews in general, they do not tell reviewers how to write product-specific reviews. As a result, poorly written reviews abound, and a customer may need to scroll through a large number of reviews, as far as 6000 pixels down from the top of the page, to find helpful information about a product (Porter, 2010). Thus, there is a need to train reviewers to write better reviews, which could in turn better serve customers, vendors, and online stores. In this thesis, we propose a review recommendation framework that trains reviewers to write better about their experiences with a product by leveraging the behaviors of expert reviewers who are good at writing helpful reviews.
First, we use clustering to group reviewers into classes that reflect different levels of skill at writing quality reviews, such as expert and novice. Through temporal analysis of reviewer behavior, we have found that reviewers evolve over time, with their reviews becoming better or worse in quality and more or less frequent. We also investigate how reviews are valued differently across product categories. Using machine learning-based classification techniques, we have found that, for products associated with a prevention consumption goal, longer reviews are perceived to be more helpful, and that, for products associated with a promotion consumption goal, positive reviews are more helpful than negative ones.
Our proposed review recommendation framework aims to help a novice or conscientious reviewer become an expert reviewer. Our assumption is that a reviewer will reach the highest level of expertise by learning from the experiences of his or her closest experts, that is, experts whose evolutionary pattern is similar to that of the reviewer being trained. To assist with the intermediate steps as the reviewer grows from his or her current state to the highest level of expertise, we recommend the positive actions of the reviewer’s closest experts that are not too far out of the reviewer’s reach, and we discourage the negative actions of those experts that are within the reviewer’s reach. Recommendations are personalized to fit the reviewer’s expertise level, evolution trend, and product category. Using the proposed framework, we have found that for a randomly selected novice reviewer, at least 80% of the reviews posted by the closest experts were of higher quality than those of the novice reviewer. This was verified on a dataset of 2.3 million reviewers whose reviews cover products from nine product categories: Books, Electronics, Cellphones and accessories, Grocery and gourmet food, Office products, Health and personal care, Baby, Beauty, and Pet supplies.
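The idea of recommending the actions of a reviewer's closest experts can be illustrated with a small sketch. This is not the thesis's implementation: the feature names, the normalized values, the Euclidean distance measure, and the `max_gap` threshold are all assumptions chosen for illustration. It finds the experts nearest to a novice and lists the features on which the nearest expert is ahead, but only by a reachable margin.

```python
import math

def closest_experts(novice, experts, k=2):
    """Return the names of the k experts nearest to the novice (Euclidean distance)."""
    def dist(name):
        return math.dist(novice, experts[name])
    return sorted(experts, key=dist)[:k]

def reachable_actions(novice, expert, feature_names, max_gap=0.3):
    """List features where the expert scores higher, but by no more than max_gap."""
    return [
        name
        for name, n, e in zip(feature_names, novice, expert)
        if 0 < e - n <= max_gap
    ]

# Hypothetical, normalized features in [0, 1]; not taken from the thesis.
FEATURES = ["review_length", "helpful_vote_ratio", "reviews_per_month"]
novice = [0.2, 0.3, 0.5]
experts = {
    "expert_a": [0.4, 0.5, 0.5],
    "expert_b": [0.9, 0.9, 0.1],
    "expert_c": [0.3, 0.8, 0.6],
}

nearest = closest_experts(novice, experts, k=2)
suggestions = reachable_actions(novice, experts[nearest[0]], FEATURES)
print(nearest, suggestions)
```

Capping the gap keeps suggestions within a step the novice could plausibly take; a fuller version would also account for the reviewer's evolution trend and product category, as the abstract describes.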
Dissertation Defense: "Finding DNA Motifs: A Probabilistic Suffix Tree Approach"
Abhishek Majumdar
Committee Members: Dr. Stephen Scott and Dr. Jitender Deogun (Co-Advisors)
Dr. Lisong Xu, Dr. Steven Harris, and Dr. Etsuko Moriyama
Thursday, December 15, 2016, 11:30 a.m.
112 Schorr Center
Abstract:
We address the problem of de novo motif identification. That is, given a set of DNA sequences, we try to identify motifs in the dataset without any prior knowledge about the existence of motifs in it. We propose a method based on Probabilistic Suffix Trees (PSTs) to identify fixed-length motifs in a given set of DNA sequences. Our experiments reveal that our approach successfully discovers true motifs. Experiments on synthetic data show that the motifs found by our method distinguish their sequence clusters from other clusters almost perfectly (area under the ROC curve ≈ 0.987). We compared our method with the popular MEME algorithm and observed that our method detects a larger number of correct and statistically significant motifs than MEME. Our method is also much more efficient than MEME at finding motifs in datasets of 1000 or more sequences. We applied our method to sequences of mutant strains of Exophiala dermatitidis and successfully identified motifs that revealed several transcription factor binding sites. This information is important to biologists designing experiments to understand the role of these binding sites in the different regulatory pathways affected by cdc42. We also show that our PST approach to de novo motif discovery can be used successfully to identify motifs in ChIP-Seq datasets. These motifs in turn identify binding sites for proteins in the sequences.
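As a rough illustration of the kind of model a PST encodes, the sketch below estimates next-nucleotide probabilities conditioned on the longest previously seen suffix of the preceding bases. It is a simplified stand-in, not the method of the dissertation: it stores contexts in a dictionary rather than a pruned tree, and the `max_depth` and `pseudocount` parameters and the toy sequences are assumptions.

```python
from collections import defaultdict
import math

ALPHABET = "ACGT"

def train_context_counts(sequences, max_depth=2):
    """Count next-symbol occurrences after every context of length 0..max_depth."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for i, symbol in enumerate(seq):
            for depth in range(min(i, max_depth) + 1):
                context = seq[i - depth:i]
                counts[context][symbol] += 1
    return counts

def next_symbol_prob(counts, history, symbol, max_depth=2, pseudocount=1.0):
    """P(symbol | longest suffix of history seen in training), with additive smoothing."""
    for depth in range(min(len(history), max_depth), -1, -1):
        context = history[len(history) - depth:]
        if context in counts:
            seen = counts[context]
            total = sum(seen.values()) + pseudocount * len(ALPHABET)
            return (seen.get(symbol, 0) + pseudocount) / total
    # No training data at all: fall back to a uniform distribution.
    return 1.0 / len(ALPHABET)

def log_likelihood(counts, seq, max_depth=2):
    """Sum of log-probabilities of each symbol given the symbols before it."""
    return sum(
        math.log(next_symbol_prob(counts, seq[:i], s, max_depth))
        for i, s in enumerate(seq)
    )

# Toy training set (hypothetical): three copies of one short motif.
counts = train_context_counts(["TATAAT", "TATAAT", "TATAAT"])
motif_score = log_likelihood(counts, "TATAAT")
other_score = log_likelihood(counts, "GGCCGG")
print(motif_score > other_score)
```

A sequence that resembles the training data receives a higher log-likelihood than an unrelated one, which is the property a motif finder can exploit when scoring candidate fixed-length windows.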