“Collective Factorization for Multi-relational Yelp Dataset”
Yang Liu
Advisor: Lisong Xu
Committee members: Peter Revesz, Hongfeng Yu
Date: April 18, 2018
Time: 2:30 p.m.
Room: 211 Schorr
Abstract: Predicting unknown values in relational database systems is an active research topic in data mining. In many real-world scenarios, datasets are only partially observed, so some important values are missing. For example, in the Yelp dataset, a restaurant's rating score plays an important role in evaluating its overall service. Owners of newly opened stores would benefit from predicting their rating scores in advance, or from discovering the hidden factors that affect those scores so that they can improve their overall services. Among the available approaches, collective matrix factorization has proved effective for predicting unknown values, especially in relational datasets. In our work, we study a large, representative real-world dataset, the Yelp open-source dataset, which contains 6 relational entities and different types of data. We extend the traditional collective matrix factorization model, which supports only binary relational data, to incorporate the multi-relational data contained in the Yelp dataset. In addition, a modified Latent Dirichlet Allocation (LDA) approach is used to extract representative topics during text preprocessing. The filtered topics are then included in the collective factorization model so that text information is supported in the analysis. The evaluation shows that our method greatly improves the accuracy of predicting unknown values in the Yelp dataset.
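
For readers unfamiliar with the technique, the following is a minimal sketch of the general idea behind collective matrix factorization, not the model presented in this thesis. It assumes only two relations (user-by-business ratings and business-by-attribute flags) that share one business factor matrix, and fits them with plain gradient descent. The function name, the toy data, and all hyperparameters are hypothetical and chosen for illustration; the multi-relational extension and the LDA topic features described in the abstract are not covered.

```python
import numpy as np

def collective_mf(X, mask_x, Y, mask_y, rank=2, lr=0.01, reg=0.01,
                  steps=2000, seed=0):
    """Jointly factorize X ~ U V^T and Y ~ V W^T with a shared factor V.

    X: (users x businesses) ratings, mask_x marks observed entries.
    Y: (businesses x attributes) values, mask_y marks observed entries.
    All names and defaults are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n_users, n_biz = X.shape
    n_attr = Y.shape[1]
    U = 0.1 * rng.standard_normal((n_users, rank))
    V = 0.1 * rng.standard_normal((n_biz, rank))   # shared by both relations
    W = 0.1 * rng.standard_normal((n_attr, rank))
    history = []
    for _ in range(steps):
        # Residuals on observed entries only; missing entries contribute zero.
        err_x = mask_x * (U @ V.T - X)
        err_y = mask_y * (V @ W.T - Y)
        history.append(float((err_x ** 2).sum() + (err_y ** 2).sum()))
        # Gradients of 0.5 * squared error plus 0.5 * reg * squared norms.
        grad_U = err_x @ V + reg * U
        grad_V = err_x.T @ U + err_y @ W + reg * V  # receives both signals
        grad_W = err_y.T @ V + reg * W
        U -= lr * grad_U
        V -= lr * grad_V
        W -= lr * grad_W
    return U, V, W, history

# Toy data (made up): 4 users x 3 businesses, 0 means "not rated".
X = np.array([[5, 3, 0],
              [4, 0, 1],
              [0, 2, 5],
              [1, 0, 4]], dtype=float)
mask_x = (X > 0).astype(float)
# 3 businesses x 2 attributes, fully observed.
Y = np.array([[1, 0],
              [1, 1],
              [0, 1]], dtype=float)
mask_y = np.ones_like(Y)

U, V, W, history = collective_mf(X, mask_x, Y, mask_y)
predicted = U @ V.T  # includes estimates for the unobserved ratings
```

Because the business factors V receive gradient signal from both relations, information in the attribute matrix can inform rating predictions for businesses with few observed ratings, which is the motivation for factorizing relations collectively rather than one at a time.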