The CSE Colloquium Series Hosts Craig Knoblock

Craig Knoblock

The CSE Colloquium Series will host Craig Knoblock, Research Professor at the University of Southern California, on Thursday, March 28, from 4 to 5 p.m. in Avery 115. A reception will precede the talk at 3:30 p.m. in 348 Avery Hall.

Abstract

There is a great deal of interest in "big data" today, and a significant aspect of the big data problem is how to exploit the large number and wide variety of datasets that are becoming available. One of the key challenges in using this huge amount of data more effectively is how to acquire or learn the semantics of the data. This problem is often referred to as source modeling, and because it is so hard, most existing tools leave it to be solved manually. As a result, source modeling is often the bottleneck in solving integration problems. In this talk I will first describe our work on developing a general approach to interactive information integration that allows end users to rapidly solve their own integration problems. Then I will present our recent work on interactively constructing semantic descriptions of data sources to support the larger integration task. The approach to learning source models uses machine learning methods to learn to recognize the semantic classes of the data, efficient search algorithms to find the most likely relations between the classes, and a graphical user interface that allows a user to quickly refine the semantic descriptions. I will present an evaluation of the approach on a set of bioinformatics sources and show that it supports the rapid modeling of complex sources with minimal user interaction.

Biography

Craig Knoblock is a Research Professor in Computer Science at the University of Southern California (USC) and the Director of Information Integration at the USC Information Sciences Institute. He received his Bachelor of Science degree from Syracuse University and his Master’s and Ph.D. from Carnegie Mellon University, all in computer science. His research focuses on techniques related to the Semantic Web and Linked Data for describing, acquiring, and exploiting the semantics of data. He has applied this work to constructing distributed, integrated applications from heterogeneous sources through information extraction, source modeling, data cleaning, record linkage, machine learning, and other technologies, and has applied these technologies to geospatial, biological, and cultural heritage data integration. He has published more than 250 journal articles, book chapters, and conference papers on these topics. Dr. Knoblock is a Fellow of the Association for the Advancement of Artificial Intelligence (AAAI), a Distinguished Scientist of the Association for Computing Machinery (ACM), President and Trustee of the International Joint Conference on Artificial Intelligence (IJCAI), and past President of the International Conference on Automated Planning and Scheduling (ICAPS).