The CSE Colloquium Series Presents David Mimno

Faculty Candidate David Mimno

David Mimno, a postdoctoral researcher in the Department of Computer Science at Princeton University and a UNL Faculty Candidate, will give a presentation on "Text Mining at Million-Book Scale" on November 15 at 4 p.m. in 115 Avery. The presentation will be preceded by a reception at 3:30 p.m. in 348 Avery.

Mimno received his PhD from the University of Massachusetts, Amherst, and served as Head Programmer at the Perseus Project before attending graduate school. He is a recipient of the CRA Computing Innovation Fellowship.

Abstract

The large-scale digital collections scanned by Google and the Internet Archive have opened new ways to interact with books. The scale of digitization, however, also presents a challenge. We must find methods that are powerful enough to model the complexity of culture, but simple enough to scale to millions of books. In this talk I’ll discuss one method, statistical topic modeling. I’ll begin with an overview of the method. I will then present recent research on scaling inference to millions of books. Finally, I will demonstrate how to use such a model to measure changes over time and distinctions between sub-corpora, along with hypothesis tests that help us to distinguish consistent patterns from random variations.