The CSE Colloquium Series will present distinguished speaker Rob DeLine, who will talk about "Improving the User Experience of Big Data Analytics", on April 9 at 4 p.m. in Avery 115. The presentation will be preceded by a reception at 3:30 p.m. in Avery 348.
Abstract
Data science today is like software development in the mainframe era: data scientists twiddle their thumbs waiting for big batch jobs to complete and shuffle data around between multiple independent tools, often through tedious clerical work. A typical workflow might include map/reduce systems (Hadoop), database management systems (MySQL), spreadsheets, scripting environments (Python), statistical programs (R, MATLAB) and machine learning tools (Weka). These bureaucratic workflows have several disadvantages, including the barrier of learning all these tools, the vigilance needed to prevent mistakes, the difficulty of preserving provenance and reproducibility, and the extra effort required to share data sets and analyses. To address these problems, I'll present a demo of our prototype environment for data science, called “Stat!”. The goal of “Stat!” is to allow a data scientist to accomplish an entire workflow, from raw data to final presentations, in one environment. This integration creates the opportunity for high productivity, automated checking, and preservation of data provenance. The project's long-term goal is to democratize data analysis so that, say, the average spreadsheet user can use statistics and machine learning to draw valid conclusions about a data set of her choice.
Biography
DeLine is a principal researcher at Microsoft Research who studies the work practices of software developers and, more recently, data scientists. From 2005 to 2012, Dr. DeLine founded and managed a research group dedicated to the user-centered design of software development tools, with a focus on information seeking, program comprehension and task management. In collaboration with colleagues, he has invented development environments that exploit spatial memory (Debugger Canvas, Code Canvas), a recommendation system for program comprehension (Team Tracks), type systems to enforce API protocols (Fugue, Vault), a software architecture environment (UniCon), and a popular environment for end-user programming (Alice). He received his PhD from Carnegie Mellon University in 1999 and his MS from the University of Virginia in 1993.