Bakhtiar Khan Kasi will present and defend his Ph.D. dissertation on Monday, November 16 at 11:00 a.m. in Avery 347. The title is "Minimizing Software Conflicts through Proactive Detection of Conflicts and Task Scheduling."
Abstract:
Distributed software development has become the norm in today’s large-scale software development. While distributed version control systems facilitate peer-to-peer collaboration by enabling developers to work independently in local repositories, software conflicts that arise from coordination failures are still a regular occurrence. In a study of four popular open-source projects, we found that conflicts occur frequently (in 34% to 54% of all merges) and take a substantial amount of time to fix (about 1-14 days, median).
The state of the art has focused on conflict mitigation, which aims to notify developers of emerging conflicts. A key type of conflict mitigation technique is embodied in workspace awareness tools. These tools monitor developers’ workspace activities to facilitate coordination among developers by identifying potential conflicts early, while changes are still small and easier to resolve. However, with this approach conflicts still occur and still require developer time and effort to resolve. We propose a novel conflict minimization technique that is designed to avoid conflicts to the extent possible. Our approach proactively identifies potential conflicts among developers’ tasks, encodes them as constraints, and solves the constraint space to recommend a set of conflict-minimal development paths for the team.
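To make the idea of conflict-minimal development paths concrete, the following is a minimal illustrative sketch, not the dissertation's implementation. It assumes that each task comes with a predicted set of files, that two tasks potentially conflict when those sets overlap, and that every developer has a queue of equal length. A brute-force search over task orderings stands in for a real constraint solver, and all names (`task_files`, `queues`, `conflict_minimal_schedule`) are hypothetical.

```python
from itertools import combinations, permutations, product

def conflicts(task_files, a, b):
    """Assumption: two tasks potentially conflict if their predicted file sets overlap."""
    return bool(task_files[a] & task_files[b])

def count_conflicts(schedule, task_files):
    """Count conflicting task pairs that are worked on in the same time slot.

    schedule holds one task ordering per developer; slot i contains every
    developer's i-th task (queues are assumed to have equal length).
    """
    total = 0
    for slot in zip(*schedule):
        total += sum(conflicts(task_files, a, b) for a, b in combinations(slot, 2))
    return total

def conflict_minimal_schedule(queues, task_files):
    """Brute-force stand-in for a constraint solver: try every ordering of
    each developer's queue and keep the first one with the fewest conflicts."""
    best = None
    for schedule in product(*(permutations(q) for q in queues)):
        cost = count_conflicts(schedule, task_files)
        if best is None or cost < best[0]:
            best = (cost, schedule)
    return best

# Hypothetical example: two developers, two tasks each.
task_files = {
    "T1": {"a.py", "b.py"},
    "T2": {"c.py"},
    "T3": {"a.py"},
    "T4": {"d.py"},
}
queues = [["T1", "T2"], ["T3", "T4"]]
cost, schedule = conflict_minimal_schedule(queues, task_files)
```

In this example, working on T1 and T3 at the same time would touch `a.py` concurrently, so the search recommends an ordering in which those two tasks fall into different time slots. Exhaustive search only scales to very small inputs; it is used here purely to illustrate the encoding.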
This research is the first work towards conflict minimization in software development. In this dissertation, we motivate the study of conflict minimization by conducting an empirical evaluation of four open-source projects to characterize the distribution of conflicts and their resolution efforts. We propose a hybrid approach that leverages different data preprocessing heuristics and techniques in natural language processing, machine learning, and information retrieval to predict a priori the set of files that will change for a task, a key input to identifying conflicts among tasks. We introduce our task context identification approach and illustrate its generality using three popular OSS projects. We implemented our approach in a tool, Cassandra, which extends the Eclipse Mylyn plugin. We evaluated Cassandra, and thereby our approach, through several studies, each focusing on a specific aspect of the approach. Our results indicate that Cassandra is effective at minimizing conflicts. We evaluated the efficiency of Cassandra using a simulated set of scenarios with a higher-than-normal incidence of conflicts. Finally, we evaluated the robustness of our approach by measuring Cassandra’s sensitivity to imprecise information, using simulated data with induced errors in the task context.