Purandare publishes three papers with IIIT-Delhi graduate students

Dr. Rahul Purandare has had three papers accepted to conferences or journals. He has co-authored the papers with his graduate students from Indraprastha Institute of Information Technology Delhi (IIIT-Delhi).

Purandare is an Associate Professor in the School of Computing at the University of Nebraska–Lincoln. He received his Ph.D. in Computer Science from the University of Nebraska–Lincoln in 2011. He is primarily interested in program analysis and in combining program analysis with deep learning, NLP, and information retrieval to solve software engineering problems. His research has appeared in the proceedings of several reputed conferences including ICSE, FSE, ASE, ISSTA, OOPSLA, WSDM, ICST, FM, MSR, and RV, and prestigious journals including TSE and TOSEM. He received the ACM distinguished research paper award for his work presented at ISSTA’13 and the best paper award for the work presented at RV’18. He is currently on the review boards of the TSE, TOSEM, and ASE journals.

Type of Publication: Journal Paper
Authors: Nikita Mehrotra, Akash Sharma, Anmol Jindal, and Rahul Purandare
Title: "Improving Cross-Language Code Clone Detection via Code Representation Learning and Graph Neural Networks"
Publication Location: IEEE Transactions on Software Engineering (TSE)
Additional Notes: Accepted for publication as a Journal First paper.
Abstract: Code clone detection is an important aspect of software development and maintenance. The extensive research in this domain has helped reduce the complexity and increase the robustness of source code, thereby assisting bug detection tools. However, the majority of the clone detection literature is confined to a single language. With the increasing prevalence of cross-platform applications, functionality replication across multiple languages is common, resulting in code fragments having similar functionality but belonging to different languages. Since such clones are syntactically unrelated, single language clone detection tools are not applicable in their case. In this paper, we propose a semi-supervised deep learning-based tool RUBHUS, capable of detecting clones across different programming languages. RUBHUS uses the control and data flow enriched abstract syntax trees (ASTs) of code fragments to leverage their syntactic and structural information and then applies graph neural networks (GNNs) to extract this information for the task of clone detection. We demonstrate the effectiveness of our proposed system through experiments conducted over datasets consisting of Java, C, and Python programs and evaluate its performance in terms of precision, recall, and F1 score. Our results indicate that RUBHUS outperforms the state-of-the-art cross-language clone detection tools.
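
For readers unfamiliar with the three metrics named in the abstract, the following is a minimal sketch of how precision, recall, and F1 score are computed from counts of true positives, false positives, and false negatives. The function name and the counts are hypothetical and are not taken from the paper or from RUBHUS.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall, and F1 from raw counts.

    tp: clone pairs correctly reported as clones
    fp: non-clone pairs wrongly reported as clones
    fn: clone pairs the detector missed
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up counts, for illustration only.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```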

Type of Publication: Conference Paper
Authors: Khushboo Chitre, Piyus Kedia, and Rahul Purandare
Title: Rapid: Region-based Pointer Disambiguation
Publication Location: ACM SIGPLAN International Conference on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA’23)
Additional Notes: Accepted for publication.
Abstract: Interprocedural alias analyses often sacrifice precision for scalability. Thus, modern compilers such as GCC and LLVM implement more scalable but less precise intraprocedural alias analyses. This compromise makes the compilers miss out on potential optimization opportunities, affecting the performance of the application. Modern compilers implement loop-versioning with dynamic checks for pointer disambiguation to enable the missed optimizations. Polyhedral access range analysis and symbolic range analysis enable O(1) range checks for non-overlapping of memory accesses inside loops. However, these approaches work only for the loops in which the loop bounds are loop invariants. To address this limitation, researchers proposed a technique that requires O(log n) memory accesses for pointer disambiguation. Others improved the performance of dynamic checks to a single memory access by constraining the object size and alignment. However, the former approach incurs noticeable overhead due to its dynamic checks, whereas the latter has a noticeable allocator overhead. Thus, scalability remains a challenge.

In this work, we present a tool, Rapid, that further reduces the overheads of the allocator and dynamic checks proposed in the existing approaches. The key idea is to identify objects that need disambiguation checks using a profiler and allocate them in different regions, which are disjoint memory areas. The disambiguation checks simply compare the regions corresponding to the objects. The regions are aligned such that the top 32 bits in the addresses of any two objects allocated in different regions are always different. As a consequence, the dynamic checks do not require any memory access to ensure that the objects belong to different regions, making them efficient.
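
The region comparison described above can be illustrated with a short sketch. It models addresses as 64-bit integers and assumes, purely for illustration, that each region starts on a 4 GiB boundary, so the upper 32 bits of an address act as a region tag. The names `region_of`, `region_base`, and `in_different_regions` are hypothetical; the abstract does not specify Rapid's actual allocator layout or the form of its checks.

```python
REGION_SHIFT = 32  # assumption: the region tag is the upper 32 bits of a 64-bit address


def region_base(region_id: int) -> int:
    """Start address of a hypothetical region (aligned to 2**32 bytes)."""
    return region_id << REGION_SHIFT


def region_of(addr: int) -> int:
    """Region tag of an address: the address with its low 32 bits dropped."""
    return addr >> REGION_SHIFT


def in_different_regions(p: int, q: int) -> bool:
    """True when the two addresses carry different region tags.

    The check is pure arithmetic on the pointer values themselves,
    so it needs no memory load.
    """
    return region_of(p) != region_of(q)


a = region_base(1) + 0x1000  # object placed in region 1
b = region_base(2) + 0x1000  # object placed in region 2
c = region_base(1) + 0x2000  # another object in region 1

print(in_different_regions(a, b))  # True
print(in_different_regions(a, c))  # False
```

In this sketch, a `False` result only means the tags match; it says nothing about whether the two objects overlap, so some other check would still be needed in that case.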

Rapid achieved a maximum performance benefit of around 52.94% for Polybench and 1.88% for CPU SPEC 2017 benchmarks. The maximum CPU overhead of our allocator is 0.57% with a geometric mean of -0.2% for CPU SPEC 2017 benchmarks. Due to the low overhead of the allocator and dynamic checks, Rapid could improve the performance of 12 out of 16 CPU SPEC 2017 benchmarks. In contrast, a state-of-the-art approach used in the comparison could improve only five CPU SPEC 2017 benchmarks.

Type of Publication: Conference Paper
Authors: Piyus Kedia, Rahul Purandare, Udit Agarwal, and Rishabh
Title: CGuard: Scalable and Precise Object Bounds Protection for C
Publication Location: Proceedings of the International Symposium on Software Testing and Analysis (ISSTA)
Additional Notes: Accepted for publication.
Abstract: Spatial safety violations are the root cause of many security attacks and unexpected behavior of applications. Existing techniques to enforce spatial safety work broadly at either object or pointer granularity. Object-based approaches tend to incur high CPU overheads, whereas pointer-based approaches incur both high CPU and memory overheads. SGXBounds, an object-based approach, provides precise out-of-bounds protection for objects at a lower overhead compared to other tools with similar precision. However, a major drawback of this approach is that it cannot support address space larger than 32-bit.

In this paper, we present CGuard, a tool that provides precise object-bounds protection for C applications with comparable overheads to SGXBounds without restricting the application address space. CGuard stores the bounds information just before the base address of an object and encodes the relative offset of the base address in the spare bits of the virtual address available in x86_64 architecture. For an object that cannot fit in the spare bits, CGuard uses a custom memory layout that enables it to find the base address of the object in just one memory access. Our study revealed spatial safety violations in the gcc and x264 benchmarks from the SPEC CPU2017 benchmark suite and the string_match benchmark from the Phoenix benchmark suite. The execution time overheads for the SPEC CPU2017 and Phoenix benchmark suites were 42% and 26% respectively, whereas the reduction in the throughput for the Apache webserver when the CPUs were fully saturated was 30%. These results indicate that CGuard can be highly effective while maintaining a reasonable degree of efficiency.
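
The idea of carrying an object's base offset in the spare bits of a pointer can be sketched as a small simulation. Everything here is an assumption made for illustration: 48-bit virtual addresses with 16 spare upper bits, an 8-byte size header placed just before the base, a dictionary standing in for memory, and the hypothetical helpers `allocate`, `advance`, and `check_access`. It is not CGuard's actual encoding, and the custom layout for objects too large for the spare bits is not modeled.

```python
ADDR_BITS = 48                   # assumption: 48-bit virtual addresses
ADDR_MASK = (1 << ADDR_BITS) - 1
MAX_OFFSET = (1 << 16) - 1       # largest offset that fits in the 16 spare bits
HEADER = 8                       # assumption: 8-byte size header before the base

memory: dict[int, int] = {}      # simulated memory: header address -> object size


def allocate(base: int, size: int) -> int:
    """Record the size just before the base and return a pointer with offset 0."""
    memory[base - HEADER] = size
    return base  # upper bits are zero, meaning "offset 0 from base"


def advance(tagged: int, delta: int) -> int:
    """Pointer arithmetic that keeps the offset in the spare bits up to date."""
    addr = (tagged & ADDR_MASK) + delta
    offset = (tagged >> ADDR_BITS) + delta
    if not 0 <= offset <= MAX_OFFSET:
        raise OverflowError("offset does not fit in the spare bits (not modeled)")
    return (offset << ADDR_BITS) | addr


def check_access(tagged: int, width: int) -> bool:
    """Return True if reading `width` bytes at the pointer stays inside the object."""
    addr = tagged & ADDR_MASK
    offset = tagged >> ADDR_BITS
    base = addr - offset             # recover the base from the pointer alone
    size = memory[base - HEADER]     # one simulated load of the bounds header
    return offset + width <= size


p = allocate(base=0x10000, size=16)
print(check_access(advance(p, 8), 8))   # True: bytes 8..15 of a 16-byte object
print(check_access(advance(p, 12), 8))  # False: bytes 12..19 run past the end
```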

Bits & Bytes Wed. Oct. 04, 2023