
Master's Thesis Defense: Charles Moloney
Wednesday, April 8
11 AM
AVH 103C
Zoom: https://unl.zoom.us/j/99153629726
"Leveraging Code Embeddings to Identify and Address Blind Spots in Benchmark Creation"
Formal software verification remains critical for early vulnerability detection, yet benchmarking these tools is costly and often reliant on centralized datasets such as SV-COMP. While such repositories enable standardized evaluation, they introduce risks of overfitting and bias, particularly due to first-party benchmark contributions. To address these limitations, we extend ARG-V, our tool for generating SV-COMP-compatible benchmarks from real-world Java code, with a novel approach of using code embedding techniques to selectively sample from mined code. By leveraging Nomic Embed Code and a cosine-based Minimum Hyperspherical Energy (MHE) objective, we systematically select and transform benchmarks from scraped GitHub code that maximize embedding diversity and minimize redundancy with existing corpora. From 5,100 candidate files, we identify 86 new benchmarks and evaluate them against the state-of-the-art in Java verification. We demonstrate that proposed benchmarks significantly increase verification difficulty for all four of the leading Java verifiers at SV-COMP 2026, observing a notable decline in verifier accuracy for all verifiers across both assertion safety and exception detection tasks. Qualitative analysis further demonstrates the new benchmarks selected with Minimum Hyperspherical Energy introduce and address numerous "blind spots" for Java features in the SV-COMP corpus, including lambdas, streams, and GUI libraries. These findings highlight the potential of embedding-driven benchmark selection to improve the diversity, realism, and evaluation rigor within software verification.
Committee:
Dr. Robert Dyer (chair)
Dr. Hamid Bagheri
Dr. Seth Polsley