Code Revert Prediction with Graph Neural Networks: A Case Study at J.P. Morgan Chase
Yulong Pei, Salwa Alamir, Rares Dolga, Sameena Shah
TL;DR
This work tackles code revert prediction, a specialized defect-detection task in industrial software engineering, by leveraging an undirected code import graph $G=\big\{V,E,X\big\}$ together with code- and developer-related features. It investigates three learning strategies (graph-based imbalanced classification, graph anomaly detection, and balance-aware graph learning) using node2vec and GCN representations, evaluated on a real J.P. Morgan Chase Python codebase with over $10^7$ lines and a revert rate below 4%. Experimental results show that Downsampling + GCN provides the best balance of AUC-ROC and Macro F1, underscoring the importance of addressing class imbalance and exploiting graph structure; pure anomaly detection and standard classifiers underperform in this setting. The findings have practical implications for proactive risk management in large-scale, regulated software systems and point to future work on explainability and finer-grained labeling to further improve performance.
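To make the Downsampling + GCN combination concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the standard symmetric-normalized GCN propagation rule with self-loops, and it assumes that downsampling is applied only to the set of labeled training nodes while message passing still runs over the full graph. The toy graph, feature dimensions, and the helper names `downsample_majority` and `gcn_layer` are hypothetical.

```python
import numpy as np


def downsample_majority(labels, rng):
    """Keep every positive (reverted) node and an equal-sized random sample of negatives."""
    pos = np.flatnonzero(labels == 1)
    neg = np.flatnonzero(labels == 0)
    keep_neg = rng.choice(neg, size=min(len(pos), len(neg)), replace=False)
    return np.sort(np.concatenate([pos, keep_neg]))


def gcn_layer(adj, feats, weight):
    """One GCN propagation step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])             # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))  # degrees are >= 1 after self-loops
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ feats @ weight, 0.0)


rng = np.random.default_rng(0)

# Hypothetical toy import graph: 6 files connected in a ring by undirected edges.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5)]
adj = np.zeros((6, 6))
for u, v in edges:
    adj[u, v] = adj[v, u] = 1.0

feats = rng.normal(size=(6, 4))        # X: one feature row per file (made-up values)
labels = np.array([0, 0, 1, 0, 0, 0])  # a single reverted file, mimicking heavy imbalance
weight = rng.normal(size=(4, 2))       # untrained layer weights, for illustration only

hidden = gcn_layer(adj, feats, weight)          # embeddings computed on the full graph
train_idx = downsample_majority(labels, rng)    # balanced subset used for the loss
```

In a real training loop the loss would be computed only on `hidden[train_idx]`, so the minority class is not swamped while every node still contributes neighborhood information.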
Abstract
Code revert prediction, a specialized form of software defect detection, aims to predict the likelihood that a code change will be reverted or rolled back during software development. The task matters in practice because identifying code changes that are prone to being reverted lets developers and project managers proactively prevent issues, improve code quality, and optimize development processes. However, compared to code defect detection, code revert prediction has rarely been studied. Additionally, many previous methods for code defect detection relied on independent features and ignored the relationships between code scripts. Moreover, an industry setting introduces new challenges through constraints such as company regulation, limited features, and a large-scale codebase. To overcome these limitations, this paper presents a systematic empirical study of code revert prediction that integrates the code import graph with code features. We implement different strategies to address anomalies and data imbalance, including graph neural networks combined with imbalanced classification and with anomaly detection. We conduct experiments on real-world, extremely imbalanced code commit data from J.P. Morgan Chase in order to compare these approaches comprehensively on the code revert prediction problem.
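As an illustration of what a code import graph could look like for a Python codebase, the sketch below extracts import statements with the standard-library `ast` module and links files that import one another. It is a sketch under stated assumptions rather than the construction used in the study: it assumes each file maps to one module name, ignores relative imports, and drops imports that point outside the repository. The helper names and the three toy files are hypothetical.

```python
import ast


def imported_modules(source: str) -> set[str]:
    """Collect the module names, as written, that a piece of Python source imports."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module)  # absolute "from X import Y" only
    return names


def build_import_graph(files: dict[str, str]) -> set[frozenset[str]]:
    """Return undirected edges between in-repository modules linked by an import."""
    edges = set()
    for module, source in files.items():
        for target in imported_modules(source):
            if target in files and target != module:  # skip external and self imports
                edges.add(frozenset((module, target)))
    return edges


# Hypothetical three-file repository, keyed by module name.
files = {
    "pricing": "import utils\nfrom models import Bond\n",
    "models": "import utils\nimport numpy\n",
    "utils": "import os\n",
}
edges = build_import_graph(files)
```

Storing each edge as a `frozenset` makes the graph undirected by construction: an import from `pricing` to `utils` and one in the opposite direction would collapse into the same edge.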
