Streamlining Software Reviews: Efficient Predictive Modeling with Minimal Examples
Tim Menzies, Andre Lustosa
TL;DR
This work defines software review as the evaluation of software behavior by subject matter experts (SMEs) under extreme labeling constraints, and proposes a predictive framework that guides SMEs and can substitute for the SME panel when needed. Through 31 SE case studies, the lite.py approach demonstrates that high-quality decisions can emerge from as few as 12–30 labels, outperforming prior greedy and clustering baselines like SWAY and SNEAK. Certainty-based labeling strategies within lite.py consistently yield strong results, revealing diminishing returns beyond roughly 30–50 labels and challenging the assumption that more data always improves outcomes. By providing open-source code and data, the study offers a practical path for researchers and practitioners to adopt label-efficient software review and to benchmark future methods in SE analytics.
Abstract
This paper proposes a new challenge problem for software analytics. In the process we call "software review", a panel of SMEs (subject matter experts) reviews examples of software behavior to recommend how to improve that software's operation. SME time is usually extremely limited so, ideally, this panel can complete this optimization task after looking at just a small number of very informative examples. To support this review process, we explore methods that train a predictive model to guess whether some oracle will like or dislike the next example. Such a predictive model can work with the SMEs to guide their exploration of all the examples. Also, after the panelists leave, that model can be used as an oracle in place of the panel (to handle new examples while the panelists are busy elsewhere). In 31 case studies (ranging from high-level decisions about software processes to low-level decisions about how to configure video encoding software), we show that such predictive models can be built using as few as 12 to 30 labels. To the best of our knowledge, this paper's success with only a handful of examples (and no large language model) is unprecedented. In accordance with the principles of open science, we offer all our code and data at https://github.com/timm/ez/tree/Stable-EMSE-paper so that others can repeat/refute/improve these results.
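To make the idea of a small labeling budget concrete, the following is a minimal sketch of one possible review loop. It is not the code from the repository above. The names `review`, `centroid`, `dist`, and `oracle`, the nearest-centroid model, the square-root split into "best" and "rest", and the default budget of 20 are all assumptions chosen for illustration. The `oracle` function stands in for the SME panel and returns a score where lower is better.

```python
import math
import random


def centroid(rows):
    """Column-wise mean of a list of numeric rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def dist(a, b):
    """Euclidean distance between two equal-length numeric rows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def review(pool, oracle, budget=20, start=4, seed=1):
    """Label at most `budget` rows of `pool`; return (best_row, labeled).

    Hypothetical helper: `oracle(row)` plays the role of the SME panel
    and returns a score where lower is better. Assumes start >= 2.
    """
    rng = random.Random(seed)
    todo = pool[:]  # copy, so the caller's pool is not modified
    rng.shuffle(todo)
    # Spend the first few labels on randomly chosen rows.
    labeled = [(oracle(row), row) for row in todo[:start]]
    todo = todo[start:]
    while todo and len(labeled) < budget:
        labeled.sort(key=lambda pair: pair[0])
        # Assumed split: the top sqrt(n) labeled rows are "best".
        cut = max(1, int(math.sqrt(len(labeled))))
        best = centroid([row for _, row in labeled[:cut]])
        rest = centroid([row for _, row in labeled[cut:]])
        # Certainty-style pick: the unlabeled row that looks most
        # like "best" and least like "rest".
        pick = min(todo, key=lambda row: dist(row, best) - dist(row, rest))
        todo.remove(pick)
        labeled.append((oracle(pick), pick))
    labeled.sort(key=lambda pair: pair[0])
    return labeled[0][1], labeled


# Usage with synthetic data: 500 random 2-D rows and a made-up oracle
# that prefers rows near the point (0.2, 0.8).
demo_rng = random.Random(0)
demo_pool = [[demo_rng.random(), demo_rng.random()] for _ in range(500)]
demo_best, demo_labeled = review(demo_pool, lambda r: dist(r, [0.2, 0.8]))
print(len(demo_labeled), "labels used; best row found:", demo_best)
```

In this sketch the model built from the labeled rows (the two centroids) serves both purposes described above: it chooses which example to show the oracle next, and once labeling stops it could be reused to rank new rows without further oracle calls.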
