Less Noise, More Signal: DRR for Better Optimizations of SE Tasks
Andre Lustosa, Tim Menzies
TL;DR
This work introduces the Dimensionality Reduction Ratio (DRR) as a predictor of when lightweight optimization suffices for SE analytics tasks. By estimating intrinsic dimensionality and challenging the Agrawal threshold, the authors show that many SE datasets have high DRR, so simple methods like LITE can match the performance of heavy optimizers such as DEHB at a fraction of the runtime. The study spans 24 SE datasets, contrasts them with non-SE baselines, and shows that SE problems can often be solved two orders of magnitude faster without loss of quality. The findings advocate dataset-aware optimization, caution against wholesale use of complex AI optimizers, and propose integrating DRR into practical SE analytics pipelines for adaptive method selection.
Abstract
SE analytics problems do not always need complex AI. Better and faster solutions can sometimes be obtained by matching the complexity of the solution to the complexity of the problem. This paper introduces the Dimensionality Reduction Ratio (DRR), a new metric for predicting when lightweight algorithms suffice. Analyzing SE optimization problems ranging from software configuration to process decisions and open-source project health, we show that DRR pinpoints "simple" tasks where costly methods like DEHB (a state-of-the-art evolutionary optimizer) are overkill. For high-DRR problems, simpler methods can be just as effective and run two orders of magnitude faster.
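The abstract does not spell out how DRR is computed, so the following is only a rough sketch under stated assumptions. It assumes DRR compares a dataset's raw column count R with an estimate I of its intrinsic dimensionality, as DRR = 1 - I/R, so that values near 1 indicate heavily redundant data. The intrinsic-dimension estimator used here (number of principal components needed to retain 95% of the variance) is a simple stand-in chosen for illustration, not necessarily the estimator used in the paper. The function names `intrinsic_dim_pca` and `drr` are hypothetical.

```python
import numpy as np


def intrinsic_dim_pca(X, variance_kept=0.95):
    """Stand-in estimator (assumption): count the principal components
    needed to retain `variance_kept` of the total variance."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # center each column
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, descending
    var = s ** 2                             # proportional to explained variance
    total = var.sum()
    if total == 0:
        return 0                             # constant data: no variance at all
    cum = np.cumsum(var) / total
    k = int(np.searchsorted(cum, variance_kept)) + 1
    return min(k, len(s))


def drr(X, variance_kept=0.95):
    """Assumed definition: DRR = 1 - I/R, with R raw columns and
    I the estimated intrinsic dimensionality."""
    X = np.asarray(X, dtype=float)
    R = X.shape[1]
    I = intrinsic_dim_pca(X, variance_kept)
    return 1.0 - I / R


# Toy data: 10 raw columns that are copies of only 2 underlying factors.
rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 2))
X_demo = np.hstack([np.repeat(Z[:, [0]], 5, axis=1),
                    np.repeat(Z[:, [1]], 5, axis=1)])
demo_dim = intrinsic_dim_pca(X_demo)
demo_drr = drr(X_demo)
```

Under these assumptions, a dataset whose columns mostly duplicate a few underlying factors scores a high DRR, which is the situation in which the abstract argues a lightweight optimizer is likely to suffice.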
