AutoOffAB: Toward Automated Offline A/B Testing for Data-Driven Requirement Engineering
Jie JW Wu
TL;DR
Online A/B testing is reliable but slow and risky for data-driven development. AutoOffAB proposes a fully automated offline A/B testing framework that periodically generates evaluation variants and tests them against updated historical logs. The approach uses a genetic algorithm, optionally with multi-objective optimization, to search for variants $v$ that maximize a fitness function $c(v, D_{e_k})$ evaluated on logged data $D_{e_k}$, so that the space of variants is explored systematically. By continuously refreshing offline evaluations, AutoOffAB aims to narrow the gap between offline and online results and strengthen data-driven requirement engineering in practice.
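The genetic-algorithm search can be illustrated with a minimal sketch. It assumes a variant is a dict of numeric parameters in [0, 1] and that $c(v, D_{e_k})$ is any Python callable scoring a variant on one round of logged data. The encoding, the operators (truncation selection with elitism, uniform crossover, Gaussian mutation), and the names `evolve` and `toy_c` are hypothetical illustrations, not AutoOffAB's specified design.

```python
import random
from typing import Callable, Dict, List, Sequence

# Hypothetical variant encoding: parameter name -> value in [0, 1].
Variant = Dict[str, float]
# Hypothetical fitness signature mirroring c(v, D_{e_k}).
Fitness = Callable[[Variant, Sequence[float]], float]


def evolve(c: Fitness, logs: Sequence[float], params: List[str],
           pop_size: int = 20, generations: int = 30,
           mutation_rate: float = 0.2, seed: int = 0) -> Variant:
    """Return the fittest variant found by a simple elitist genetic algorithm."""
    rng = random.Random(seed)

    def score(v: Variant) -> float:
        # Fitness of a variant on this round's logged data.
        return c(v, logs)

    population = [{p: rng.random() for p in params} for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the better half survives unchanged (elitism).
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]
        children: List[Variant] = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover: each parameter is copied from one parent.
            child = {p: a[p] if rng.random() < 0.5 else b[p] for p in params}
            # Gaussian mutation, clipped back into [0, 1].
            for p in params:
                if rng.random() < mutation_rate:
                    child[p] = min(1.0, max(0.0, child[p] + rng.gauss(0.0, 0.1)))
            children.append(child)
        population = parents + children
    return max(population, key=score)


# Toy (hypothetical) fitness: prefer a threshold close to the mean of the logs.
def toy_c(v: Variant, logs: Sequence[float]) -> float:
    target = sum(logs) / len(logs)
    return -(v["threshold"] - target) ** 2


best = evolve(toy_c, logs=[0.6, 0.7, 0.8], params=["threshold"])
print(best)
```

Because the better half is always kept, the best fitness never decreases across generations. For the optional multi-objective case, the scalar `score` would be replaced by a weighted sum or a Pareto-based ranking of several fitness values; that variant is not shown here.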
Abstract
Software companies have widely used online A/B testing to evaluate the impact of a new technology by offering it to groups of users and comparing it against the unmodified product. However, running an online A/B test requires not only effort in design and implementation, and stakeholders' approval before it can be served in production, but also several weeks of iterations to collect the data. To address these issues, an emerging topic called "Offline A/B Testing" is attracting increasing attention; it aims to evaluate new technologies offline by estimating their impact from historical logged data. Although this approach is promising because of its lower implementation effort, faster turnaround time, and absence of potential harm to users, several limitations must be addressed before it can be effectively prioritized as a requirement in practice, including its discrepancy with online A/B test results and the lack of systematic updates as data and parameters vary. In response, in this vision paper I introduce AutoOffAB, an idea to automatically run variants of offline A/B tests against recently logged data and update the offline evaluation results, so that decisions on requirements can be made more reliably and systematically.
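Estimating a variant's impact from historical logs is often done with a counterfactual estimator. The sketch below uses inverse propensity scoring (IPS), one standard choice, swapped in here for illustration; the abstract does not say which estimator AutoOffAB uses. It assumes each log record stores the probability (propensity) with which the production system chose the served action, and that this probability is non-zero for every action a variant may pick. The record fields, `ips_estimate`, and `refresh` are hypothetical names; `refresh` re-scores all variants on only the most recent `window` records, mirroring the idea of re-running offline tests against recent logging.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence


@dataclass
class LogRecord:
    context: str       # e.g. a user segment (hypothetical field)
    action: str        # what the production system actually served
    reward: float      # observed outcome, e.g. 1.0 for a click
    propensity: float  # probability the logging policy chose `action`


# A variant is modelled as a deterministic policy: context -> action.
Policy = Callable[[str], str]


def ips_estimate(policy: Policy, logs: Sequence[LogRecord]) -> float:
    """Inverse propensity scoring estimate of the variant's mean reward."""
    if not logs:
        raise ValueError("need at least one logged record")
    total = 0.0
    for r in logs:
        # Only records where the variant would have chosen the logged action
        # contribute, reweighted by 1 / propensity to correct selection bias.
        if policy(r.context) == r.action:
            total += r.reward / r.propensity
    return total / len(logs)


def refresh(variants: Dict[str, Policy], logs: Sequence[LogRecord],
            window: int) -> Dict[str, float]:
    """Re-score every variant on the most recent `window` records only."""
    if window <= 0:
        raise ValueError("window must be positive")
    recent = logs[-window:]
    return {name: ips_estimate(p, recent) for name, p in variants.items()}


logs = [
    LogRecord("new", "A", 1.0, 0.5),
    LogRecord("new", "B", 0.0, 0.5),
    LogRecord("old", "A", 0.0, 0.5),
    LogRecord("old", "B", 1.0, 0.5),
]
variants = {
    "control": lambda c: "A",
    "personalized": lambda c: "A" if c == "new" else "B",
}
print(refresh(variants, logs, window=4))  # {'control': 0.5, 'personalized': 1.0}
print(refresh(variants, logs, window=2))  # {'control': 0.0, 'personalized': 1.0}
```

Shrinking the window changes the control's estimate while the personalized variant stays ahead, which is the kind of shift a periodic refresh is meant to surface.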
