AutoOffAB: Toward Automated Offline A/B Testing for Data-Driven Requirement Engineering

Jie JW Wu

TL;DR

Online A/B testing is reliable but slow and risky for data-driven development. AutoOffAB proposes a fully automated offline A/B testing framework that periodically generates evaluation variants and tests them against updated historical logs. The approach uses a genetic algorithm to maximize a fitness function $c(v, D_{e_k})$, with optional multi-objective optimization, facilitating systematic variant exploration. By continuously refreshing offline evaluations, AutoOffAB aims to narrow the gap between offline and online results and strengthen data-driven requirement engineering in practice.

Abstract

Software companies widely use online A/B testing to evaluate the impact of a new technology by offering it to a group of users and comparing it against the unmodified product. However, an online A/B test requires not only effort in design, implementation, and stakeholder approval before it can be served in production, but also several weeks of data collection per iteration. To address these issues, a recently emerging topic called "Offline A/B Testing" is attracting increasing attention; it aims to evaluate new technologies offline by estimating their effect from historical logged data. This approach is promising because of its lower implementation effort, faster turnaround time, and absence of potential user harm. For it to be effectively prioritized as a requirement in practice, however, several limitations need to be addressed, including its discrepancy with online A/B test results and the lack of systematic updates as data and parameters vary. In response, in this vision paper, I introduce AutoOffAB, an idea to automatically run variants of offline A/B testing against recent logs and update the offline evaluation results, which are then used to make decisions on requirements more reliably and systematically.

Paper Structure (11 sections, 2 figures)

Figures (2)

  • Figure 1: Visual illustration of the proposed AutoOffAB in the context of the Data-Driven Requirements Engineering (DDRE) cycle [maalej2015toward]. Without AutoOffAB, offline A/B testing must be conducted manually by software engineers or ML scientists, which depends heavily on their individual skills. With AutoOffAB, offline A/B testing is triggered periodically, so engineers and scientists can focus on monitoring and reviewing the results used for decisions on requirements.
  • Figure 2: Visual illustration of AutoOffAB. Overall, AutoOffAB uses the program and log streaming to periodically generate a population of variants with modified settings, and then evaluates the variants against the updated chosen logs.
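
The loop described for Figure 2 (generate a population of variants, evaluate them against the current logs, keep the fittest) can be sketched as a small genetic algorithm. This is a minimal illustration under stated assumptions, not the paper's implementation: the tuple encoding of a variant, the `toy_fitness` function standing in for $c(v, D_{e_k})$, and all helper names (`mutate`, `crossover`, `evolve`) are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Variant = Tuple[float, ...]  # assumed encoding: one float per evaluation setting
Logs = Sequence[float]       # assumed stand-in for the logged data D_{e_k}


def mutate(v: Variant, rng: random.Random, scale: float = 0.1) -> Variant:
    """Perturb each setting with small Gaussian noise."""
    return tuple(x + rng.gauss(0.0, scale) for x in v)


def crossover(a: Variant, b: Variant, rng: random.Random) -> Variant:
    """Uniform crossover: take each setting from one of the two parents."""
    return tuple(x if rng.random() < 0.5 else y for x, y in zip(a, b))


def evolve(
    population: List[Variant],
    logs: Logs,
    fitness: Callable[[Variant, Logs], float],
    generations: int = 30,
    seed: int = 0,
) -> Variant:
    """Return the best variant found when maximizing fitness(v, logs)."""
    rng = random.Random(seed)
    size = len(population)
    for _ in range(generations):
        ranked = sorted(population, key=lambda v: fitness(v, logs), reverse=True)
        parents = ranked[: max(2, size // 2)]  # truncation selection
        children = [
            mutate(crossover(*rng.sample(parents, 2), rng), rng)
            for _ in range(size - len(parents))
        ]
        population = parents + children  # parents survive, so the best never regresses
    return max(population, key=lambda v: fitness(v, logs))


def toy_fitness(v: Variant, logs: Logs) -> float:
    """Placeholder for c(v, D_{e_k}): rewards settings close to the log mean."""
    mean = sum(logs) / len(logs)
    return -sum((x - mean) ** 2 for x in v)


if __name__ == "__main__":
    rng = random.Random(1)
    initial = [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(10)]
    refreshed_logs = [1.0, 2.0, 3.0]  # would be replaced on each periodic trigger
    print(evolve(initial, refreshed_logs, toy_fitness))
```

In a periodic setup, `evolve` would be called again each time the logs are refreshed, possibly seeded with the previous population. A multi-objective version would replace the scalar `fitness` with a Pareto-based ranking, which this sketch does not attempt.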