The Impact of Environment Configurations on the Stability of AI-Enabled Systems
Musfiqur Rahman, SayedHassan Khatoonabadi, Ahmad Abdellatif, Haya Samaana, Emad Shihab
TL;DR
This paper addresses stability challenges in AI-enabled software that arise from environment configurations, focusing on operating system, Python version, and CPU architecture. It adopts an empirical, Travis CI–based methodology across eight configurations and 30 open-source projects, evaluating model performance, processing time, and expense. The study finds instability across all three metrics, most often in processing time and expense. Linux generally offers faster and cheaper runs, MacOS may trade speed for marginal model-performance gains, and ARM64 often underperforms relative to AMD64. The work underscores the importance of dev/prod parity and of testing across configurations to identify the most stable deployment setup, offering practical guidance for reducing instability and informing future research into its causes.
Abstract
Software systems increasingly include Artificial Intelligence (AI) components. Changes in the operational environment are known to negatively impact the stability of AI-enabled software systems by causing unintended changes in behavior. However, how an environment configuration impacts the behavior of such systems has yet to be explored. Understanding and quantifying the degree of instability caused by different environment settings can help practitioners choose the environment configuration that yields the most stable AI systems. To this end, we performed experiments with eight different combinations of three key environment variables (operating system, Python version, and CPU architecture) on 30 open-source AI-enabled systems using the Travis CI platform. We determine the existence and the degree of instability introduced by each configuration using three metrics: the output of an AI component of the system (model performance), the time required to build and run the system (processing time), and the cost associated with building and running the system (expense). Our results indicate that changes in environment configurations lead to instability across all three metrics; however, instability is observed more frequently in processing time and expense than in model performance. For example, between Linux and MacOS, instability is observed in 23%, 96.67%, and 100% of the studied projects in model performance, processing time, and expense, respectively. Our findings underscore the importance of identifying the optimal combination of configuration settings to mitigate drops in model performance and reduce processing time and expense before deploying an AI-enabled system.
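To make the reported Linux-versus-MacOS percentages concrete, the short sketch below converts each one into an approximate number of projects. It assumes that every percentage is a share of the 30 studied projects; the variable and function names (`reported_pct`, `implied_count`) are illustrative and do not come from the paper, and the counts are rounded estimates rather than figures stated by the authors.

```python
TOTAL_PROJECTS = 30  # number of studied projects, as stated in the abstract

# Share of projects showing instability between Linux and MacOS (percent),
# copied from the abstract.
reported_pct = {
    "model performance": 23.0,
    "processing time": 96.67,
    "expense": 100.0,
}


def implied_count(pct: float, total: int = TOTAL_PROJECTS) -> int:
    """Return the nearest whole number of projects matching a percentage.

    Assumption: the percentage is taken over all `total` projects.
    """
    return round(pct / 100 * total)


# Approximate project counts implied by each reported percentage.
implied = {metric: implied_count(pct) for metric, pct in reported_pct.items()}

for metric, count in implied.items():
    share = count / TOTAL_PROJECTS
    print(f"{metric}: about {count} of {TOTAL_PROJECTS} projects ({share:.2%})")
```

Under that assumption, processing time is unstable in all but one project and expense in every project, while model performance is affected in roughly a quarter of them.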
