Mining for Cost Awareness in the Infrastructure as Code Artifacts of Cloud-based Applications: an Exploratory Study
Daniel Feitosa, Matei-Tudor Penca, Massimiliano Berardi, Rares-Dorian Boza, Vasilios Andrikopoulos
TL;DR
This study investigates whether cost awareness permeates the development of cloud-based software by mining Infrastructure as Code (IaC) artifacts, focusing on Terraform. Using Mining Software Repositories methods, the authors select 2,010 GitHub repositories containing Terraform code and, from these, analyze 538 commits and 208 issues with cost-related content. They supplement this analysis with topic modeling and a knowledge graph to organize the findings. They show that developers not only discuss deployment costs but also take concrete actions to reduce them, and they triangulate these insights with Stack Overflow discussions to check their practical relevance. The work provides a publicly available dataset and scripts, which serve as a foundation for future research and as practical guidance on cost-aware deployment decisions in service selection, resource allocation, and deployment optimization.
Abstract
Context: The popularity of cloud computing as the primary platform for developing, deploying, and delivering software is largely driven by the promise of cost savings. It is therefore surprising that no empirical evidence has been collected to determine whether cost awareness permeates the development process and how it manifests in practice. Objective: This study aims to provide empirical evidence of cost awareness by mining open source repositories of cloud-based applications. The focus is on Infrastructure as Code artifacts that automate software (re)deployment on the cloud. Methods: A systematic search through 152,735 repositories resulted in the selection of 2,010 relevant ones. We then analyzed 538 relevant commits and 208 relevant issues using a combination of inductive and deductive coding. Results: The findings indicate that developers are not only concerned with the cost of their application deployments but also take actions to reduce these costs beyond selecting cheaper cloud services. We also identify areas for future research. Conclusion: Although we focus on a particular Infrastructure as Code technology (Terraform), the findings may apply to cloud-based application development in general. The empirical grounding we provide can serve developers seeking to reduce costs through service selection, resource allocation, deployment optimization, and other techniques.
