Software Engineering and Foundation Models: Insights from Industry Blogs Using a Jury of Foundation Models
Hao Li, Cor-Paul Bezemer, Ahmed E. Hassan
TL;DR
This study analyzes industry blogs to map FM4SE (foundation models for software engineering) and SE4FM (software engineering for foundation models) activities, using a jury of foundation models (FMs) to label 1,152 posts from leading technology companies. It finds that code generation dominates FM4SE tasks, while SE4FM concentrates on deployment, system architecture, data management, and customization, with limited emphasis on requirements engineering and design. The authors propose eight future research directions and demonstrate a scalable approach to grey-literature surveys that uses a panel of foundation models. The findings highlight practical industry needs and help bridge academic insights and real-world practice in FM-enhanced software engineering.
Abstract
Foundation models (FMs) such as large language models (LLMs) have significantly impacted many fields, including software engineering (SE). The interaction between SE and FMs has led to the integration of FMs into SE practices (FM4SE) and the application of SE methodologies to FMs (SE4FM). While several literature surveys exist on academic contributions to these trends, we are the first to provide a practitioner's view. We analyze 155 FM4SE and 997 SE4FM blog posts from leading technology companies, leveraging an FM-powered surveying approach to systematically label and summarize the activities and tasks they discuss. We observe that while code generation is the most prominent FM4SE task, FMs are leveraged for many other SE activities such as code understanding, summarization, and API recommendation. The majority of blog posts on SE4FM are about model deployment & operation and system architecture & orchestration. Although the emphasis is on cloud deployments, there is growing interest in compressing FMs and deploying them on smaller devices such as edge or mobile devices. We outline eight future research directions inspired by these insights, aiming to bridge the gap between academic findings and real-world applications. Our study not only enriches the body of knowledge on practical applications of FM4SE and SE4FM but also demonstrates the utility of FMs as a powerful and efficient tool for conducting literature surveys in technical and grey literature domains. Our dataset, results, code, and prompts are available in our online replication package at https://github.com/SAILResearch/fmse-blogs.
