Leveraging Language Models to Discover Evidence-Based Actions for OSS Sustainability
Nafiz Imtiaz Khan, Vladimir Filkov
TL;DR
This work tackles the prediction–action gap in OSS sustainability by extracting evidence-based, actionable recommendations (ReACTs) from the software engineering literature using a Retrieval-Augmented Generation (RAG) pipeline. A two-layer prompting strategy (derivation, followed by refinement and reliability assessment) enables scalable extraction from 829 ICSE/FSE articles, yielding 1,922 ReACTs, of which 1,312 pass strict quality criteria. ReACTs are categorized into eight practice-oriented areas and demonstrated through ASFI case studies with APEX, showing how maintainers can identify turning points, map signals to actions, implement those actions, and monitor outcomes. The approach provides a reproducible, scalable bridge between empirical SE findings and practical guidance for OSS projects. The resulting catalog gives maintainers and researchers a resource for targeted interventions that improve long-term sustainability.
Abstract
When successful, Open Source Software (OSS) projects create enormous value, but most never reach a sustainable state. Recent work has produced accurate models that forecast OSS sustainability, yet these models rarely tell maintainers what to do: their features are often high-level socio-technical signals that are not directly actionable. Meanwhile, decades of empirical software engineering (SE) research have accumulated a large but underused body of evidence on concrete practices that improve project health. We close this gap by using large language models (LLMs) as evidence miners over the SE literature. We design a Retrieval-Augmented Generation (RAG) pipeline and a two-layer prompting strategy that extract researched actionables (ReACTs): concise, evidence-linked recommendations that map to specific OSS practices. In the first layer, we systematically explore open LLMs and prompting techniques, and select the best-performing combination to derive candidate ReACTs from 829 ICSE and FSE papers. In the second layer, we apply follow-up prompting to filter hallucinations, extract impact and evidence, and assess soundness and precision. Our pipeline yields 1,922 ReACTs, of which 1,312 pass strict quality criteria and are organized into practice-oriented categories that can be connected to project signals from tools like APEX. The result is a reproducible, scalable approach that turns scattered research findings into structured, evidence-based actions guiding OSS projects toward sustainability.
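The two-layer flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `ReACT` field names, the `run_pipeline` and `passes_quality` functions, and the stub `derive`/`refine` callables (which stand in for the actual LLM prompts) are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class ReACT:
    """One researched actionable. Field names are illustrative assumptions."""
    action: str
    source_paper: str
    impact: str = ""
    evidence: str = ""
    sound: bool = False
    precise: bool = False


def run_pipeline(
    papers: Iterable[str],
    derive: Callable[[str], List[str]],
    refine: Callable[[str, str], Optional[ReACT]],
) -> List[ReACT]:
    """Layer 1 derives candidate actions per paper; layer 2 refines each one.

    `derive` and `refine` stand in for LLM calls. `refine` returns None when
    it judges a candidate to be a hallucination, so that candidate is dropped.
    """
    results: List[ReACT] = []
    for paper in papers:
        for candidate in derive(paper):
            react = refine(paper, candidate)
            if react is not None:
                results.append(react)
    return results


def passes_quality(react: ReACT) -> bool:
    """Hypothetical strict filter: evidence, impact, soundness, precision."""
    return bool(react.evidence and react.impact and react.sound and react.precise)


# Stub "models" so the sketch runs without any LLM (purely illustrative).
def stub_derive(paper: str) -> List[str]:
    # Pretend the model proposes one action per semicolon-separated clause.
    return [part.strip() for part in paper.split(";") if part.strip()]


def stub_refine(paper: str, candidate: str) -> Optional[ReACT]:
    if "[ungrounded]" in candidate:
        return None  # treated as a hallucination and filtered out
    supported = "evidence" in candidate
    return ReACT(
        action=candidate,
        source_paper=paper[:20],
        impact="stated" if supported else "",
        evidence="quoted" if supported else "",
        sound=supported,
        precise=supported,
    )


papers = [
    "add a contributing guide (evidence: study A); reply faster; "
    "[ungrounded] rewrite everything"
]
all_reacts = run_pipeline(papers, stub_derive, stub_refine)
strict_reacts = [r for r in all_reacts if passes_quality(r)]
```

Passing the two layers in as callables keeps the model and prompt choice separate from the bookkeeping, which mirrors the abstract's description of first selecting a model and prompting technique and then reusing the same flow across all papers.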
