Predicting Open Source Software Sustainability with Deep Temporal Neural Hierarchical Architectures and Explainable AI

S M Rakib Ul Karim; Wenyi Lu; Enock Kasaadha; Sean Goggins

TL;DR

This work introduces a hierarchical temporal framework that predicts OSS sustainability lifecycle stages by jointly modeling 24-month activity sequences and engineered tabular features. Predictions are routed through a Stage-1 gate, a Heavy Transformer+MLP model and a Light MLP, and a Club-Fed Expert for minority classes. The approach achieves 94.08% overall accuracy with balanced performance, and attribution analyses identify sustained contribution and community dynamics as the primary signals, with a pronounced recency effect in temporal predictions. Explainability is built in via SHAP and Integrated Gradients, which enable category-level interpretation and ablation-based validation and support actionable insights for maintainers and funders. The results highlight the central role of continuous contribution and prompt maintenance, offer scalable ecosystem-monitoring capabilities, and outline future directions toward broader feature modalities and cross-domain generalization.
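The staged routing summarized above can be illustrated with a minimal sketch. It assumes each model emits class probabilities over the four lifecycle stages and that routing is driven by the Stage-1 gate's confidence. The threshold value, the fallback of averaging the Heavy and Light model outputs, the set of minority classes, and the function name `hierarchical_route` are all hypothetical; this is not the paper's exact routing rule.

```python
import numpy as np


def hierarchical_route(p_gate, p_heavy, p_light, p_expert,
                       minority_classes, gate_threshold=0.9):
    """Combine per-model class probabilities (each shaped [n, k]) into labels.

    Hypothetical routing rule, for illustration only:
      1. Accept the Stage-1 gate's prediction when its top probability
         is at least gate_threshold.
      2. Otherwise average the heavy and light models' probabilities.
      3. If that fallback lands on a minority class, defer to the expert.
    """
    p_gate, p_heavy, p_light, p_expert = map(
        np.asarray, (p_gate, p_heavy, p_light, p_expert))

    gate_conf = p_gate.max(axis=1)
    gate_pred = p_gate.argmax(axis=1)

    # Fallback path for low-confidence samples: average the two models.
    fallback_pred = ((p_heavy + p_light) / 2.0).argmax(axis=1)

    # Minority-class fallbacks are re-decided by the expert model.
    expert_pred = p_expert.argmax(axis=1)
    use_expert = np.isin(fallback_pred, list(minority_classes))
    fallback_pred = np.where(use_expert, expert_pred, fallback_pred)

    return np.where(gate_conf >= gate_threshold, gate_pred, fallback_pred)


if __name__ == "__main__":
    p_gate = [[0.95, 0.02, 0.02, 0.01],
              [0.40, 0.30, 0.20, 0.10],
              [0.50, 0.20, 0.20, 0.10]]
    p_heavy = [[0.25] * 4, [0.1, 0.7, 0.1, 0.1], [0.1, 0.1, 0.7, 0.1]]
    p_light = [[0.25] * 4, [0.2, 0.6, 0.1, 0.1], [0.2, 0.1, 0.6, 0.1]]
    p_expert = [[0.25] * 4, [0.25] * 4, [0.0, 0.0, 0.1, 0.9]]
    labels = hierarchical_route(p_gate, p_heavy, p_light, p_expert,
                                minority_classes={2, 3})
    print(labels.tolist())  # [0, 1, 3]
```

In this sketch only low-confidence samples reach the downstream models, so the expert's minority-class decisions are isolated from the bulk of confidently classified projects.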

Abstract

Open Source Software (OSS) projects follow diverse lifecycle trajectories shaped by evolving patterns of contribution, coordination, and community engagement. Understanding these trajectories is essential for stakeholders who need to assess project organization and health at scale. However, prior work has largely relied on static or aggregated metrics, such as project age or cumulative activity, which provide limited insight into how OSS sustainability unfolds over time. In this paper, we propose a hierarchical predictive framework that assigns OSS projects to distinct lifecycle stages grounded in established socio-technical categorizations of OSS development. Rather than treating sustainability solely as project longevity, these lifecycle stages operationalize sustainability as a multidimensional construct integrating contribution activity, community participation, and maintenance dynamics. The framework combines engineered tabular indicators with 24-month temporal activity sequences and employs a multi-stage classification pipeline to distinguish lifecycle stages associated with different coordination and participation regimes. To support transparency, we incorporate explainable AI techniques to examine the relative contribution of feature categories to model predictions. Evaluated on a large corpus of OSS repositories, the proposed approach achieves over 94% overall accuracy in lifecycle stage classification. Attribution analyses consistently identify contribution activity and community-related features as dominant signals, highlighting the central role of collective participation dynamics.
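To illustrate how per-feature attributions can be rolled up to the feature-category level, the sketch below applies Integrated Gradients to a toy logistic model whose gradient is analytic, then sums absolute attributions within each category. The model, its weights, the feature-to-category mapping, and the helper names (`integrated_gradients`, `category_attribution`) are hypothetical stand-ins; the paper applies its explainability techniques to its own neural models.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def integrated_gradients(x, baseline, w, b, steps=200):
    """Integrated Gradients for a toy model f(x) = sigmoid(w . x + b).

    Approximates the path integral with a midpoint Riemann sum over the
    straight line from baseline to x. The gradient is analytic here:
    df/dx = f(z) * (1 - f(z)) * w.
    """
    alphas = (np.arange(steps) + 0.5) / steps           # midpoints in (0, 1)
    path = baseline + alphas[:, None] * (x - baseline)   # [steps, n_features]
    f = sigmoid(path @ w + b)                            # [steps]
    grads = (f * (1.0 - f))[:, None] * w                 # [steps, n_features]
    return (x - baseline) * grads.mean(axis=0)


def category_attribution(attr, feature_categories):
    """Sum absolute per-feature attributions within each category."""
    totals = {}
    for value, cat in zip(attr, feature_categories):
        totals[cat] = totals.get(cat, 0.0) + abs(float(value))
    return totals


if __name__ == "__main__":
    w = np.array([1.0, -0.5, 0.2])
    x = np.array([2.0, 1.0, 0.5])
    attr = integrated_gradients(x, np.zeros(3), w, b=0.0)
    # Completeness check: attributions should sum to f(x) - f(baseline).
    print(attr.sum(), sigmoid(w @ x) - sigmoid(0.0))
    # Hypothetical category labels for the three toy features.
    print(category_attribution(attr, ["contribution", "community", "contribution"]))
```

The completeness property of Integrated Gradients (attributions summing to f(x) - f(baseline)) gives a quick sanity check on the number of integration steps.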

Paper Structure (71 sections, 28 equations, 5 figures, 3 tables)

Introduction
Related Work
Open Source Software Sustainability and Prediction
Advanced Computational Methods for Analyzing Temporal OSS Activity
Mining Software Repositories with Machine Learning
Deep Learning for Software Engineering
Temporal Neural Networks and Time Series Prediction
Hierarchical Classification and Class Imbalance Handling
Ensemble Methods and Multi-Stage Learning
Explainable AI for Software Engineering
Synthesis and Motivation
Methodology
Dataset Construction and Preprocessing
Data Collection and Label Definition
Temporal Feature Engineering
...and 56 more sections

Figures (5)

Figure 1: Overview of the hierarchical OSS lifecycle prediction pipeline. Temporal and tabular features are processed through staged classifiers with confidence-based routing to produce final lifecycle stage predictions.
Figure 2: Confusion matrix for Hierarchical Pipeline predictions across four sustainability stages.
Figure 3: Normalized category importance heatmap showing the relative influence of feature categories (rows) across different model architectures (columns), with values normalized to the [0, 1] range.
Figure 4: Average category contribution bar chart displaying the mean total importance score of each feature category, aggregated across all models and sorted by descending influence.
Figure 6: Detailed architectures of the four independently trained models. Panel A illustrates task-specific data preparation and splitting. Panel B shows the internal network architectures, including layer types and dimensions. Panel C summarizes the trained model artifacts and their respective outputs.
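The Figure 3 and Figure 4 captions imply two simple post-processing steps on a category-by-model importance matrix: scaling values to [0, 1] for the heatmap, and averaging across models to rank categories for the bar chart. A minimal sketch follows, assuming per-model (column-wise) min-max normalization, which the captions do not specify; the function names and example category labels are hypothetical.

```python
import numpy as np


def normalize_columns(importance):
    """Min-max normalize each column (model) of a [categories x models] matrix to [0, 1]."""
    m = np.asarray(importance, dtype=float)
    lo = m.min(axis=0, keepdims=True)
    span = m.max(axis=0, keepdims=True) - lo
    span[span == 0] = 1.0  # constant columns map to 0 instead of dividing by zero
    return (m - lo) / span


def rank_categories(importance, categories):
    """Average raw importance across models and sort categories by descending mean."""
    means = np.asarray(importance, dtype=float).mean(axis=1)
    order = np.argsort(-means)
    return [(categories[i], float(means[i])) for i in order]


if __name__ == "__main__":
    # Hypothetical importance scores: 3 categories (rows) x 2 models (columns).
    importance = [[0.5, 2.0],
                  [0.1, 1.0],
                  [0.3, 4.0]]
    print(normalize_columns(importance))
    print(rank_categories(importance, ["contribution", "maintenance", "community"]))
```

Column-wise scaling makes categories comparable within each model even when models produce attributions on very different scales; a global scaling would instead preserve differences in magnitude between models.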
