CodeR$^3$: A GenAI-Powered Workflow Repair and Revival Ecosystem

Asif Zaman, Kallol Naha, Khalid Belhajjame, Hasan M. Jamil

TL;DR

CodeR$^3$ addresses the decay and obsolescence of scientific workflows by using LLMs to analyze legacy workflows, generate Python representations, and revive them in modern systems (e.g., Snakemake, VisFlow), with human-in-the-loop and community validation. It introduces a two-stage revival pipeline and an engine-agnostic approach, balancing automation with expert oversight to preserve scientific intent. Empirical evaluation on ten Taverna-derived workflows shows high automation rates but identifies remaining challenges in service substitution accuracy and validation, motivating crowdsourced verification. The work contributes a practical framework for reproducible, reusable computational workflows across evolving infrastructures.

Abstract

Scientific workflows encode valuable domain expertise and computational methodologies. Yet studies consistently show that a significant proportion of published workflows suffer from decay over time. This problem is particularly acute for legacy workflow systems like Taverna, where discontinued services, obsolete dependencies, and system retirement render previously functional workflows unusable. We present a novel legacy workflow migration system, called CodeR$^3$ (short for Code Repair, Revival and Reuse), that leverages generative AI to analyze the characteristics of decayed workflows and reproduce them in modern workflow technologies like Snakemake and VisFlow. Our system additionally integrates stepwise workflow analysis visualization, automated service substitution, and human-in-the-loop validation. Through several case studies of Taverna workflow revival, we demonstrate the feasibility of this approach while identifying key challenges that require human oversight. Our findings reveal that automation significantly reduces manual effort in workflow parsing and service identification. However, critical tasks such as service substitution and data validation still require domain expertise. The intended end result is a crowdsourcing platform that enables the community to collaboratively revive decayed workflows and validate the functionality and correctness of the revived workflows. This work contributes a framework for workflow revival that balances automation efficiency with necessary human judgment.

Paper Structure

This paper contains 13 sections, 4 figures, 1 table.

Figures (4)

  • Figure 1: Example of a Taverna Workflow for Entrez GeneID to KEGG Pathway Mapping.
  • Figure 2: KEGG pathway hsa05134 retrieved by the CodeR$^3$-generated Snakemake workflow for gene ID 7124.
  • Figure 3: Overview of the CodeR$^3$ system and its core components.
  • Figure 4: The full user interface of the application, displaying Upload, Execution Results, Communication, Network, and Snakemake Workflow sections.