Engineering Systems for Data Analysis Using Interactive Structured Inductive Programming

Shraddha Surana, Ashwin Srinivasan, Michael Bain

TL;DR

This paper presents iProg, a tool implementing Interactive Structured Inductive Programming. It demonstrates that it is possible to identify appropriate system decompositions and construct end-to-end information systems with better performance, higher code quality, and order-of-magnitude faster development than Low Code/No Code alternatives.

Abstract

Engineering information systems for scientific data analysis presents significant challenges: complex workflows requiring exploration of large solution spaces, close collaboration with domain specialists, and the need for maintainable, interpretable implementations. Traditional manual development is time-consuming, while "No Code" approaches using large language models (LLMs) often produce unreliable systems. We present iProg, a tool implementing Interactive Structured Inductive Programming. iProg employs a variant of a '2-way Intelligibility' communication protocol to constrain collaborative system construction by a human and an LLM. Specifically, given a natural-language description of the overall data analysis task, iProg uses an LLM to first identify an appropriate decomposition of the problem into a declarative representation, expressed as a Data Flow Diagram (DFD). In a second phase, iProg then uses an LLM to generate code for each DFD process. In both stages, human feedback, mediated through the constructs provided by the communication protocol, is used to verify LLMs' outputs. We evaluate iProg extensively on two published scientific collaborations (astrophysics and biochemistry), demonstrating that it is possible to identify appropriate system decompositions and construct end-to-end information systems with better performance, higher code quality, and order-of-magnitude faster development compared to Low Code/No Code alternatives. The tool is available at: https://shraddhasurana.github.io/dhaani/
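The two-phase workflow described in the abstract can be pictured with a minimal Python sketch. This is not iProg's implementation or its communication protocol: the names `Process`, `DFD`, `ratify`, and `build_system` are hypothetical, the LLM and the human reviewer are replaced by plain callables, and the protocol's message tags are reduced to a simple accept/reject flag with free-text feedback.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Process:
    """One DFD process (hypothetical fields: spec, pre/post conditions, code)."""
    name: str
    spec: str
    pre: str
    post: str
    code: str = ""


@dataclass
class DFD:
    """A declarative decomposition: named processes plus directed data flows."""
    processes: Dict[str, Process] = field(default_factory=dict)
    flows: List[Tuple[str, str]] = field(default_factory=list)


def ratify(propose: Callable, review: Callable, max_rounds: int = 5):
    """Generic propose/review loop. Returns (artifact, number of interactions)."""
    feedback = ""
    for round_no in range(1, max_rounds + 1):
        candidate = propose(feedback)            # stands in for an LLM call
        accepted, feedback = review(candidate)   # stands in for human feedback
        if accepted:
            return candidate, round_no
    raise RuntimeError("no candidate accepted within max_rounds")


def build_system(task, propose_dfd, review_dfd, propose_code, review_code):
    # Phase 1: agree on a decomposition of the task into a DFD.
    dfd, dfd_rounds = ratify(lambda fb: propose_dfd(task, fb), review_dfd)
    # Phase 2: generate and check code for each DFD process in turn.
    for proc in dfd.processes.values():
        proc.code, _ = ratify(
            lambda fb, p=proc: propose_code(p, fb),
            lambda code, p=proc: review_code(p, code),
        )
    return dfd, dfd_rounds


# Stubs standing in for the LLM and the human (assumptions for illustration).
def stub_propose_dfd(task, feedback):
    dfd = DFD()
    dfd.processes["load"] = Process("load", "read raw data",
                                    pre="file exists", post="table in memory")
    if feedback:  # the reviewer asked for a missing step
        dfd.processes["fit"] = Process("fit", "fit model",
                                       pre="table in memory", post="model fitted")
        dfd.flows.append(("load", "fit"))
    return dfd


def stub_review_dfd(dfd):
    if "fit" not in dfd.processes:
        return False, "add a model-fitting process"
    return True, ""


def stub_propose_code(proc, feedback):
    return f"def {proc.name}():\n    pass  # {proc.spec}"


def stub_review_code(proc, code):
    return code.startswith(f"def {proc.name}"), ""


system, rounds = build_system("demo task", stub_propose_dfd, stub_review_dfd,
                              stub_propose_code, stub_review_code)
```

In this sketch the same `ratify` loop serves both phases, which mirrors the abstract's point that human feedback checks the LLM's output at the structure level and again at the code level. The round count returned for the first phase is only a rough analogue of the interaction count reported for DFD structure learning.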

Paper Structure

This paper contains 22 sections, 5 figures, and 3 algorithms.

Figures (5)

  • Figure 1: Deciding tags for messages during code identification.
  • Figure 2: DFD structure learning results. Manual: original development DFD (ground-truth). $V,E$: process vertices and edges; $I$: interactions for ratified DFD; Agreement: matching vertex/edge labels (engineer-judged).
  • Figure 3: DFD for the PHY system learned by $\mathtt{iProg}$ from the problem description. Manually drawn to show the pre- and postconditions of each process.
  • Figure 4: DFD for the BIO system learned by $\mathtt{iProg}$ from the problem description. The specification, preconditions, and postconditions are hidden in the DFD view and are visible within each process page.
  • Figure 5: Comparison of $\mathtt{iProg}$ (semi-automated structure learning + code generation) against LCNC alternatives.

Theorems & Definitions (2)

  • Example 1: Semi-Automated DFD Identification
  • Example 2: Process Element in a DFD