Transduction is All You Need for Structured Data Workflows

Alfio Gliozzo, Naweed Khan, Christodoulos Constantinides, Nandana Mihindukulasooriya, Nahuel Defosse, Gaetano Rossiello, Junkyu Lee

TL;DR

Agentics introduces a data-centric paradigm for LLM-based structured workflows built on Logical Transduction Algebra (LTA), where agents are stateless transducers operating over typed data via the transduction operator $\ll$. By embedding schemas (via Pydantic models) and asynchronous MapReduce-style primitives (amap, areduce), the framework enables robust, scalable, and auditable pipelines that outperform traditional prompt-centric approaches on data wrangling, text-to-SQL, and domain-specific reasoning tasks. The paper provides formal definitions, a Python implementation around an AG meta-class and PydanticTransducer, and extensive experiments (schema matching, T2SQL, data imputation, domain MCQA, and DiscoveryBench) illustrating improved accuracy, robustness to perturbations, and favorable runtime characteristics. Collectively, Agentics offers a principled, modular approach to generative structured data workflows with direct implications for enterprise data engineering, reproducibility, and scalable AI-assisted data tasks.

Abstract

This paper introduces Agentics, a functional agentic AI framework for building LLM-based structured data workflow pipelines. Designed for both research and practical applications, Agentics offers a new data-centric paradigm in which agents are embedded within data types, enabling logical transduction between structured states. This design shifts the focus toward principled data modeling, providing a declarative language in which data types are directly exposed to large language models and data values are composed through transductions between input and output types. We present a range of structured data workflow tasks, including data wrangling, text-to-SQL semantic parsing, domain-specific multiple-choice question answering, and data-driven scientific discovery, together with empirical evidence demonstrating the effectiveness of this approach.
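
The idea of stateless transduction between typed states, combined with asynchronous map and reduce steps, can be illustrated with a minimal sketch. This is not the Agentics API: the `Review` and `Sentiment` types, the `stub_transducer` and `summarize` functions, and the `amap` and `areduce` helpers defined here are all hypothetical stand-ins written for this example. Standard-library dataclasses are used in place of Pydantic models, and a deterministic keyword rule replaces the LLM call so the sketch runs offline.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Tuple, TypeVar

X = TypeVar("X")
Y = TypeVar("Y")


@dataclass(frozen=True)
class Review:
    """Hypothetical source type (a Pydantic model in the real framework)."""
    text: str


@dataclass(frozen=True)
class Sentiment:
    """Hypothetical target type."""
    label: str


async def stub_transducer(review: Review) -> Sentiment:
    """Stand-in for an LLM call: stateless, one typed state in, one out."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    text = review.text.lower()
    positive = any(word in text for word in ("good", "great", "love"))
    return Sentiment(label="positive" if positive else "negative")


async def amap(fn: Callable[[X], Awaitable[Y]], states: List[X]) -> List[Y]:
    """Apply fn to every state concurrently, preserving input order."""
    return list(await asyncio.gather(*(fn(s) for s in states)))


async def areduce(fn: Callable[[List[X]], Awaitable[Y]], states: List[X]) -> Y:
    """Collapse a list of states into a single state with one call."""
    return await fn(states)


async def summarize(sentiments: List[Sentiment]) -> Sentiment:
    """Majority vote over labels; ties count as positive."""
    positives = sum(s.label == "positive" for s in sentiments)
    overall = "positive" if positives * 2 >= len(sentiments) else "negative"
    return Sentiment(label=overall)


async def run_pipeline(texts: List[str]) -> Tuple[List[Sentiment], Sentiment]:
    reviews = [Review(t) for t in texts]
    labels = await amap(stub_transducer, reviews)   # Review -> Sentiment, per state
    overall = await areduce(summarize, labels)      # many Sentiments -> one
    return labels, overall


if __name__ == "__main__":
    labels, overall = asyncio.run(
        run_pipeline(["Great battery", "Screen cracked", "I love it"])
    )
    print([s.label for s in labels], overall.label)
```

Because each transducer call depends only on its input state, the map step can run all calls concurrently without coordination, which is the property the data-centric framing relies on.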

Paper Structure

This paper contains 78 sections, 6 theorems, 38 equations, 5 figures, 7 tables, 1 algorithm.

Key Result

Proposition 1

Let $AG[X]$ be an Agentic structure and let $\xi$ be the set of all instances of $AG[X]$. Define a binary operation $\circ$ on $\xi$ such that for any $\mathbf{x}_1, \mathbf{x}_2 \in \xi$, their composition $\mathbf{x} = \mathbf{x}_1 \circ \mathbf{x}_2$ is an Agentic instance whose state list is the
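
The statement above is cut off in this excerpt, so the exact definition of $\circ$ is not shown. As a sketch only, assume that the composed state list is the concatenation of the two operands' state lists, which would be consistent with the proposition's title, "Monoid of Agentic Instances". Under that assumption, the monoid laws (associativity and an identity element) can be checked on a toy class. The `AGInstance` class and its `compose` method are hypothetical names for this illustration, not the paper's implementation.

```python
from dataclasses import dataclass, field
from typing import Generic, List, TypeVar

X = TypeVar("X")


@dataclass
class AGInstance(Generic[X]):
    """Toy stand-in for an Agentic instance: just a list of states."""
    states: List[X] = field(default_factory=list)

    def compose(self, other: "AGInstance[X]") -> "AGInstance[X]":
        # ASSUMPTION: composition concatenates state lists.
        return AGInstance(self.states + other.states)


empty: AGInstance = AGInstance()          # candidate identity element
a, b, c = AGInstance([1]), AGInstance([2, 3]), AGInstance([4])

# Associativity: (a ∘ b) ∘ c == a ∘ (b ∘ c)
left = a.compose(b).compose(c)
right = a.compose(b.compose(c))
assert left == right

# Identity: empty ∘ a == a == a ∘ empty
assert empty.compose(a) == a
assert a.compose(empty) == a
```

List concatenation is associative and the empty list is its identity, so any structure whose composition reduces to concatenation of state lists inherits both laws.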

Figures (5)

  • Figure 1: Logical Transduction Applied to Sentiment Summary
  • Figure 2: Average time (sec) per question
  • Figure 3: Domain-Specific MCQA Perturbation Results
  • Figure 4: Improvement of test score over iterations: The x-axis represents the number of iterations, and the y-axis shows the test score evaluated using the best prompt template found up to that iteration.
  • Figure 5: Average running time per iteration: The x-axis represents the batch size of the asynchronous execution, and the y-axis shows the average running time in seconds.

Theorems & Definitions (27)

  • Definition 1: Types
  • Definition 2
  • Definition 3: Agentic Structure $AG$
  • Proposition 1: Monoid of Agentic Instances
  • Proof
  • Definition 4: Product of Agentic Structures
  • Proposition 2: Monoid of Agentic Product
  • Proof
  • Definition 5: Equivalence Relation on Agentic Instances
  • Definition 6: Statewise Equivalence of Agentic Instances
  • ...and 17 more