Transduction is All You Need for Structured Data Workflows
Alfio Gliozzo, Naweed Khan, Christodoulos Constantinides, Nandana Mihindukulasooriya, Nahuel Defosse, Gaetano Rossiello, Junkyu Lee
TL;DR
Agentics introduces a data-centric paradigm for LLM-based structured workflows built on Logical Transduction Algebra (LTA), where agents are stateless transducers operating over typed data via the transduction operator $\ll$. By embedding schemas (via Pydantic models) and asynchronous MapReduce-style primitives (amap, areduce), the framework enables robust, scalable, and auditable pipelines that outperform traditional prompt-centric approaches on data wrangling, text-to-SQL, and domain-specific reasoning tasks. The paper provides formal definitions, a Python implementation around an AG meta-class and PydanticTransducer, and extensive experiments (schema matching, T2SQL, data imputation, domain MCQA, and DiscoveryBench) illustrating improved accuracy, robustness to perturbations, and favorable runtime characteristics. Collectively, Agentics offers a principled, modular approach to generative structured data workflows with direct implications for enterprise data engineering, reproducibility, and scalable AI-assisted data tasks.
Abstract
This paper introduces Agentics, a functional agentic AI framework for building LLM-based structured data workflow pipelines. Designed for both research and practical applications, Agentics offers a new data-centric paradigm in which agents are embedded within data types, enabling logical transduction between structured states. This design shifts the focus toward principled data modeling, providing a declarative language where data types are directly exposed to large language models and the data values are composed through transductions between input and output types. We present empirical evidence for the effectiveness of this approach across a range of structured data workflow tasks, including data wrangling, text-to-SQL semantic parsing, domain-specific multiple-choice question answering, and data-driven scientific discovery.
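To make the idea of transduction between typed states concrete, the following is a minimal sketch and not the Agentics API. It assumes standard-library dataclasses in place of the Pydantic models the framework uses, and a deterministic keyword-counting stub in place of an LLM call. The names amap and areduce are borrowed from the summary above, but their signatures here are assumptions; Review, Sentiment, Summary, stub_transducer, and summarize are hypothetical names introduced only for illustration.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, TypeVar

S = TypeVar("S")  # source state type
T = TypeVar("T")  # target state type


@dataclass(frozen=True)
class Review:
    """Source type (hypothetical): a raw customer review."""
    text: str


@dataclass(frozen=True)
class Sentiment:
    """Target type (hypothetical): a structured judgment about one review."""
    label: str
    score: float


@dataclass(frozen=True)
class Summary:
    """Aggregate type (hypothetical): one state summarizing many."""
    n_reviews: int
    mean_score: float


async def stub_transducer(src: Review) -> Sentiment:
    """Stateless Review -> Sentiment transduction.

    Assumption: a real system would prompt an LLM with both schemas;
    this stub counts keywords so the example is deterministic.
    """
    await asyncio.sleep(0)  # yield control, as a network call would
    text = src.text.lower()
    positive = sum(word in text for word in ("good", "great"))
    negative = sum(word in text for word in ("bad", "poor"))
    score = float(positive - negative)
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return Sentiment(label=label, score=score)


async def summarize(states: list[Sentiment]) -> Summary:
    """Many-to-one transduction used in the reduce step."""
    await asyncio.sleep(0)
    mean = sum(s.score for s in states) / len(states) if states else 0.0
    return Summary(n_reviews=len(states), mean_score=mean)


async def amap(fn: Callable[[S], Awaitable[T]], states: list[S]) -> list[T]:
    """Apply a stateless transducer to every state concurrently."""
    return list(await asyncio.gather(*(fn(s) for s in states)))


async def areduce(fn: Callable[[list[S]], Awaitable[T]], states: list[S]) -> T:
    """Collapse a list of states into a single aggregate state."""
    return await fn(states)


async def main() -> tuple[list[Sentiment], Summary]:
    reviews = [
        Review("Great product, good value"),
        Review("Poor packaging"),
        Review("Arrived on Tuesday"),
    ]
    sentiments = await amap(stub_transducer, reviews)
    summary = await areduce(summarize, sentiments)
    return sentiments, summary


sentiments, summary = asyncio.run(main())
print(sentiments)
print(summary)
```

Because each transducer is stateless and its input and output types are declared up front, the map step can run concurrently without coordination, and every intermediate state can be inspected or validated against its type, which is the property the data-centric framing relies on.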
