PalimpChat: Declarative and Interactive AI analytics

Chunwei Liu, Gerardo Vitagliano, Brandon Rose, Matt Prinz, David Andrew Samson, Michael Cafarella

TL;DR

PalimpChat lowers the barrier to building declarative AI pipelines over unstructured data by providing a natural-language interface backed by Palimpzest's optimizer and the Archytas reasoning agent. The approach combines a declarative pipeline model with a ReAct-style agent that decomposes user requests into tool invocations and generates executable pipelines. Key contributions include the integration of Palimpzest with Archytas inside a Beaker notebook-chat environment, the demonstration of end-to-end pipelines (e.g., biomedical literature mining) with trade-offs among cost, quality, and runtime, and the ability to inspect execution metrics and export code. The work illustrates how a chat interface can make declarative AI frameworks accessible for real-world data workflows.

Abstract

Thanks to the advances in generative architectures and large language models, data scientists can now code pipelines of machine-learning operations to process large collections of unstructured data. Recent progress has seen the rise of declarative AI frameworks (e.g., Palimpzest, Lotus, and DocETL) to build optimized and increasingly complex pipelines, but these systems often remain accessible only to expert programmers. In this demonstration, we present PalimpChat, a chat-based interface to Palimpzest that bridges this gap by letting users create and run sophisticated AI pipelines through natural language alone. By integrating Archytas, a ReAct-based reasoning agent, and Palimpzest's suite of relational and LLM-based operators, PalimpChat provides a practical illustration of how a chat interface can make declarative AI frameworks truly accessible to non-experts. Our demo system is publicly available online. At SIGMOD'25, participants can explore three real-world scenarios--scientific discovery, legal discovery, and real estate search--or apply PalimpChat to their own datasets. In this paper, we focus on how PalimpChat, supported by the Palimpzest optimizer, simplifies complex AI workflows such as extracting and analyzing biomedical data.
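To make the ReAct-style decomposition mentioned above more concrete, the following is a minimal sketch of an agent loop that turns one request into a sequence of tool invocations. It is an illustration under stated assumptions, not the Archytas or Palimpzest API: the tool names (`set_input_dataset`, `generate_schema`, `run_pipeline`), the `scripted_policy` function standing in for the LLM, and the example path `papers/` are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# One step of the trace: (tool name, argument, observation).
Step = Tuple[str, str, str]


# Hypothetical tools. Names and behavior are illustrative only and do not
# correspond to real Archytas or Palimpzest functions.
def set_input_dataset(path: str) -> str:
    return f"dataset registered: {path}"


def generate_schema(fields: str) -> str:
    return f"schema created with fields: {fields}"


def run_pipeline(_: str) -> str:
    return "pipeline executed"


TOOLS: Dict[str, Callable[[str], str]] = {
    "set_input_dataset": set_input_dataset,
    "generate_schema": generate_schema,
    "run_pipeline": run_pipeline,
}


def scripted_policy(request: str, history: List[Step]) -> Tuple[str, str]:
    """Stand-in for the LLM: pick the next (tool, argument) from a fixed plan.

    A real agent would reason over the request and the observations so far;
    this assumed policy simply replays a three-step plan, then finishes.
    """
    plan = [
        ("set_input_dataset", "papers/"),
        ("generate_schema", "title, dataset_name"),
        ("run_pipeline", ""),
    ]
    if len(history) < len(plan):
        return plan[len(history)]
    return ("finish", "")


def react_loop(
    request: str,
    policy: Callable[[str, List[Step]], Tuple[str, str]],
    max_steps: int = 10,
) -> List[Step]:
    """Alternate between choosing an action and observing its result."""
    history: List[Step] = []
    for _ in range(max_steps):
        tool, arg = policy(request, history)
        if tool == "finish":
            break
        observation = TOOLS[tool](arg)  # act, then record the observation
        history.append((tool, arg, observation))
    return history


trace = react_loop("Extract dataset names from the papers", scripted_policy)
for tool, arg, observation in trace:
    print(f"{tool}({arg!r}) -> {observation}")
```

The point of the sketch is the control flow: a single user request becomes several tool calls, each of which produces an observation that is available to the next decision.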

Paper Structure

This paper contains 7 sections and 6 figures.

Figures (6)

  • Figure 1: An overview of data processing using PalimpChat and Palimpzest
  • Figure 2: An example Archytas tool used to generate an extraction schema with Palimpzest. Docstrings are important to provide context for the reasoning agent. Input variables are injected with the template syntax {{variable}}.
  • Figure 3: Setting an input dataset through PalimpChat
  • Figure 4: Building a pipeline through natural language. As seen in the last call, the agent reasons and may decide to decompose a user question into several tasks required before execution.
  • Figure 5: The output of the execution of the scientific discovery use case. Users can view both the output records and summary information about the plan execution, such as the operators chosen and the total pipeline cost and runtime.
  • ...and 1 more figure
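
The Figure 2 caption describes two ideas: a tool's docstring gives the reasoning agent context, and input variables are injected into a code template with the `{{variable}}` syntax. The sketch below illustrates both with the Python standard library only. It is not the Archytas implementation; `render_template`, `generate_extraction_schema`, and the `make_schema` call that appears in the generated text are hypothetical names chosen for this example.

```python
import inspect
import re


def render_template(template: str, variables: dict) -> str:
    """Replace each {{name}} placeholder with the matching variable value."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"no value supplied for template variable {name!r}")
        return str(variables[name])

    # Allow optional whitespace inside the braces, e.g. {{ name }}.
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


def generate_extraction_schema(schema_name: str, field_names: list) -> str:
    """Generate code that defines an extraction schema.

    Args:
        schema_name: name of the schema to create.
        field_names: names of the fields to extract from each record.

    The docstring is what an agent would read to decide when and how to
    call this tool (an assumption of this sketch).
    """
    # make_schema is a hypothetical function in the generated code.
    template = 'schema = make_schema("{{ schema_name }}", fields={{ field_names }})'
    return render_template(
        template, {"schema_name": schema_name, "field_names": field_names}
    )


# The description an agent could be shown for this tool.
tool_description = inspect.getdoc(generate_extraction_schema)

rendered = generate_extraction_schema("Paper", ["title", "year"])
print(rendered)
```

Raising an error for a missing variable, rather than leaving the placeholder in place, is a design choice of this sketch so that incomplete tool calls fail before any generated code runs.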