Safe, Untrusted, "Proof-Carrying" AI Agents: toward the agentic lakehouse

Jacopo Tagliabue; Ciro Greco

TL;DR

This work addresses how to enable safe, trustworthy agent-driven automation in data lakehouses by proposing API-first, programmable abstractions that expose the entire data lifecycle. It argues that declarative DAGs, Git-for-Data-style branching, and code-as-interface support reproducibility, observability, and safety, even when the agents themselves are untrusted. A proof-of-concept demonstrates self-repair of production pipelines using Bauplan, MCP, and a verifier, showing that untrusted AI agents can operate without compromising production. The study outlines a path toward a fully agentic lakehouse and identifies open challenges such as scalability and parallelism in OLAP contexts.

Abstract

Data lakehouses run sensitive workloads, where AI-driven automation raises concerns about trust, correctness, and governance. We argue that API-first, programmable lakehouses provide the right abstractions for safe-by-design, agentic workflows. Using Bauplan as a case study, we show how data branching and declarative environments extend naturally to agents, enabling reproducibility and observability while reducing the attack surface. We present a proof-of-concept in which agents repair data pipelines using correctness checks inspired by proof-carrying code. Our prototype demonstrates that untrusted AI agents can operate safely on production data and outlines a path toward a fully agentic lakehouse.
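
As a rough illustration of the idea in the abstract, the sketch below models an untrusted agent that may only write to an isolated data branch, with a merge into production that is gated by a correctness check. It is a toy, in-memory model and not Bauplan's API: `ToyLakehouse`, `create_branch`, `merge`, `no_null_amounts`, and `untrusted_agent_repair` are all hypothetical names introduced here, and the "verifier" is a simple data-quality predicate standing in for the proof-carrying-style checks the paper describes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Table = List[dict]                 # a table is modeled as a list of rows
Tables = Dict[str, Table]          # table name -> rows


@dataclass
class ToyLakehouse:
    """Hypothetical in-memory stand-in for a lakehouse with data branches."""
    branches: Dict[str, Tables] = field(default_factory=lambda: {"main": {}})

    def create_branch(self, name: str, source: str = "main") -> None:
        # The new branch starts as an independent snapshot of the source,
        # so writes on it cannot touch the source branch.
        self.branches[name] = {
            table: [dict(row) for row in rows]
            for table, rows in self.branches[source].items()
        }

    def merge(self, name: str, verifier: Callable[[Tables], bool],
              target: str = "main") -> bool:
        # The merge is gated: the branch must pass the verifier first.
        if not verifier(self.branches[name]):
            return False
        self.branches[target] = self.branches.pop(name)
        return True


def no_null_amounts(tables: Tables) -> bool:
    # Assumed correctness check: every order row has a non-null amount.
    return all(row.get("amount") is not None
               for row in tables.get("orders", []))


def untrusted_agent_repair(lake: ToyLakehouse, branch: str) -> None:
    # Stand-in for an AI agent's fix; it only ever writes to its own branch.
    rows = lake.branches[branch]["orders"]
    lake.branches[branch]["orders"] = [
        row for row in rows if row.get("amount") is not None
    ]


lake = ToyLakehouse()
lake.branches["main"]["orders"] = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": None},     # the defect the agent must repair
]

lake.create_branch("agent-fix")
rejected = lake.merge("agent-fix", no_null_amounts)   # unrepaired: refused
main_untouched = len(lake.branches["main"]["orders"]) == 2

untrusted_agent_repair(lake, "agent-fix")
accepted = lake.merge("agent-fix", no_null_amounts)   # repaired: allowed
```

The design point is that trust is placed in the verifier and the merge gate rather than in the agent: a failed check leaves `main` exactly as it was, and the agent's work only becomes visible in production once the check passes.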
