Trustworthy AI in the Agentic Lakehouse: from Concurrency to Governance
Jacopo Tagliabue, Federico Bianchi, Ciro Greco
TL;DR
The paper argues that making AI agents trustworthy on production lakehouses is first an infrastructure problem: data and compute must be isolated, and a direct transplant of database-style MVCC fails in a decoupled, multi-language setting. It introduces Bauplan, an agent-first lakehouse design that combines copy-on-write branching, FaaS compute, and a unified run API to enable transactional pipelines across multi-language workloads. The authors argue that this design yields principled governance through API-level access control and declarative I/O, and they illustrate it with a self-healing pipeline example. The work offers a practical reference implementation as a starting point for trustworthy agent workflows on data platforms.
Abstract
Even as AI capabilities improve, most enterprises do not consider agents trustworthy enough to work on production data. In this paper, we argue that the path to trustworthy agentic workflows begins with solving the infrastructure problem: traditional lakehouses are not suited for agent access patterns, but if we design one around transactions, governance follows. In particular, we draw an operational analogy to MVCC in databases and show why a direct transplant fails in a decoupled, multi-language setting. We then propose an agent-first design, Bauplan, that reimplements data and compute isolation in the lakehouse. We conclude by sharing a reference implementation of a self-healing pipeline in Bauplan, which couples agent reasoning with the desired guarantees for correctness and trust.
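To make the idea of a transactional pipeline on a copy-on-write branch concrete, the following is a minimal in-memory sketch. It is not Bauplan's API: the `Catalog` class, `run_transactionally`, and the branch name `agent_run` are hypothetical names introduced only for illustration, and the toy merge assumes no concurrent writes to the target branch.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Tuple

Snapshot = Tuple[dict, ...]  # an immutable snapshot of a table's rows


@dataclass
class Catalog:
    """Toy catalog (hypothetical): each branch maps table names to snapshots."""
    branches: Dict[str, Dict[str, Snapshot]] = field(
        default_factory=lambda: {"main": {}}
    )

    def create_branch(self, name: str, from_ref: str = "main") -> None:
        # Copy-on-write: copy only the name -> snapshot pointers, not the rows.
        self.branches[name] = dict(self.branches[from_ref])

    def delete_branch(self, name: str) -> None:
        del self.branches[name]

    def write(self, branch: str, table: str, rows: Iterable[dict]) -> None:
        # A write creates a new snapshot visible only on this branch.
        self.branches[branch][table] = tuple(rows)

    def read(self, branch: str, table: str) -> Snapshot:
        return self.branches[branch][table]

    def merge(self, source: str, into: str = "main") -> None:
        # Publish every table pointer at once (assumes no concurrent writers).
        self.branches[into] = dict(self.branches[source])
        self.delete_branch(source)


def run_transactionally(
    catalog: Catalog,
    pipeline: Callable[[Catalog, str], None],
    branch: str = "agent_run",
) -> bool:
    """Run a multi-table pipeline on an isolated branch; merge only on success."""
    catalog.create_branch(branch)
    try:
        pipeline(catalog, branch)
    except Exception:
        # Any failure discards the branch, so "main" never sees partial writes.
        catalog.delete_branch(branch)
        return False
    catalog.merge(branch)
    return True
```

In this sketch an agent-authored pipeline that fails midway leaves `main` untouched, because all of its writes land on a throwaway branch that is either merged as a whole or deleted.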
