Querying Everything Everywhere All at Once: Supervaluationism for the Agentic Lakehouse

Jacopo Tagliabue

Abstract

Agentic analytics is turning the lakehouse into a multi-version system: swarms of (human or AI) producers materialize competing pipelines in data branches, while (human or AI) consumers need answers without knowing the underlying data life-cycle. We demonstrate a new system that answers questions across branches rather than at a single snapshot. Our prototype focuses on a novel query path that evaluates queries under supervaluationary semantics. In the absence of comparable multi-branch querying capabilities in mainstream OLAP systems, we open source the demo code as a concrete baseline for the OLAP community.

Paper Structure (19 sections, 9 equations, 3 figures, 2 tables)

Figures (3)

  • Figure 1: Architecture of the demo system. Agentic pipelines write predictions into isolated data branches of the lakehouse. The UI translates English questions into SQL, then executes them using a modified OLAP engine that evaluates answers across branches under supervaluationary (glut-tolerant) semantics.
  • Figure 2: Speedup of the native engine over the ad hoc strategy as a function of branch count. The JOIN query with window functions (Q3) benefits from shared-table reuse; the boolean query (Q4) achieves near-constant time via short-circuit evaluation.
  • Figure 3: Median latency (ms) vs. number of branches. Left: Q1, both engines scale linearly but the native engine's unified plan carries coordination overhead (hash-repartitioning for GROUP BY). Center: Q3, the ad hoc engine scales at ${\sim}160$ ms/branch (recomputing window functions over the shared table), while the native engine stays nearly flat thanks to CTE reuse. Right: Q4, the native engine's short-circuit optimization achieves near-constant ${\sim}11$ ms latency.
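The captions above attribute Q4's near-constant latency to short-circuit evaluation of a boolean query across branches. The following is a minimal sketch of that idea, not the demo system's implementation. It assumes the textbook supervaluationist reading (a statement is super-true if it holds on every branch and super-false if it fails on every branch) and adds a hypothetical third verdict, `MIXED`, for disagreeing branches. The names `Verdict`, `supervaluate`, and `holds`, the branch names, and the toy scores are all illustrative assumptions.

```python
from enum import Enum
from typing import Callable, Iterable, Tuple


class Verdict(Enum):
    SUPER_TRUE = "true on every branch"
    SUPER_FALSE = "false on every branch"
    MIXED = "branches disagree"  # hypothetical label, not taken from the paper


def supervaluate(
    branches: Iterable[str], holds: Callable[[str], bool]
) -> Tuple[Verdict, int]:
    """Evaluate a boolean predicate on each branch.

    Returns the verdict and the number of branches actually evaluated.
    Stops early once one true and one false answer have both been seen,
    because no later branch can change a MIXED verdict.
    """
    seen_true = False
    seen_false = False
    evaluated = 0
    for branch in branches:
        evaluated += 1
        if holds(branch):
            seen_true = True
        else:
            seen_false = True
        if seen_true and seen_false:
            return Verdict.MIXED, evaluated  # short-circuit
    if evaluated == 0:
        raise ValueError("at least one branch is required")
    return (Verdict.SUPER_TRUE if seen_true else Verdict.SUPER_FALSE), evaluated


# Toy stand-in for per-branch query results (assumed data, not from the paper).
max_score = {"main": 0.91, "agent_a": 0.88, "agent_b": 0.42, "agent_c": 0.95}
verdict, evaluated = supervaluate(max_score, lambda b: max_score[b] > 0.5)
```

With this toy data the branches disagree at `agent_b`, so the loop stops after three of the four branches and returns `Verdict.MIXED`. When all branches agree, every branch must still be visited, so the early exit only helps on disagreement; how the actual engine orders or prunes branch evaluation is not specified in the captions.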