ORCA: ORchestrating Causal Agent

Joanie Hayoun Chung, Sumin Lee, Sungbin Lim

Abstract

Causal analysis on relational databases is challenging because analysis datasets must be repeatedly queried from complex schemas. Recent LLM systems can automate individual steps, but they rarely manage dependencies across analysis stages, which makes it difficult to preserve consistency between causal hypotheses. We propose ORCA (ORchestrating Causal Agent), an interactive multi-agent framework that enables coherent causal analysis on relational databases by maintaining shared state and introducing human checkpoints. In a controlled user study, participants using ORCA successfully completed the end-to-end analysis 42 percentage points more often than with a baseline LLM assistant (GPT-4o-mini), achieved substantially lower average treatment effect (ATE) error, and reduced time spent on repetitive data exploration and query refinement by 76% on average. These results show that ORCA improves both how users interact with the causal analysis pipeline and the reliability of the resulting causal conclusions.

Paper Structure

This paper contains 81 sections, 4 equations, 24 figures, 11 tables.

Figures (24)

  • Figure 1: (a) The step-wise causal analysis procedure, highlighting the analytical questions that arise at each stage. (b) ORCA aligns with this procedure through an orchestrated multi-agent framework. Three specialized agents share state that holds the artifacts produced by each module, while a central Orchestrator runs these agents sequentially during the analysis. User interaction is mediated by the Orchestrator through checkpoints where intermediate artifacts are exposed for inspection and feedback.
  • Figure 2: ORCA’s Data Exploration Agent and its interaction structure. The agent prepares an analysis-ready dataset through four modules. Each module produces an intermediate artifact saved in state (middle row) exposing table semantics, relational scope, query structure, and outcomes. At each checkpoint (bottom row), users review and validate these outputs before proceeding.
  • Figure 3: ORCA’s diagnostics-driven Causal Discovery Agent with user checkpoints. Starting from a preprocessed dataset, the agent evaluates causal assumptions, configures compatible discovery strategies, and generates candidate causal graphs. Intermediate artifacts in state (middle row) expose diagnostics, configurations, and candidate graphs. Checkpoints (bottom row) prompt users to validate assumptions, approve configurations, and review graph structures before inference.
  • Figure 4: ORCA’s Causal Inference Agent. Given a causal graph and an analysis-ready dataset, the agent selects an inference strategy, executes effect estimation, and interprets the results. Intermediate artifacts in state (middle row) expose variable roles, inference strategy, and estimation outputs. At each checkpoint (bottom row), users validate decisions and assess the results.
  • Figure 5: (a) Boxplot of time spent on the data exploration step. ORCA significantly reduces time spent on early-stage data exploration. (b) Comparison of structural error in the inferred causal graph. ORCA consistently produces more accurate graphs than the baseline. (c) Distribution of the absolute error in participants' reported Average Treatment Effect (ATE). Outliers are omitted for readability. (d) Total task duration and proportion of total task time spent at each step. ORCA shifts user effort from repetitive, labor-intensive data exploration toward human-centered causal reasoning.
  • ...and 19 more figures
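
The orchestration pattern described in the captions for Figures 1 to 4 (agents run in sequence, artifacts written to shared state, a human checkpoint after each stage) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not ORCA's implementation: the class `Orchestrator`, the three stub agents, the state keys, and the toy data are all hypothetical, and the difference-in-means estimate is only a placeholder for a real inference strategy.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical types for illustration; none of these names come from the paper.
State = Dict[str, object]
Agent = Callable[[State], State]            # reads shared state, returns new artifacts
Checkpoint = Callable[[str, State], bool]   # user approves (True) or rejects (False)


def data_exploration(state: State) -> State:
    # Stand-in for querying a relational database: rows of (treatment, outcome).
    return {"dataset": [(0, 1.0), (0, 2.0), (1, 3.0), (1, 4.0)]}


def causal_discovery(state: State) -> State:
    # Depends on the dataset artifact produced by the previous agent.
    assert "dataset" in state
    return {"graph": [("treatment", "outcome")]}


def causal_inference(state: State) -> State:
    # Placeholder estimator: naive difference in group means on the toy rows.
    rows = state["dataset"]
    treated = [y for t, y in rows if t == 1]
    control = [y for t, y in rows if t == 0]
    return {"ate": sum(treated) / len(treated) - sum(control) / len(control)}


PIPELINE: List[Tuple[str, Agent]] = [
    ("data_exploration", data_exploration),
    ("causal_discovery", causal_discovery),
    ("causal_inference", causal_inference),
]


@dataclass
class Orchestrator:
    agents: List[Tuple[str, Agent]]
    checkpoint: Checkpoint
    state: State = field(default_factory=dict)
    log: List[str] = field(default_factory=list)

    def run(self) -> State:
        for name, agent in self.agents:
            artifacts = agent(self.state)
            # Expose the intermediate artifacts to the user before committing them.
            if not self.checkpoint(name, artifacts):
                self.log.append(f"{name}: rejected")
                break  # stop so later stages never see unapproved artifacts
            self.state.update(artifacts)
            self.log.append(f"{name}: approved")
        return self.state


if __name__ == "__main__":
    orchestrator = Orchestrator(PIPELINE, checkpoint=lambda name, artifacts: True)
    print(orchestrator.run())
    print(orchestrator.log)
```

In this sketch, artifacts are committed to the shared state only after approval, so a rejected stage halts the run before downstream agents can build on it. A real system would presumably let the user give feedback and rerun the stage instead of simply stopping.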