Evidence-Bound Autonomous Research (EviBound): A Governance Framework for Eliminating False Claims

Ruiying Chen

TL;DR

EviBound is an evidence-bound execution framework that eliminates false claims through dual governance gates requiring machine-checkable evidence. This package includes execution trajectories, MLflow run IDs for all verified tasks, and a 4-step verification protocol.

Abstract

LLM-based autonomous research agents report false claims: tasks marked "complete" despite missing artifacts, contradictory metrics, or failed executions. EviBound is an evidence-bound execution framework that eliminates false claims through dual governance gates requiring machine-checkable evidence. Two complementary gates enforce evidence requirements. The pre-execution Approval Gate validates acceptance criteria schemas before code runs, catching structural violations proactively. The post-execution Verification Gate validates artifacts via MLflow API queries (with recursive path checking) and optionally validates metrics when specified by acceptance criteria. Claims propagate only when backed by a queryable run ID, required artifacts, and FINISHED status. Bounded, confidence-gated retries (typically 1-2 attempts) recover from transient failures without unbounded loops. The framework was evaluated on 8 benchmark tasks spanning infrastructure validation, ML capabilities, and governance stress tests. Baseline A (Prompt-Level Only) yields 100% hallucination (8/8 claimed, 0/8 verified). Baseline B (Verification-Only) reduces hallucination to 25% (2/8 fail verification). EviBound (Dual Gates) achieves 0% hallucination: 7/8 tasks verified and 1 task correctly blocked at the approval gate, all with only approximately 8.3% execution overhead. This package includes execution trajectories, MLflow run IDs for all verified tasks, and a 4-step verification protocol. Research integrity is an architectural property, achieved through governance gates rather than emergent from model scale.
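The two gates described above can be illustrated with a minimal sketch. This is not the authors' implementation: the criteria field names (`required_artifacts`, `metric_thresholds`), the function names, and the reading of metric thresholds as minimum values are all assumptions made for illustration. The `client` argument is assumed to expose `get_run(run_id)` and `list_artifacts(run_id, path)` in the style of MLflow's `MlflowClient`, so that a real client or a test double can be passed in.

```python
def approval_gate(criteria):
    """Pre-execution check (hypothetical schema): return a list of
    violations; an empty list means the task may run."""
    errors = []
    artifacts = criteria.get("required_artifacts")
    if not isinstance(artifacts, list) or not artifacts:
        errors.append("required_artifacts must be a non-empty list")
    elif not all(isinstance(a, str) and a for a in artifacts):
        errors.append("every required artifact must be a non-empty string")
    if not isinstance(criteria.get("metric_thresholds", {}), dict):
        errors.append("metric_thresholds must be a mapping")
    return errors


def list_artifacts_recursive(client, run_id, path=None):
    """Collect every file path logged under a run, descending into
    directories (the 'recursive path checking' step)."""
    found = set()
    for entry in client.list_artifacts(run_id, path):
        if entry.is_dir:
            found |= list_artifacts_recursive(client, run_id, entry.path)
        else:
            found.add(entry.path)
    return found


def verification_gate(client, run_id, criteria):
    """Post-execution check: a claim passes only if the run is queryable,
    has status FINISHED, and contains every required artifact. Metrics
    are checked only when the criteria specify them."""
    try:
        run = client.get_run(run_id)
    except Exception:
        return False  # run ID is not queryable
    if run.info.status != "FINISHED":
        return False
    present = list_artifacts_recursive(client, run_id)
    if any(a not in present for a in criteria["required_artifacts"]):
        return False
    # Assumption: each threshold is a minimum acceptable metric value.
    for name, minimum in criteria.get("metric_thresholds", {}).items():
        value = run.data.metrics.get(name)
        if value is None or value < minimum:
            return False
    return True
```

In this sketch a task whose criteria fail `approval_gate` never executes, which corresponds to the task blocked at the approval gate in the evaluation, and a "complete" claim is propagated only when `verification_gate` returns `True`.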
