
ABD: Default Exception Abduction in Finite First Order Worlds

Serafim Batzoglou

TL;DR

We introduce ABD, a benchmark for default-exception abduction over finite first-order worlds, and formalize three observation regimes with exact SMT verification.

Abstract

We introduce ABD, a benchmark for default-exception abduction over finite first-order worlds. Given a background theory with an abnormality predicate and a set of relational structures, a model must output a first-order formula that defines exceptions, restoring satisfiability while keeping exceptions sparse. We formalize three observation regimes (closed-world, existential completion, universal completion) with exact SMT verification. We evaluate ten frontier LLMs on 600 instances: the best models achieve high validity, but parsimony gaps remain, and holdout evaluation reveals distinct generalization failure modes across regimes.
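To make the task concrete, the following is a minimal sketch of what checking a candidate exception formula might look like in the closed-world regime. Everything in it is an illustrative assumption rather than the benchmark's actual setup: the predicates `Bird`, `Flies`, and `Penguin`, the single default rule, the `World` type, and the cost measure (total number of abnormal elements across worlds) are all hypothetical. It also uses brute-force enumeration over the finite domain in place of the exact SMT verification the paper describes, and it does not model the existential or universal completion regimes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Tuple


@dataclass
class World:
    """A finite relational structure with unary predicates only (hypothetical)."""
    domain: FrozenSet[str]
    facts: Dict[str, FrozenSet[str]]  # predicate name -> elements where it holds


# A candidate exception definition: decides Ab(x) for element x in a world.
AbFormula = Callable[[World, str], bool]


def default_holds(world: World, ab: AbFormula) -> bool:
    """Check the assumed default: forall x. Bird(x) and not Ab(x) -> Flies(x)."""
    return all(
        x not in world.facts["Bird"] or ab(world, x) or x in world.facts["Flies"]
        for x in world.domain
    )


def evaluate(worlds: List[World], ab: AbFormula) -> Tuple[bool, int]:
    """Return (validity, cost): whether every world satisfies the default,
    and how many elements the candidate marks as abnormal in total."""
    valid = all(default_holds(w, ab) for w in worlds)
    cost = sum(1 for w in worlds for x in w.domain if ab(w, x))
    return valid, cost


WORLDS = [
    World(
        domain=frozenset({"a", "b", "c"}),
        facts={
            "Bird": frozenset({"a", "b", "c"}),
            "Flies": frozenset({"a", "b"}),
            "Penguin": frozenset({"c"}),
        },
    ),
    World(
        domain=frozenset({"d", "e"}),
        facts={
            "Bird": frozenset({"d"}),
            "Flies": frozenset({"d"}),
            "Penguin": frozenset(),
        },
    ),
]

# Three hypothetical candidates for Ab(x).
ab_penguin: AbFormula = lambda w, x: x in w.facts["Penguin"]  # sparse
ab_all_birds: AbFormula = lambda w, x: x in w.facts["Bird"]   # valid but wasteful
ab_none: AbFormula = lambda w, x: False                       # no exceptions at all

if __name__ == "__main__":
    for name, ab in [("penguin", ab_penguin), ("all_birds", ab_all_birds), ("none", ab_none)]:
        print(name, evaluate(WORLDS, ab))
```

Under these assumptions, `ab_penguin` and `ab_all_birds` both restore satisfiability, but the second marks every bird as abnormal. That difference is the kind of parsimony gap the abstract refers to; `ab_none` is maximally sparse but leaves the non-flying bird unexplained, so it is invalid.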

Paper Structure (141 sections, 24 equations, 1 figure, 25 tables)