A Categorization of Complexity Classes for Information Retrieval and Synthesis Using Natural Logic

Gregory Coppola

TL;DR

The paper addresses the need for theorem-proving in information retrieval systems that synthesize answers across multiple sources. It introduces a natural-deduction-based framework with three decidable fragments (Forward, Query, and Planning) and analyzes their computational properties under a Datalog-style safety constraint. Key findings include linear-time Horn satisfiability in the Forward Fragment, exponential worst-case blow-up for full existential queries (e.g., $O(G D^N)$), and practical best-effort fragments (Shallow Queries, Probabilistic Ranking, A* Search), plus a Planning Fragment for reasoning under uncertainty. The work offers a principled taxonomy linking logical deduction with probabilistic reasoning in a graphical inference setting and suggests scalable strategies for real-world IR synthesis.

Abstract

Given the emergent reasoning abilities of large language models, information retrieval is becoming more complex. Rather than just retrieving a document, modern information retrieval systems advertise that they can synthesize an answer by drawing on potentially many different documents and conflicting data sources, and by using reasoning. But different kinds of questions have different answers, and different answers have different complexities. In this paper, we introduce a novel framework for analyzing the complexity of a question's answer, based on the natural deduction calculus as presented in Prawitz (1965). To our knowledge, the framework is novel in two respects: no one has used this logic as a basis for complexity classes, and no existing complexity classes comparable to these have been delineated using analogous methods. We identify three decidable fragments, called the forward, query, and planning fragments, and we compare them with what would be needed to carry out proofs in the complete first-order calculus, for which theorem-proving has long been known to be undecidable.

Paper Structure (49 sections, 29 equations)

Introduction
Information Retrieval Now Requires Theorem-Proving
A Framework Based on Natural Deduction
Analyzing the Reasoning Abilities of Transformers
Background
First-Order Theorems
The Church-Turing Thesis
The Natural Deduction Calculus
The Forward Fragment
Fragment Definition
Horn Clauses
Disjunctive Normal Form
Conjoined Conclusions
Datalog's Safety Restriction
Analysis of the Fragment
...and 34 more sections
