A Complexity Map of Probabilistic Reasoning for Neurosymbolic Classification Techniques

Arthur Ledaguenel, Céline Hudelot, Mostepha Khouadjia

TL;DR

The paper tackles the scalability challenge of neurosymbolic probabilistic reasoning by developing a unified formalism for four core reasoning problems and constructing a single complexity map that ties tractability to succinct representation languages. It systematically analyzes how different knowledge representations (in particular hierarchical, cardinal, simple-path, and matching constraints) affect the feasibility of probabilistic queries, optimization, and enumeration, using knowledge compilation into d-DNNF and related targets. The authors provide new tractability results (e.g., for Card and ASPath) and show that common compilation targets such as DNNF and d-DNNF do not cover all tractable cases (as with Match), which exposes the limits of those targets and guides practitioners toward appropriate representation choices. Collectively, the work offers actionable guidance for selecting succinct languages and compilation strategies to scale probabilistic neurosymbolic techniques to real-world tasks with a large number of classes.

Abstract

Neurosymbolic artificial intelligence is a growing field of research aiming to combine neural network learning capabilities with the reasoning abilities of symbolic systems. Informed multi-label classification is a sub-field of neurosymbolic AI which studies how to leverage prior knowledge to improve neural classification systems. Recently, a family of neurosymbolic techniques for informed classification based on probabilistic reasoning has gained significant traction. Unfortunately, depending on the language used to represent prior knowledge, solving certain probabilistic reasoning problems can become prohibitively hard when the number of classes increases. Therefore, the asymptotic complexity of probabilistic reasoning is of cardinal importance to assess the scalability of such techniques. In this paper, we develop a unified formalism for four probabilistic reasoning problems. Then, we compile several known and new tractability results into a single complexity map of probabilistic reasoning. We build on top of this complexity map to characterize the domains of scalability of several techniques. We hope this work will help neurosymbolic AI practitioners navigate the scalability landscape of probabilistic neurosymbolic techniques.
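To make the scalability concern concrete, here is a minimal brute-force sketch of two of the problems named in the map, PQE and MPE. It assumes, since the abstract does not spell out the definitions, that PQE asks for the probability that independent Bernoulli class variables satisfy the prior knowledge and that MPE asks for the most probable satisfying state. The function names and the `exactly_one` constraint are hypothetical and not taken from the paper.

```python
from itertools import product
from math import prod


def state_probability(state, probs):
    """Probability of one 0/1 state under independent Bernoulli variables."""
    return prod(p if y else 1.0 - p for y, p in zip(state, probs))


def pqe(constraint, probs):
    """Assumed PQE: total probability mass of the states satisfying `constraint`."""
    states = product((0, 1), repeat=len(probs))  # 2**k states for k classes
    return sum(state_probability(s, probs) for s in states if constraint(s))


def mpe(constraint, probs):
    """Assumed MPE: most probable satisfying state, or None if none exists."""
    valid = [s for s in product((0, 1), repeat=len(probs)) if constraint(s)]
    if not valid:
        return None
    return max(valid, key=lambda s: state_probability(s, probs))


def exactly_one(state):
    """Hypothetical prior knowledge: exactly one class is active."""
    return sum(state) == 1


probs = [0.7, 0.2, 0.4]  # illustrative per-class probabilities from a network
print(pqe(exactly_one, probs))
print(mpe(exactly_one, probs))
```

Both functions enumerate all 2^k states, which is exactly the cost that becomes prohibitive as the number of classes k grows. Tractability results of the kind the map collects identify representation languages for which this enumeration can be avoided.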

Paper Structure (38 sections, 14 theorems, 11 equations, 10 figures, 4 tables)

Key Result

Proposition 1

H is MPE-tractable and ThreshEnum-tractable, but PQE-intractable and EQE-intractable.

Figures (10)

  • Figure 1: A Boolean circuit.
  • Figure 2: Illustration of a Boolean circuit $C$ following the graphical representation drawn from Darwiche (2011).
  • Figure 3: A schematic illustration of probabilistic neurosymbolic techniques.
  • Figure 4: A complexity map of probabilistic reasoning. An arrow $\mathtt{L_1} \to \mathtt{L_2}$ implies that $\mathtt{L_1}$ can be efficiently compiled to $\mathtt{L_2}$. Color regions indicate on which probabilistic reasoning problems a language is tractable: notice that the tractability region of PQE and EQE is included in the tractability region of MPE and ThreshEnum. Complete languages are represented with thin frames and succinct languages with thick frames. When the tractability of a language $\mathtt{L_1}$ is proved through knowledge compilation $\mathtt{L_1} \to \mathtt{L_2}$, the corresponding proposition is referenced next to the arrow $\to$, otherwise the proposition is referenced next to the frame of the language $\mathtt{L_1}$.
  • Figure 5: An edge-based directed theory $(G:=(V, E), \varsigma)$: each edge $e$ is labeled with its corresponding variable $\varsigma(e) \in \mathbf{Y}$.
  • ...and 5 more figures

Theorems & Definitions (42)

  • Definition 1: Propositional language
  • Remark 1
  • Example 1
  • Example 2
  • Example 3
  • Definition 2
  • Definition 3
  • Remark 2
  • Proposition 1
  • proof
  • ...and 32 more