Independence Is Not an Issue in Neurosymbolic AI

Håkan Karlsson Faronius, Pedro Zuidberg Dos Martires

TL;DR

The paper shows that semantic loss in neurosymbolic AI is a special case of disjunctive supervision and that deterministic bias is not an inherent consequence of conditional independence when the full semantic loss is correctly applied. Through traffic-light MNIST experiments, it demonstrates that a DSIC parameterization avoids the Winner-Take-All effect seen with softmax in disjunctive supervision, while truncated semantic loss can induce deterministic bias. The results clarify the relationship between neurosymbolic learning and disjunctive supervision, cautioning against nonstandard loss formulations and highlighting the practical benefits of DSIC in weak supervision settings. Overall, the work argues that conditional independence remains a useful, not harmful, inductive bias for NeSy systems when losses are used properly, with implications for designing interpretable, data-efficient AI systems.

Abstract

A popular approach to neurosymbolic AI is to take the output of the last layer of a neural network, e.g. a softmax activation, and pass it through a sparse computation graph encoding certain logical constraints one wishes to enforce. This induces a probability distribution over a set of random variables, which happen to be conditionally independent of each other in many commonly used neurosymbolic AI models. Such conditionally independent random variables have been deemed harmful as their presence has been observed to co-occur with a phenomenon dubbed deterministic bias, where systems learn to deterministically prefer one of the valid solutions from the solution space over the others. We provide evidence contesting this conclusion and show that the phenomenon of deterministic bias is an artifact of improperly applying neurosymbolic AI.

Paper Structure

This paper contains 18 sections, 2 theorems, 22 equations, 3 figures.

Key Result

Theorem

Neurosymbolic classification (cf. Equation eq:ce_nesy) is a special case of disjunctive supervision (cf. Equation eq:disj_super).
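The statement above can be checked numerically on a toy case. The following is a minimal sketch, not the paper's code: it assumes a traffic-light constraint of the form "red and green are never on together", two independent Bernoulli outputs `p_red` and `p_green`, and hypothetical helper names (`world_probs`, `nesy_nll`, `disjunctive_nll`). Under these assumptions, the neurosymbolic loss on the constraint label coincides with a disjunctive-supervision loss whose allowed label set is the set of worlds consistent with that label.

```python
import math
from itertools import product


def world_probs(p_red, p_green):
    """Distribution over the four worlds (red, green), assuming independence."""
    return {
        (r, g): (p_red if r else 1 - p_red) * (p_green if g else 1 - p_green)
        for r, g in product((0, 1), repeat=2)
    }


def constraint(world):
    """Assumed constraint: the two lights are never on at the same time."""
    r, g = world
    return not (r and g)


def nesy_nll(p_red, p_green, label):
    """Negative log-likelihood of the binary label 'constraint holds' (1) or 'violated' (0)."""
    p_sat = 1 - p_red * p_green  # closed form for this particular constraint
    return -math.log(p_sat if label == 1 else 1 - p_sat)


def disjunctive_nll(probs, allowed):
    """Disjunctive supervision: any world in `allowed` counts as correct."""
    return -math.log(sum(probs[w] for w in allowed))


probs = world_probs(0.3, 0.6)
allowed = [w for w in probs if constraint(w)]        # worlds where the constraint holds
forbidden = [w for w in probs if not constraint(w)]  # the single impossible world

print(nesy_nll(0.3, 0.6, 1), disjunctive_nll(probs, allowed))
print(nesy_nll(0.3, 0.6, 0), disjunctive_nll(probs, forbidden))
```

Each printed pair agrees up to floating-point rounding. The sketch only illustrates the direction "neurosymbolic loss as disjunctive supervision over factorized world probabilities"; it does not reproduce the paper's general proof.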

Figures (3)

  • Figure 1: Experimental evaluation with disjunctive supervision. The plots show the mean value of the probability at each iteration over 20 runs. The shaded areas indicate the $95\%$ confidence intervals. We report the mean values separately for the four different parts of the test set. The labels under each graph indicate which part is being evaluated, e.g. $\neg\text{red}\land\text{green}$ corresponds to having as input an MNIST image depicting a zero and an MNIST image depicting a one. It can be seen that for the possible cases (Figure \ref{subfig:zsolt_a} to Figure \ref{subfig:zsolt_c}) one of the worlds dominates, whilst the others go towards $0$ as the training iterations increase. For the impossible case (Figure \ref{subfig:zsolt_d}), the impossible world is dominant. Note that we rank the first three outputs of the softmax according to the sum of their probability over the entire run; this allows us to identify the individual outputs.
  • Figure 2: Empirical evaluation for the traffic lights example using the semantic loss. We again split the evaluation into four parts, one for each of the possible configurations, and report the mean probability over runs on the respective part of the test set during training. The plots clearly show that as training proceeds only the world that corresponds to the specific part of the test set is expressed. This also applies to the impossible case (Figure \ref{subfig:nesy_d}).
  • Figure 3: Empirical evaluation for the traffic lights example using the semantic loss without negative training examples \cite{van2024independence}. For each case only the world in which both lights are off is activated, even for the impossible case (Figure \ref{subfig:emile_nesy_d}). This means that the model does not learn.
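The behaviour described for Figure 3 is at least consistent with a simple property of a positive-only loss. The sketch below is an illustration under stated assumptions, not the paper's experiment: it assumes the constraint "red and green are never on together", independent outputs `p_red` and `p_green`, and hypothetical function names. A predictor that ignores its input and always says "both lights off" drives the positive-only loss to nearly zero, whereas the same predictor is heavily penalized once examples labelled as violating the constraint are included.

```python
import math


def positive_only_loss(p_red, p_green):
    """Loss that only ever asks for the constraint to hold (no negative examples)."""
    return -math.log(1 - p_red * p_green)


def full_loss(p_red, p_green, constraint_holds):
    """Loss on a binary label: constraint satisfied (True) or violated (False)."""
    p_sat = 1 - p_red * p_green
    return -math.log(p_sat if constraint_holds else 1 - p_sat)


# An input-independent prediction: both lights (almost) surely off.
p_red, p_green = 1e-6, 1e-6

print(positive_only_loss(p_red, p_green))  # close to zero: nothing left to learn
print(full_loss(p_red, p_green, False))    # large: a negative example exposes the shortcut
```

Whether this is the exact mechanism behind the curves in Figure 3 cannot be established from the caption alone; the sketch only shows that the "always off" solution is a minimizer of the positive-only objective.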

Theorems & Definitions (5)

  • Definition
  • Theorem
  • Proof
  • Theorem: Winner-Take-All \cite{zombori2024towards}
  • Definition: Deterministic Bias