Improving Interpretability and Accuracy in Neuro-Symbolic Rule Extraction Using Class-Specific Sparse Filters

Parth Padalkar, Jaeseong Lee, Shiyi Wei, Gopal Gupta

TL;DR

The paper tackles the interpretability-accuracy trade-off in neuro-symbolic rule extraction for image classification by isolating post-training binarization as a key source of information loss. It introduces a novel class-specific sparsity loss that guides CNNs to produce sparse, near-binary filter activations during training, enabling higher-fidelity rule extraction via the NeSyFOLD/FOLD-SE-M pipeline. Across multiple training strategies and datasets, the approach yields a 9% average accuracy uplift and a 53% reduction in rule-set size, bringing NeSy performance within about 3–4% of the original CNN while maintaining interpretability. The work demonstrates that carefully designed sparsity objectives can make interpretable neuro-symbolic models competitive with black-box CNNs, and discusses practical guidance and future extensions to other architectures like Vision Transformers.

Abstract

There has been significant focus on creating neuro-symbolic models for interpretable image classification using Convolutional Neural Networks (CNNs). These methods aim to replace the CNN with a neuro-symbolic model consisting of the CNN, which is used as a feature extractor, and an interpretable rule-set extracted from the CNN itself. While these approaches provide interpretability through the extracted rule-set, they often compromise accuracy compared to the original CNN model. In this paper, we identify the root cause of this accuracy loss as the post-training binarization of filter activations to extract the rule-set. To address this, we propose a novel sparsity loss function that enables class-specific filter binarization during CNN training, thus minimizing information loss when extracting the rule-set. We evaluate several training strategies with our novel sparsity loss, analyzing their effectiveness and providing guidance on their appropriate use. Notably, we set a new benchmark, achieving a 9% improvement in accuracy and a 53% reduction in rule-set size on average, compared to the previous SOTA, while coming within 3% of the original CNN's accuracy. This highlights the significant potential of interpretable neuro-symbolic models as viable alternatives to black-box CNNs.
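The post-training binarization step that the abstract identifies as the source of accuracy loss can be illustrated with a minimal sketch. It assumes each filter's feature map is reduced to a scalar norm and then compared with a per-filter threshold. The threshold rule (the mean norm over images), the function names `filter_norms` and `binarize`, and the toy data are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def filter_norms(feature_maps):
    """Reduce each (image, filter) feature map to one scalar: its L2 norm.

    feature_maps has shape (n_images, n_filters, height, width).
    """
    n_images, n_filters = feature_maps.shape[:2]
    flat = feature_maps.reshape(n_images, n_filters, -1)
    return np.linalg.norm(flat, axis=2)

def binarize(norms, thresholds):
    """Map each norm to 1 if it exceeds its filter's threshold, else 0."""
    return (norms > thresholds).astype(np.int8)

# Toy data: 2 images, 2 filters, 2x2 feature maps with constant values.
feature_maps = np.zeros((2, 2, 2, 2))
feature_maps[0, 0] = 3.0  # norm 6.0
feature_maps[0, 1] = 0.5  # norm 1.0
feature_maps[1, 0] = 0.1  # norm 0.2
feature_maps[1, 1] = 2.0  # norm 4.0

norms = filter_norms(feature_maps)
# Hypothetical threshold: the mean norm of each filter over all images.
thresholds = norms.mean(axis=0)
binary = binarize(norms, thresholds)
print(binary)
```

In this toy case the norms 6.0 and 4.0 both collapse to 1, and 1.0 and 0.2 both collapse to 0, which is the kind of information loss a rule learner working only on the binary table cannot recover. Activations that are already sparse and near-binary before thresholding would lose far less.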

Paper Structure

This paper contains 12 sections, 1 equation, 3 figures, 2 tables.

Figures (3)

  • Figure 1: The NeSyFOLD Framework
  • Figure 2: Activation maps of the top filters for the training strategies TS1–TS5 for the P3.1 dataset. Each row represents the top-3 images for the top filter, per class, per training strategy.
  • Figure 3: Raw rule-set produced via TS3 for the P3.1 dataset (top), labelled rule-set produced via the NeSyFOLD toolkit (middle), and justification provided by the s(CASP) ASP engine for an image (bottom).