Interpret Your Decision: Logical Reasoning Regularization for Generalization in Visual Classification

Zhaorui Tan, Xi Yang, Qiufeng Wang, Anh Nguyen, Kaizhu Huang

TL;DR

The paper introduces L-Reg, a logic-informed, sample-based regularization term for visual classification that enforces minimal, disentangled semantic supports to improve generalization. By grounding the method in a formal logical framework and deriving an entropy-based objective from it, L-Reg reduces classifier complexity and yields interpretable, class-specific minimal features, improving performance in multi-domain generalization and generalized category discovery settings. Theoretical analysis and extensive experiments show consistent gains across diverse datasets and even non-vision tasks such as circuit congestion prediction; the paper also outlines limitations and avenues for improving feature independence and layer selection. Overall, L-Reg offers a practical, plug-and-play regularization that improves robustness to domain shifts and unknown categories with interpretable semantics.

Abstract

Vision models excel in image classification but struggle to generalize to unseen data, such as classifying images from unseen domains or discovering novel categories. In this paper, we explore the relationship between logical reasoning and deep learning generalization in visual classification. A logical regularization termed L-Reg is derived which bridges a logical analysis framework to image classification. Our work reveals that L-Reg reduces the complexity of the model in terms of the feature distribution and classifier weights. Specifically, we unveil the interpretability brought by L-Reg, as it enables the model to extract the salient features, such as faces to persons, for classification. Theoretical analysis and experiments demonstrate that L-Reg enhances generalization across various scenarios, including multi-domain generalization and generalized category discovery. In complex real-world scenarios where images span unknown classes and unseen domains, L-Reg consistently improves generalization, highlighting its practical efficacy.

Paper Structure (25 sections, 4 theorems, 26 equations, 12 figures, 16 tables)

Key Result

Proposition 4.1

Assume the gap across all domains is well minimized. Let $f^*$ denote the target model that generalizes to the data $X_u$ from the unseen domain with the lowest complexity. For a model ${f}^{R}_{(X_s, Y_s)}, {f}_{(X_s, Y_s)}$ trained under the data-shift generalization setting (i.e., $(X_s, Y_s)$ is

Figures (12)

  • Figure 1: GradCAM (Selvaraju et al., 2017) visualizations for the unknown class 'person' across seen and unseen domains, for the GMDG baseline with $L_2$ regularization trained without and with L-Reg, respectively. Both experiments share the same hyper-parameters, except that the latter uses L-Reg.
  • Figure 2: Diagrams of different generalization settings in visual classification tasks.
  • Figure 3: Visualizations of classifiers' weights from models trained using GMDG on the PACS dataset without and with L-Reg under the mDG+GCD setting, respectively. Both experiments share the same hyper-parameters and use the RegNetY-16G backbone, except that the latter additionally uses L-Reg.
  • Figure 4: Visualizations of latent features from models trained using GMDG on the PACS dataset without and with L-Reg under the mDG+GCD setting, using the RegNetY-16G backbone, respectively.
  • Figure 5: GradCAM visualizations of GMDG trained without and with L-Reg. Seen and unseen domains and known and unknown classes are labeled.
  • ...and 7 more figures

Theorems & Definitions (12)

  • Definition 2.1: Generalization loss
  • Definition 3.1
  • Definition 3.2: Semantic support
  • Proposition 4.1: Effectiveness of L-Reg in enhancing data-shift generalization.
  • Proposition 4.2: L-Reg improves target-shift generalization
  • proof
  • Definition B.1
  • Definition B.2: General logic
  • Proposition C.1: L-Reg reduces the complexity of the model, promoting data-shift generalization performance.
  • proof
  • ...and 2 more