Predictive Reasoning with Augmented Anomaly Contrastive Learning for Compositional Visual Relations

Chengtai Li, Yuting He, Jianfeng Ren, Ruibin Bai, Yitian Zhao, Heng Yu, Xudong Jiang

TL;DR

A predict-and-verify paradigm is introduced for rule-based reasoning, in which a series of Predictive Anomaly Reasoning Blocks (PARBs) iteratively leverage features from three out of the four images to predict those of the remaining one.

Abstract

While visual reasoning for simple analogies has received significant attention, compositional visual relations (CVR) remain relatively unexplored due to their greater complexity. To solve CVR tasks, i.e., to identify an outlier image given three other images that follow the same compositional rules, we propose Predictive Reasoning with Augmented Anomaly Contrastive Learning (PR-A$^2$CL). To address the challenge of modelling abundant compositional rules, an Augmented Anomaly Contrastive Learning (A$^2$CL) scheme is designed to distil discriminative and generalizable features by maximizing similarity among normal instances while minimizing similarity between normal instances and anomalous outliers. More importantly, a predict-and-verify paradigm is introduced for rule-based reasoning, in which a series of Predictive Anomaly Reasoning Blocks (PARBs) iteratively leverage features from three out of the four images to predict those of the remaining one. In the subsequent verification stage, the PARBs progressively pinpoint the specific discrepancies attributable to the underlying rules. Experimental results on the SVRT, CVR and MC$^2$R datasets show that PR-A$^2$CL significantly outperforms state-of-the-art reasoning models.
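The predict-and-verify idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' PARB architecture: the predictor here is a hypothetical stand-in (the mean of the three context features), the names `predict_from_context` and `find_outlier` are invented for illustration, and the feature vectors are synthetic. It only shows the control flow: each of the four images takes a turn as the target, its features are predicted from the other three, and the image with the largest prediction discrepancy is flagged as the outlier.

```python
import numpy as np


def predict_from_context(context):
    """Hypothetical stand-in predictor: average the three context features.

    The paper uses learned PARBs here; the mean is only an assumption
    that keeps this sketch self-contained.
    """
    return context.mean(axis=0)


def find_outlier(features):
    """Run four predict-and-verify problems over a (4, d) feature array.

    Returns the index of the image whose features are worst predicted
    by the other three, together with all four discrepancies.
    """
    n = features.shape[0]
    errors = np.empty(n)
    for i in range(n):
        context = np.delete(features, i, axis=0)   # the other three images
        predicted = predict_from_context(context)  # predict step
        errors[i] = np.linalg.norm(predicted - features[i])  # verify step
    return int(np.argmax(errors)), errors


# Synthetic example: three "rule-following" features near 0, one anomaly near 3.
rng = np.random.default_rng(0)
normal = rng.normal(loc=0.0, scale=0.1, size=(3, 8))
anomaly = rng.normal(loc=3.0, scale=0.1, size=(1, 8))
features = np.vstack([normal, anomaly])

outlier_index, errors = find_outlier(features)
print(outlier_index, np.round(errors, 2))
```

When the anomaly is held out, the three normal features predict each other well but miss the anomaly by a wide margin; when a normal image is held out, the anomaly contaminates the context and the error is smaller. The largest discrepancy therefore falls on the anomalous image in this toy setting.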

Paper Structure (30 sections, 13 equations, 6 figures, 7 tables, 1 algorithm)

Figures (6)

  • Figure 1: Left: A sample compositional rule in the CVR dataset. Right: Selecting an outlier from four images is converted to four predict-and-verify problems.
  • Figure 2: Proposed PR-A$^2$CL comprises two modules: 1) A Perception Module with Augmented Anomaly Contrastive Learning (A$^2$CL), generating weak and strong augmented views, encoding them via ResNet-50, and hence improving feature discriminability and generalization. 2) A Predictive Anomaly Reasoning Module (PARM) built with stacked PARBs. Each block employs a predict-and-verify mechanism, which predicts the target from context features and verifies it against the target to infer latent compositional rules.
  • Figure 3: Comparison with DBCR [li2025dbcr] on SVRT [fleuret2011comparing] using 1k samples per task. PR-A$^2$CL excels across most tasks, whereas DBCR struggles on several.
  • Figure 4: Comparison of reasoning accuracy on specific compositional rules. PR-A$^2$CL outperforms DBCR [li2025dbcr] for almost all rules.
  • Figure 5: t-SNE visualization of feature representations across successive PARBs, showing increased cluster compactness with deeper PARBs.
  • ...and 1 more figure
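The contrastive objective behind A$^2$CL (Figure 2) is described only at a high level here: pull normal instances together and push them away from the anomalous outlier. The sketch below shows one common way to express such an objective, an InfoNCE-style loss over cosine similarities. It is an assumption for illustration, not the paper's loss: the function name `anomaly_contrastive_loss`, the temperature value, and the toy 2-D features are all hypothetical, and the weak and strong augmented views are omitted.

```python
import numpy as np


def anomaly_contrastive_loss(normals, anomaly, temperature=0.1):
    """InfoNCE-style sketch (an assumed form, not the paper's exact loss).

    normals: (n, d) features of images that follow the rule.
    anomaly: (d,) feature of the outlier image.
    Each normal is an anchor; the other normals are positives and the
    anomaly is the single negative.
    """
    def unit(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    z = unit(normals)
    a = unit(anomaly)
    sim_nn = z @ z.T / temperature   # normal-to-normal similarities, (n, n)
    sim_na = z @ a / temperature     # normal-to-anomaly similarities, (n,)

    losses = []
    for i in range(z.shape[0]):
        pos = np.exp(np.delete(sim_nn[i], i)).sum()  # exclude self-similarity
        neg = np.exp(sim_na[i])
        losses.append(-np.log(pos / (pos + neg)))
    return float(np.mean(losses))


normals = np.array([[1.0, 0.0], [1.0, 0.05], [0.98, -0.03]])
far_anomaly = np.array([-1.0, 0.0])    # well separated from the normals
near_anomaly = np.array([1.0, 0.01])   # nearly indistinguishable from them

good_loss = anomaly_contrastive_loss(normals, far_anomaly)
bad_loss = anomaly_contrastive_loss(normals, near_anomaly)
print(good_loss, bad_loss)
```

The loss is close to zero when the anomaly is far from the normal cluster and grows when the anomaly's feature resembles the normals, which is the behaviour the description of A$^2$CL asks of the encoder.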