
A Study of Rule Omission in Raven's Progressive Matrices

Binze Li

TL;DR

The study asks whether modern AI can truly perform abstract reasoning on Raven's Progressive Matrices when trained with incomplete rule exposure. It systematically omits one or two rules from the Impartial-RAVEN dataset and compares a sequence-to-sequence transformer against two vision-based models (CoPINet and DCN). The results show that high token-level accuracy does not translate into fully correct answers on novel configurations. Performance is strong on familiar rules but drops sharply on unseen ones, which exposes generalization gaps and the fragility of current architectures. The work underscores the need for architectures that integrate symbolic structure with neural representations to achieve robust, extrapolative relational reasoning in RPM-like tasks.
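The rule-omission protocol can be illustrated with a minimal sketch. It assumes a hypothetical record format in which each puzzle lists the rule applied to each attribute, and it uses the RAVEN-family rule names (Constant, Progression, Arithmetic, Distribute_Three) for illustration only. The names `Puzzle` and `split_by_omitted_rules` are hypothetical. This is not the paper's data pipeline. Any puzzle that uses an omitted rule is excluded from training and held out for evaluation.

```python
from dataclasses import dataclass


# Hypothetical record: the paper's actual data format is not specified here.
@dataclass(frozen=True)
class Puzzle:
    puzzle_id: str
    rules: tuple  # rule name per attribute, e.g. ("Constant", "Progression")


def split_by_omitted_rules(puzzles, omitted):
    """Split puzzles into a training set that never uses an omitted rule
    and a held-out set containing every puzzle that uses at least one."""
    omitted = set(omitted)
    train, held_out = [], []
    for puzzle in puzzles:
        if omitted.intersection(puzzle.rules):
            held_out.append(puzzle)  # touches an unseen rule
        else:
            train.append(puzzle)  # uses only rules seen during training
    return train, held_out


puzzles = [
    Puzzle("a", ("Constant", "Progression")),
    Puzzle("b", ("Arithmetic", "Constant")),
    Puzzle("c", ("Distribute_Three",)),
]
train, held_out = split_by_omitted_rules(puzzles, ["Arithmetic"])
print([p.puzzle_id for p in train], [p.puzzle_id for p in held_out])
```

In this sketch, omitting two rules means passing both names in `omitted`. That setting shrinks the training set further, because any puzzle that uses either rule is held out.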

Abstract

Analogical reasoning lies at the core of human cognition and remains a fundamental challenge for artificial intelligence. Raven's Progressive Matrices (RPM) serve as a widely used benchmark to assess abstract reasoning by requiring the inference of underlying structural rules. While many vision-based and language-based models have achieved success on RPM tasks, it remains unclear whether their performance reflects genuine reasoning ability or reliance on statistical shortcuts. This study investigates the generalization capacity of modern AI systems under conditions of incomplete training by deliberately omitting several structural rules during training. Both sequence-to-sequence transformer models and vision-based architectures such as CoPINet and the Dual-Contrast Network are evaluated on the Impartial-RAVEN (I-RAVEN) dataset. Experiments reveal that although transformers demonstrate strong performance on familiar rules, their accuracy declines sharply when faced with novel or omitted rules. Moreover, the gap between token-level accuracy and complete answer accuracy highlights fundamental limitations in current approaches. These findings provide new insights into the reasoning mechanisms underlying deep learning models and underscore the need for architectures that move beyond pattern recognition toward robust abstract reasoning.
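The gap between token-level accuracy and complete answer accuracy can be made concrete with a small sketch. It assumes each answer panel is encoded as a short sequence of hypothetical attribute-value tokens. The paper's actual tokenization is not described here, and the function names are illustrative. Token accuracy credits every matching position, while answer accuracy credits only sequences that match exactly.

```python
def token_accuracy(predictions, targets):
    """Fraction of answer tokens predicted correctly, position by position.
    Missing or extra tokens count as errors."""
    correct = total = 0
    for pred, gold in zip(predictions, targets):
        correct += sum(1 for p, g in zip(pred, gold) if p == g)
        total += max(len(pred), len(gold))
    return correct / total if total else 0.0


def answer_accuracy(predictions, targets):
    """Fraction of answers whose entire token sequence is exactly right."""
    if not targets:
        return 0.0
    return sum(pred == gold for pred, gold in zip(predictions, targets)) / len(targets)


# Hypothetical encoding: four attribute-value tokens per answer panel.
targets = [["3", "5", "2", "4"], ["1", "1", "6", "2"]]
predictions = [["3", "5", "2", "0"], ["1", "1", "6", "3"]]
print(token_accuracy(predictions, targets), answer_accuracy(predictions, targets))
```

Here each prediction gets three of four tokens right. The result is 75% token accuracy but 0% answer accuracy. A model can therefore look strong on the per-token metric while solving no puzzles outright.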

Paper Structure

This paper contains 18 sections, 1 figure, and 5 tables.

Figures (1)

  • Figure 1: Model Performance Comparison Across Rule Removal Scenarios