A-I-RAVEN and I-RAVEN-Mesh: Two New Benchmarks for Abstract Visual Reasoning

Mikołaj Małkiński, Jacek Mańdziuk

TL;DR

The paper tackles generalization and knowledge reuse in abstract visual reasoning (AVR) by introducing two benchmarks: A-I-RAVEN, which tests generalization of rules to held-out attributes in Raven's Progressive Matrices (RPMs), and I-RAVEN-Mesh, which overlays a mesh component on RPMs to study progressive knowledge acquisition via transfer learning (TL). It evaluates 13 AVR models and finds clear generalization and transfer gaps, with the hardest regimes arising from held-out attribute pairs and non-Constant rules. The work shows that current models struggle to generalize across the regimes and that they benefit from TL in the transfer tasks, underscoring the need for new methods that better capture abstract reasoning. Overall, the benchmarks offer a rigorous, resource-efficient platform for advancing AVR research on generalization and transfer learning.

Abstract

We study the generalization and knowledge reuse capabilities of deep neural networks in the domain of abstract visual reasoning (AVR), employing Raven's Progressive Matrices (RPMs), a recognized benchmark task for assessing AVR abilities. Two knowledge transfer scenarios based on the I-RAVEN dataset are investigated. Firstly, inspired by the generalization assessment capabilities of the PGM dataset and the popularity of I-RAVEN, we introduce Attributeless-I-RAVEN (A-I-RAVEN), a benchmark with 10 generalization regimes that allow systematic testing of the generalization of abstract rules applied to held-out attributes at various levels of complexity (primary and extended regimes). In contrast to PGM, A-I-RAVEN features compositionality and a variety of figure configurations, and does not require substantial computational resources. Secondly, we construct I-RAVEN-Mesh, a dataset that enriches RPMs with a novel component structure comprising line-based patterns, facilitating the assessment of progressive knowledge acquisition in a transfer learning setting. We evaluate 13 strong models from the AVR literature on the introduced datasets, revealing their specific shortcomings in generalization and knowledge transfer.

Paper Structure (24 sections, 2 equations, 10 figures, 27 tables)

Figures (10)

  • Figure 1: RPM example. The correct answer is A.
  • Figure 2: A-I-RAVEN. Left: Matrices from the A/Position regime belonging to the 2$\times$2 Grid configuration. In (a), object position is constant across rows, while in (b) object numerosity is governed by Distribute Three. Right: Matrices from the A/Color regime belonging to the Left-Right configuration. In (c), object color is constant across rows in the left and right image parts, while in (d) it is governed by Progression. Correct answers are marked with a green dotted border. Please refer to Appendix \ref{sec:examples} for examples from other generalization regimes.
  • Figure 3: I-RAVEN-Mesh. Matrices with the Position attribute of the mesh component governed by all applicable rules. For the sake of readability, we present examples belonging to the Center configuration. (a) Line position is constant in each row. (b) The line pattern displayed in the first column is rotated by $90$ degrees in subsequent columns. (c) The union set operator applied to the first and the second column produces the line positions in the third column. (d) Each row contains lines arranged in one of three available patterns. Correct answers are marked with a green dotted border. Please refer to Appendix \ref{sec:examples} for examples concerning the Number attribute.
  • Figure 4: Dataset difficulty. Performance of top-3 models on test and validation splits. Numerical values refer to SCL scores.
  • Figure 5: Transfer learning. Mean and standard deviation of test accuracy on I-RAVEN-Mesh across three random seeds. Models were trained in two setups: 1) from scratch on I-RAVEN-Mesh with variable sample size; 2) pre-trained on the full I-RAVEN dataset and fine-tuned on I-RAVEN-Mesh with variable sample size. Results for setups 1) and 2) are shown below and above the plot lines, respectively.
  • ...and 5 more figures
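
The row-wise rules named in the Figure 3 caption (constant position, rotation by 90 degrees, and union of the first two columns) can be illustrated with a small sketch. The encoding below is an assumption made purely for illustration and is not taken from the paper: a line pattern is a set of slot indices on the four sides of a square, and the rotation is assumed to accumulate from column to column. All function names are hypothetical.

```python
from typing import FrozenSet, List

# Hypothetical encoding (not from the paper): four line slots on the sides of
# a square, indexed clockwise as 0=top, 1=right, 2=bottom, 3=left.
NUM_SLOTS = 4
Pattern = FrozenSet[int]


def rotate_90(pattern: Pattern) -> Pattern:
    """Rotate a pattern by 90 degrees clockwise under the assumed indexing."""
    return frozenset((slot + 1) % NUM_SLOTS for slot in pattern)


def constant_row(first: Pattern) -> List[Pattern]:
    """Rule (a): the same line positions appear in every column of the row."""
    return [first, first, first]


def rotation_row(first: Pattern) -> List[Pattern]:
    """Rule (b): each column rotates the previous one by 90 degrees.

    Cumulative rotation is an assumption; the caption only says the first
    column's pattern is rotated in subsequent columns.
    """
    second = rotate_90(first)
    return [first, second, rotate_90(second)]


def union_row(first: Pattern, second: Pattern) -> List[Pattern]:
    """Rule (c): the third column is the set union of the first two."""
    return [first, second, first | second]


if __name__ == "__main__":
    top_only = frozenset({0})
    bottom_only = frozenset({2})
    print(constant_row(top_only))
    print(rotation_row(top_only))
    print(union_row(top_only, bottom_only))
```

Representing patterns as sets makes the union rule a single operator and keeps the rotation a simple index shift; the actual dataset may use a different number of line slots and a different geometry.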