A-I-RAVEN and I-RAVEN-Mesh: Two New Benchmarks for Abstract Visual Reasoning
Mikołaj Małkiński, Jacek Mańdziuk
TL;DR
The paper tackles generalization and knowledge reuse in abstract visual reasoning (AVR) by introducing two benchmarks built on Raven's Progressive Matrices (RPMs): A-I-RAVEN, which tests whether abstract rules generalize to held-out attributes, and I-RAVEN-Mesh, which overlays a line-based mesh component to study progressive knowledge acquisition via transfer learning (TL). It evaluates 13 AVR models and finds meaningful generalization and transfer gaps, with the hardest regimes arising from held-out attribute pairs and non-Constant rules. The results show that current models struggle to generalize across diverse regimes and that they benefit from TL in transfer tasks, underscoring the need for new methods that better capture abstract reasoning. Overall, these benchmarks offer a rigorous, resource-efficient platform for advancing AVR research and improving generalization and transfer learning.
Abstract
We study the generalization and knowledge reuse capabilities of deep neural networks in the domain of abstract visual reasoning (AVR), employing Raven's Progressive Matrices (RPMs), a recognized benchmark task for assessing AVR abilities. Two knowledge transfer scenarios based on the I-RAVEN dataset are investigated. First, inspired by the generalization assessment capabilities of the PGM dataset and the popularity of I-RAVEN, we introduce Attributeless-I-RAVEN (A-I-RAVEN), a benchmark with 10 generalization regimes that allow systematic testing of how abstract rules generalize to held-out attributes at various levels of complexity (primary and extended regimes). In contrast to PGM, A-I-RAVEN features compositionality and a variety of figure configurations, and does not require substantial computational resources. Second, we construct I-RAVEN-Mesh, a dataset that enriches RPMs with a novel component structure comprising line-based patterns, facilitating the assessment of progressive knowledge acquisition in a transfer learning setting. We evaluate 13 strong models from the AVR literature on the introduced datasets, revealing their specific shortcomings in generalization and knowledge transfer.
