GenVP: Generating Visual Puzzles with Contrastive Hierarchical VAEs

Kalliopi Basioti, Pritish Sahu, Qingze Tony Liu, Zihao Xu, Hao Wang, Vladimir Pavlovic

TL;DR

GenVP advances abstract visual reasoning by modeling both RPM generation and RPM solving within a hierarchical VAE (HVAE) framework augmented with a Mixture of Experts for robust rule inference. It introduces a dual contrastive learning scheme, global (cross-puzzle) and local (cross-candidate), to strengthen rule representation and generalization, and it enables generation of complete RPM matrices from abstract rules. Across five AVR datasets and challenging out-of-distribution scenarios, GenVP achieves state-of-the-art puzzle-solving accuracy and generalizes well to unseen attributes and large solution spaces, while also generating coherent, rule-consistent RPMs. The combination of generative capability, robust rule disentanglement, and scalable inference positions GenVP as a versatile tool for both RPM solving and AI creativity in puzzle design and high-level visual reasoning.

Abstract

Raven's Progressive Matrices (RPMs) are an established benchmark for examining high-level abstract visual reasoning (AVR). Despite the current success of algorithms that solve this task, humans can generalize beyond a given puzzle and create new puzzles given a set of rules, whereas machines remain locked in solving a fixed puzzle from a curated choice list. We propose Generative Visual Puzzles (GenVP), a framework to model the entire RPM generation process, a substantially more challenging task. Our model's capability spans from generating multiple solutions for one specific problem prompt to creating complete new puzzles out of the desired set of rules. Experiments on five different datasets indicate that GenVP achieves state-of-the-art (SOTA) performance both in puzzle-solving accuracy and out-of-distribution (OOD) generalization in 22 OOD scenarios. Compared to SOTA generative approaches, which struggle to solve RPMs when the feasible solution space increases, GenVP efficiently generalizes to these challenging setups. Moreover, our model demonstrates the ability to produce a wide range of complete RPMs given a set of abstract rules by effectively capturing the relationships between abstract rules and visual object properties.

Paper Structure

This paper contains 61 sections, 9 equations, 8 figures, 17 tables.

Figures (8)

  • Figure 1: Generative and Inference Graphical Models of GenVP. During inference, we are given a complete, valid puzzle $\mathbf{X}$, and our task is to infer all intermediate random variables $\mathbf{Z},\mathbf{Z}_o, \mathbf{Z}_{\bar{o}},\mathbf{Z}_r$ and predict its rules $\mathbf{R}$ using our Mixture of Experts (MoE) strategy. During generation, given an abstract set of rules $\mathbf{R}$, we generate a complete puzzle $\mathbf{X}$ that follows the desired rules. Generation does not need the MoE module; instead, we generate $\mathbf{Z}_r$ directly from the given set of rules.
  • Figure 2: Generated RPM matrices by GenVP. Top: Original RPM matrices; Middle: GenVP generations; Bottom: Rules
  • Figure 3: Multiple solutions for the bottom-right panel. Top: RPM rules; Middle: Context matrix; Bottom: Solutions. The green-marked image is the dataset solution, followed by GenVP answers.
  • Figure 4: Rule prediction performance for different (rule, attribute) pairs on RPM puzzles generated by GenVP trained on RAVEN-based datasets. The puzzles are generated using the GenVP generative graphical model (from rules to complete RPM puzzles).
  • Figure 5: Generated RPM matrices by GenVP for distribute nine ($3\times3$) configuration. Top: Rules; Bottom: Three different sampled generated puzzles.
  • ...and 3 more figures