DIO: Refining Mutual Information and Causal Chain to Enhance Machine Abstract Reasoning Ability
Ruizhuo Song, Beiming Yuan
TL;DR
The paper tackles the abstract-reasoning bottleneck in Raven's Progressive Matrices (RPM) by introducing DIO, a causal-chain-driven architecture that models the chain from image features to progressive patterns, their consistency, and the final choice. It identifies that conventional mutual-information objectives give only a loose lower bound on $I(x_\alpha; \{x_i\}_{i=1}^8)$, and it proposes three refinements to tighten that bound and better reflect human reasoning: Brando (constructive hypothetical options), WORLD (a Gaussian-mixture model of the feature distribution with targeted sampling), and DIEGO (metadata-based semantic alignment). Empirically, these refinements yield substantial gains across RPM benchmarks (RAVEN, I-RAVEN, PGM), enable open-ended generative RPM with DIO+WORLD, and demonstrate strong semantic disentanglement of reasoning-relevant features. The work provides a causal, information-theoretic framework and practical design patterns for improving abstract reasoning in vision-language systems, with potential impact on both discriminative and generative RPM tasks and on broader cognitive abilities in AI.
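Why a bound on $I(x_\alpha; \{x_i\}_{i=1}^8)$ can be loose is easiest to see with a generic InfoNCE-style estimator, used here only as a stand-in because this summary does not spell out DIO's exact objective. With $K$ candidate answers, the estimate $\log K - \mathcal{L}_{\text{CE}}$ can never exceed $\log K$, so scoring only the 8 RPM options caps it at $\log 8 \approx 2.08$ nats whatever the true mutual information is. The sketch below is a minimal illustration under assumptions: `infonce_bound` is a hypothetical helper, and the correct answer is assumed to sit in column 0 of the score matrix.

```python
import math

def infonce_bound(scores):
    """InfoNCE-style lower-bound estimate of I(answer; context), in nats.

    scores[b][k] is the compatibility score between context b and candidate k.
    Assumption for this sketch: candidate 0 is the correct answer in every row.
    """
    num_candidates = len(scores[0])
    total_ce = 0.0
    for row in scores:
        # Numerically stable log-sum-exp over all candidates.
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(s - m) for s in row))
        # Cross-entropy of choosing the correct candidate (index 0).
        total_ce += log_sum_exp - row[0]
    mean_ce = total_ce / len(scores)
    # Cross-entropy is non-negative, so the estimate never exceeds log K.
    return math.log(num_candidates) - mean_ce

# Even a perfectly confident model on 8 RPM options only reaches the log 8 ceiling.
print(infonce_bound([[50.0] + [0.0] * 7]), math.log(8))
```

Adding candidates raises the ceiling (for example to $\log 64 \approx 4.16$ nats with 64 candidates), which is the intuition behind supplying extra negative options to tighten such a bound.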
Abstract
Despite deep learning's broad success, its abstract-reasoning bottleneck persists. We tackle Raven's Progressive Matrices (RPM), a classic benchmark for pattern recognition, reasoning, and problem-solving intelligence. We model the full causal chain image $\rightarrow$ attributes $\rightarrow$ progressive patterns $\rightarrow$ consistency $\rightarrow$ answer and build a baseline, DIO. However, DIO's mutual-information lower-bound objective does not embed human logic: the bound is loose and purely statistical, ignoring causal subject-object links. We therefore present three refinements. 1) Brando introduces trainable negative options to tighten the variational bound. 2) WORLD replaces explicit option generation with a Gaussian-mixture model of the feature distribution that supplies unlimited weighted negatives, further tightening the bound. 3) DIEGO adds metadata supervision to close the semantic gap in the "attributes $\rightarrow$ patterns" link, aligning representations with human rules. These upgrades substantially boost discriminative RPM accuracy and, for the first time, let DIO generate valid answers to open-ended RPM problems. The work provides causally driven design guidelines, objective-refinement strategies, and cross-modal insights for abstract-reasoning research.
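The idea of drawing negatives from a Gaussian-mixture model of the feature distribution, instead of generating option images, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not WORLD itself: the mixture parameters are given rather than fitted, covariances are diagonal, scores are temperature-scaled dot products, and the hypothetical `neg_weights` enter as multipliers on each negative's exponentiated score. Because the negatives live in feature space, any number of them can be drawn per training step.

```python
import math
import random

def sample_gmm(weights, means, stds, n, rng):
    """Draw n feature vectors from a Gaussian mixture with diagonal covariance."""
    components = range(len(weights))
    samples = []
    for _ in range(n):
        # Choose a mixture component in proportion to its weight.
        c = rng.choices(components, weights=weights, k=1)[0]
        # Sample each feature dimension independently around that component's mean.
        samples.append([rng.gauss(mu, sd) for mu, sd in zip(means[c], stds[c])])
    return samples

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def weighted_contrastive_loss(query, positive, negatives, neg_weights, tau=0.1):
    """Computes -log(e^{s+} / (e^{s+} + sum_j w_j e^{s_j})) with s = <q, z> / tau.

    neg_weights must be positive; multiplying e^{s_j} by w_j is a
    hypothetical weighting scheme chosen for illustration.
    """
    s_pos = dot(query, positive) / tau
    # Adding log(w_j) to a logit multiplies its exponentiated score by w_j.
    logits = [s_pos] + [dot(query, z) / tau + math.log(w)
                        for z, w in zip(negatives, neg_weights)]
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(t - m) for t in logits))
    return log_denom - s_pos

if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical two-component mixture over 2-D features.
    negs = sample_gmm([0.7, 0.3], [[1.0, 0.0], [0.0, 1.0]],
                      [[0.2, 0.2], [0.2, 0.2]], n=256, rng=rng)
    query, positive = [0.0, 1.0], [0.0, 1.0]
    loss = weighted_contrastive_loss(query, positive, negs, [1.0] * len(negs))
    print(f"loss with {len(negs)} sampled negatives: {loss:.3f}")
```

With uniform weights this reduces to an ordinary InfoNCE loss over sampled negatives; non-uniform weights let some regions of the feature distribution count more in the denominator.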
