GENIUS: Generative Fluid Intelligence Evaluation Suite
Ruichuan An, Sihan Yang, Ziyu Guo, Wei Dai, Zijun Shen, Haodong Li, Renrui Zhang, Xinyu Wei, Guopeng Li, Wenshan Wu, Wentao Zhang
TL;DR
GENIUS formalizes Generative Fluid Intelligence ($\textit{GFI}$) within the Cattell-Horn-Carroll (CHC) framework and presents the first multimodal benchmark to quantify dynamic, rule-driven visual generation in novel contexts. It operationalizes $\textit{GFI}$ into three primitives and evaluates 12 models on 510 expert-curated samples, scoring outputs with a hybrid model-judge pipeline on three metrics: Rule Compliance, Visual Consistency, and Aesthetic Quality. The study reveals a substantial gap between state-of-the-art models and the fluid intelligence the benchmark targets. The authors attribute this to an execution gap in which learned priors override the provided context, and show that attention misalignment during inference contributes to the failures. As a remedy, they propose a training-free Attention Adjustment Mechanism that improves performance by reweighting context signals, suggesting a path toward more robust, context-aware generation without additional training. GENIUS thus serves as a rigorous standard for pushing multimodal models from crystallized recall toward adaptive, reasoning-driven generalization, with the dataset and code released for community use.
Abstract
Unified Multimodal Models (UMMs) have shown remarkable progress in visual generation. Yet, existing benchmarks predominantly assess $\textit{Crystallized Intelligence}$, which relies on recalling accumulated knowledge and learned schemas. This focus overlooks $\textit{Generative Fluid Intelligence (GFI)}$: the capacity to induce patterns, reason through constraints, and adapt to novel scenarios on the fly. To rigorously assess this capability, we introduce $\textbf{GENIUS}$ ($\textbf{GEN}$ Fluid $\textbf{I}$ntelligence Eval$\textbf{U}$ation $\textbf{S}$uite). We formalize $\textit{GFI}$ as a synthesis of three primitives. These include $\textit{Inducing Implicit Patterns}$ (e.g., inferring personalized visual preferences), $\textit{Executing Ad-hoc Constraints}$ (e.g., visualizing abstract metaphors), and $\textit{Adapting to Contextual Knowledge}$ (e.g., simulating counter-intuitive physics). Collectively, these primitives challenge models to solve problems grounded entirely in the immediate context. Our systematic evaluation of 12 representative models reveals significant performance deficits in these tasks. Crucially, our diagnostic analysis disentangles these failure modes. It demonstrates that deficits stem from limited context comprehension rather than insufficient intrinsic generative capability. To bridge this gap, we propose a training-free attention intervention strategy. Ultimately, $\textbf{GENIUS}$ establishes a rigorous standard for $\textit{GFI}$, guiding the field beyond knowledge utilization toward dynamic, general-purpose reasoning. Our dataset and code will be released at: $\href{https://github.com/arctanxarc/GENIUS}{https://github.com/arctanxarc/GENIUS}$.
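To make the idea of a training-free attention intervention more concrete, the sketch below shows one generic way that "reweighting context signals" could work at inference time: a positive bias is added to the pre-softmax attention scores of context tokens, so they receive a larger share of the attention mass. This is an illustration only and not the authors' Attention Adjustment Mechanism, whose details are not given here. The function names, the `boost` parameter, and the toy scores are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def reweight_attention(scores, context_mask, boost=1.0):
    """Return attention weights after biasing context positions.

    Hypothetical helper (an assumption, not the paper's method):
    adding `boost` to a pre-softmax score multiplies that token's
    unnormalized weight by exp(boost), so the softmax shifts mass
    toward context tokens without touching any model parameters.
    """
    if len(scores) != len(context_mask):
        raise ValueError("scores and context_mask must have equal length")
    biased = [
        s + boost if is_context else s
        for s, is_context in zip(scores, context_mask)
    ]
    return softmax(biased)


def context_mass(weights, context_mask):
    """Total attention weight assigned to context positions."""
    return sum(w for w, is_context in zip(weights, context_mask) if is_context)


if __name__ == "__main__":
    # Toy scores for one query: positions 1 and 2 are context tokens.
    scores = [2.0, 0.5, 0.5, 1.0]
    mask = [False, True, True, False]

    before = softmax(scores)
    after = reweight_attention(scores, mask, boost=1.0)

    print("context mass before:", round(context_mass(before, mask), 3))
    print("context mass after: ", round(context_mass(after, mask), 3))
```

With `boost=0.0` the function reduces to ordinary softmax attention, so the adjustment can be switched off or tuned per layer without retraining. In a real model, where to apply such a bias (which layers, heads, and tokens) would be a design decision.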
