GIQ: Benchmarking 3D Geometric Reasoning of Vision Foundation Models with Simulated and Real Polyhedra

Mateusz Michalkiewicz; Anekha Sokhal; Tadeusz Michalkiewicz; Piotr Pawlikowski; Mahsa Baktashmotlagh; Varun Jampani; Guha Balakrishnan

TL;DR

GIQ presents a first-of-its-kind benchmark for evaluating geometric reasoning in vision foundation models using a taxonomy-rich collection of polyhedra, including Platonic, Archimedean, Catalan, and Johnson solids, as well as stellations and compounds. The dataset combines 224 synthetic and real-world polyhedra with ground-truth geometry and symmetry, enabling assessments across monocular 3D reconstruction, 3D symmetry detection, mental rotation, and zero-shot shape classification. Across experiments, state-of-the-art reconstruction methods fail to capture basic geometric properties, and encoders show some symmetry awareness but struggle with detailed geometric differentiation. Frontier vision-language models, in turn, exhibit substantial limitations in translating geometric understanding into accurate classifications. The work positions GIQ as a geometric litmus test and a practical platform to guide the development of robust, geometry-aware representations for spatial reasoning in AI systems.

Abstract

Modern monocular 3D reconstruction methods and vision-language models (VLMs) demonstrate impressive results on standard benchmarks, yet recent works cast doubt on their true understanding of geometric properties. We introduce GIQ, a comprehensive benchmark specifically designed to evaluate the geometric reasoning capabilities of vision and vision-language foundation models. GIQ comprises synthetic and real-world images and corresponding 3D meshes of diverse polyhedra covering varying levels of complexity and symmetry, from Platonic, Archimedean, Johnson, and Catalan solids to stellations and compound shapes. Through systematic experiments involving monocular 3D reconstruction, 3D symmetry detection, mental rotation tests, and zero-shot shape classification tasks, we reveal significant shortcomings in current models. State-of-the-art reconstruction algorithms trained on extensive 3D datasets struggle to reconstruct even basic geometric forms such as Platonic solids accurately. Next, although foundation models may be shown via linear and non-linear probing to capture specific 3D symmetry elements, they falter significantly in tasks requiring detailed geometric differentiation, such as mental rotation. Moreover, advanced vision-language assistants such as ChatGPT, Gemini, and Claude exhibit remarkably low accuracy in interpreting basic shape properties such as face geometry, convexity, and compound structures of complex polyhedra. GIQ is publicly available at toomanymatts.github.io/giq-benchmark/, providing a structured platform to benchmark critical gaps in geometric intelligence and facilitate future progress in robust, geometry-aware representation learning.
