V-LoL: A Diagnostic Dataset for Visual Logical Learning

Lukas Helff; Wolfgang Stammer; Hikaru Shindo; Devendra Singh Dhami; Kristian Kersting

TL;DR

This work proposes V-LoL, a diagnostic visual logical learning dataset that combines visual and logical challenges, and introduces its first instantiation, V-LoL-Train, a visual rendition of the Michalski train problem, a classic benchmark in symbolic AI.

Abstract

Despite the successes of recent developments in visual AI, several shortcomings remain, ranging from a lack of exact logical reasoning, to limited abstract generalization abilities, to difficulties in understanding complex and noisy scenes. Unfortunately, existing benchmarks were not designed to capture more than a few of these aspects. Whereas deep learning datasets focus on visually complex data but simple visual reasoning tasks, inductive logic datasets involve complex logical learning tasks but lack the visual component. To address this, we propose the diagnostic visual logical learning dataset, V-LoL, that seamlessly combines visual and logical challenges. Notably, we introduce the first instantiation of V-LoL, V-LoL-Train, a visual rendition of a classic benchmark in symbolic AI, the Michalski train problem. By incorporating intricate visual scenes and flexible logical reasoning tasks within a versatile framework, V-LoL-Train provides a platform for investigating a wide range of visual logical learning challenges. We evaluate a variety of AI systems, including traditional symbolic AI, neural AI, and neuro-symbolic AI. Our evaluations demonstrate that even state-of-the-art AI faces difficulties in dealing with visual logical learning challenges, highlighting the unique advantages and limitations of each methodology. Overall, V-LoL opens up new avenues for understanding and enhancing current abilities in visual logical learning for AI systems.

Paper Structure (34 sections, 4 equations, 14 figures, 6 tables, 1 algorithm)

Introduction
V-LoL: Merging Logic and Vision
V-LoL Generation
V-LoL and Related Datasets
Experiments: AI Systems on the V-LoL challenges
V-LoL Challenges
Discussion
Impact
Conclusion
V-LoL dataset
Details on Michalski Train Semantics
Attribute Constraints
Prolog and FOL Notation of the Classification rules
Reasoning Properties for Logic Rules
Details on V-LoL Train Semantics
...and 19 more sections

Figures (14)

Figure 1: V-LoL: a diagnostic dataset that allows testing a variety of visual logical learning challenges. (I) The modular generation process of V-LoL consists of sampling symbolic train representations (i.e., train cars and their attributes) from a pre-defined distribution. A logical class rule then determines the class affiliation of each train sample. The 3D visual representations are selected and finally rendered. The flexibility and versatility of V-LoL allow the logic component (II) as well as the visual component (III) to be exchanged easily. This way, one can perform different visual-logical learning tests, e.g., concerning abstract generalization abilities (IV), targeted test-time interventions (V), or evaluations on varying dataset sizes (VI).
Figure 2: Right: a detailed overview of the train (top) and block (bottom) representations. Left: illustrations of the visual representations of individual objects and attributes used for rendering the symbolic representations into images.
Figure 3: Different background scenes and rich scene information provided with each V-LoL sample. The four background options (left) are a base, desert, desert with sky, and fisheye scene. The scene information (right) provided with each sample contains the original image sample, object bounding boxes, pixel-level object segmentation maps, and a corresponding depth map.
Figure 4: Visual Perception (Challenge 1). In this challenge, we compare the performance of various symbolic, neural, and neuro-symbolic AI models on different visual representations, i.e., the train and block variants of the V-LoL datasets. Each bar depicts the average test accuracy along with a 95% confidence interval derived from a 5-fold cross-validation. We do not observe a strong influence of the different visual representations on the models' performance.
Figure 5: Logical Reasoning (Challenge 2). In this challenge, we compare the performance of various symbolic, neural, and neuro-symbolic AI models on the different "Theory X", "Numerical", and "Complex" logic rules. Each bar depicts the average test accuracy along with a 95% confidence interval derived from a 5-fold cross-validation. Failed runs are marked with $*$. As the complexity of the logic rules increases (from the left to the right plot), performance declines noticeably across all evaluated methods.
...and 9 more figures
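
The generation process described in the Figure 1 caption (sample a symbolic train, apply a logical class rule, then render) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the attribute vocabularies, the `Car` type, the `class_rule` predicate, and the `east`/`west` labels are hypothetical stand-ins and are not taken from the V-LoL code base. The rendering step is omitted entirely.

```python
import random
from dataclasses import dataclass

# Hypothetical attribute vocabularies; the actual V-LoL attribute sets may differ.
COLORS = ["red", "blue", "green", "yellow"]
LENGTHS = ["short", "long"]
WALLS = ["full", "braced"]


@dataclass(frozen=True)
class Car:
    """Symbolic description of one train car (illustrative attributes only)."""
    color: str
    length: str
    wall: str


def sample_train(rng, min_cars=2, max_cars=4):
    """Step 1: sample a symbolic train from a pre-defined (here uniform) distribution."""
    n_cars = rng.randint(min_cars, max_cars)
    return [
        Car(rng.choice(COLORS), rng.choice(LENGTHS), rng.choice(WALLS))
        for _ in range(n_cars)
    ]


def class_rule(train):
    """Step 2: an illustrative class rule, not one of the paper's rules.

    A train is positive if it has at least one short car with full walls.
    """
    return any(car.length == "short" and car.wall == "full" for car in train)


def generate(n_samples, seed=0):
    """Sample trains and label each one with the class rule.

    Rendering the symbolic trains into images (step 3) is not sketched here.
    """
    rng = random.Random(seed)
    dataset = []
    for _ in range(n_samples):
        train = sample_train(rng)
        label = "east" if class_rule(train) else "west"
        dataset.append((train, label))
    return dataset


if __name__ == "__main__":
    for train, label in generate(3, seed=42):
        print(label, train)
```

Because the rule is an ordinary function over the symbolic representation, swapping the logic component amounts to passing in a different predicate, which mirrors the exchangeability the caption describes.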
