Visual Knowledge in the Big Model Era: Retrospect and Prospect

Wenguan Wang; Yi Yang; Yunhe Pan

TL;DR

This paper surveys visual knowledge as a cognitively grounded, explicit representation of visual concepts, relations, operations, and reasoning, and examines its relevance in the era of large foundation models. It traces the roots of visual knowledge in cognitive psychology, reviews pre-big-model advances across these four components, and discusses how big models can both benefit from and contribute to visual knowledge. Key insights include the potential of prototype-based concepts to improve transparency, the role of visual relations and operations in structured understanding, and the need for neuro-symbolic and knowledge-extraction approaches to address hallucination and forgetting. Overall, the work advocates a symbiotic integration of visual knowledge with large models to enhance interpretability, robustness, and generalization in next-generation AI systems.

Abstract

Visual knowledge is a new form of knowledge representation, deeply rooted in cognitive psychology, that can encapsulate visual concepts and their relations in a succinct, comprehensive, and interpretable manner. As knowledge about the visual world has been identified as an indispensable component of human cognition and intelligence, visual knowledge is poised to play a pivotal role in establishing machine intelligence. With recent advances in Artificial Intelligence (AI) techniques, large AI models (or foundation models) have emerged as a potent tool capable of extracting versatile patterns from broad data as implicit knowledge and abstracting them into an enormous number of numeric parameters. To pave the way for creating AI machines empowered by visual knowledge in this coming wave, we present a timely review that investigates the origins and development of visual knowledge in the pre-big-model era and highlights the opportunities and unique role of visual knowledge in the big model era.

Paper Structure

This paper contains 18 sections, 2 equations, and 14 figures.

Introduction
Visual knowledge: origins and definitions
Origins
Definitions
Visual concept
Visual relation
Visual operation
Visual reasoning
Visual knowledge in the pre-big model era: retrospect
Visual knowledge: visual concept
Visual knowledge: visual relation
Visual knowledge: visual operation
Visual knowledge: visual reasoning
Discussion
Visual knowledge in the big model era: prospect
...and 3 more sections

Figures (14)

Figure 1: The overall structure of this article.
Figure 2: Illustration of prototype- and scope-based visual concept representation. Here we show three visual concepts, namely pear, apple, and watermelon.
Figure 3: Illustration of geometric relations.
Figure 4: Illustration of the 13 base temporal relations defined in Allen's interval algebra (Allen, 1983).
Figure 5: Illustration of semantic relations.
...and 9 more figures
