Visual Knowledge in the Big Model Era: Retrospect and Prospect
Wenguan Wang, Yi Yang, Yunhe Pan
TL;DR
This paper surveys visual knowledge as a cognitively grounded, explicit representation of visual concepts, relations, operations, and reasoning, and situates its relevance in the era of large foundation models. It traces roots in cognitive psychology, reviews pre-big-model advances across the four components, and discusses how big models can both benefit from and contribute to visual knowledge. Key insights include the potential for prototype-based concepts to improve transparency, the role of visual relations and operations in structured understanding, and the need for neuro-symbolic and knowledge-extraction approaches to address hallucination and forgetting. Overall, the work advocates a symbiotic integration of visual knowledge with large models to enhance interpretability, robustness, and generalization in next-generation AI systems.
Abstract
Visual knowledge is a new form of knowledge representation that can encapsulate visual concepts and their relations in a succinct, comprehensive, and interpretable manner, with deep roots in cognitive psychology. As knowledge about the visual world has been identified as an indispensable component of human cognition and intelligence, visual knowledge is poised to play a pivotal role in establishing machine intelligence. With recent advances in Artificial Intelligence (AI) techniques, large AI models (or foundation models) have emerged as a potent tool capable of extracting versatile patterns from broad data as implicit knowledge and abstracting them into an enormous number of numeric parameters. To pave the way for creating visual-knowledge-empowered AI machines in this coming wave, we present a timely review that investigates the origins and development of visual knowledge in the pre-big-model era and accentuates the opportunities and unique role of visual knowledge in the big model era.
