Seeing the Intangible: Survey of Image Classification into High-Level and Abstract Categories

Delfina Sol Martinez Pandiani, Valentina Presutti

TL;DR

This survey enhances the understanding of high-level visual reasoning in CV and lays the groundwork for future research endeavors, focusing particularly on Abstract Concepts in automatic image classification.

Abstract

The field of Computer Vision (CV) is increasingly shifting towards "high-level" visual sensemaking tasks, yet the exact nature of these tasks remains unclear and tacit. This survey paper addresses this ambiguity by systematically reviewing research on high-level visual understanding, focusing particularly on Abstract Concepts (ACs) in automatic image classification. Our survey contributes in three main ways. First, it clarifies the tacit understanding of high-level semantics in CV through a multidisciplinary analysis and a categorization into distinct clusters, including commonsense, emotional, aesthetic, and inductive interpretative semantics. Second, it identifies and categorizes computer vision tasks associated with high-level visual sensemaking, offering insights into the diverse research areas within this domain. Third, it examines how abstract concepts such as values and ideologies are handled in CV, revealing challenges and opportunities in AC-based image classification. Notably, our survey of AC image classification tasks highlights persistent challenges, such as the limited efficacy of massive datasets and the importance of integrating supplementary information and mid-level features. We emphasize the growing relevance of hybrid AI systems in addressing the multifaceted nature of AC image classification tasks. Overall, this survey enhances our understanding of high-level visual reasoning in CV and lays the groundwork for future research endeavors.

Paper Structure (35 sections, 4 figures, 6 tables)

This paper contains 35 sections, 4 figures, 6 tables.

Figures (4)

  • Figure 1: The three tiers of the visual semantics hierarchy. Visual understanding is often depicted as a multi-layered process, revealing three distinct levels of semantics. The low-level involves raw or elemental features, while the mid-level encompasses individual objects, persons, and regions. In contrast, the high-level remains less defined and explored.
  • Figure 2: Tip of the iceberg: a deeper characterization of the top level of the visual semantic pyramid. Drawing from a multidisciplinary exploration of semantic entities associated with this upper semantic layer, we have identified four distinct clusters of knowledge.
  • Figure 3: Computer Vision tasks that deal with "high level semantics" or "high level visual understanding", also mapped to the preceding multidisciplinary characterization of high-level semantics. Tasks found to deal implicitly or explicitly with AC detection are circled in red.
  • Figure 4: Two inflection points (2012 and 2017) that seem to correlate with the increasing interest in CV tasks dealing with high-level visual semantics.