
Flexible ViG: Learning the Self-Saliency for Flexible Object Recognition

Lin Zuo, Kunshan Yang, Xianlong Tian, Kunbin He, Yongqi Ding, Mengmeng Jing

TL;DR

The Flexible Vision Graph Neural Network (FViG) is proposed to optimize self-saliency by maximizing both channel-aware and spatial-aware saliency, thereby improving the discrimination of representations for flexible objects.

Abstract

Existing computer vision methods mainly focus on the recognition of rigid objects, whereas the recognition of flexible objects remains largely unexplored. Recognizing flexible objects poses significant challenges due to their inherently diverse shapes and sizes, translucent attributes, ambiguous boundaries, and subtle inter-class differences. In this paper, we claim that these problems primarily arise from a lack of object saliency. To this end, we propose the Flexible Vision Graph Neural Network (FViG) to optimize self-saliency and thereby improve the discrimination of representations for flexible objects. Specifically, on one hand, we propose to maximize channel-aware saliency by extracting the weights of neighboring nodes, which adapts to the shape and size variations of flexible objects. On the other hand, we maximize spatial-aware saliency based on clustering to aggregate neighborhood information for the centroid nodes, which introduces local context information into the representation learning. To thoroughly verify flexible object recognition performance, we propose, for the first time, the Flexible Dataset (FDA), which consists of various images of flexible objects collected from real-world scenarios or online. Extensive experiments on our Flexible Dataset demonstrate the effectiveness of our method in enhancing the discrimination of flexible objects.
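The abstract describes two mechanisms: channel-aware weighting of a node's neighbors on a patch-level graph, and clustering-based aggregation that injects local context around centroid nodes. The sketch below is a minimal, hypothetical illustration of these two ideas in numpy; it is not the authors' implementation (FViG uses learned graph attention and reasoning layers), and all function names and the simple k-NN/k-means choices are assumptions for exposition.

```python
import numpy as np

def channel_aware_aggregate(X, k):
    """Weight each node's k nearest neighbors per channel and aggregate.

    X: (N, C) array of patch (node) features.
    Returns an (N, C) array of neighbor-aggregated features, where each
    channel of each neighbor gets its own weight (a stand-in for the
    paper's channel-aware saliency).
    """
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise distances
    np.fill_diagonal(d2, np.inf)                          # exclude self-loops
    nbr = np.argsort(d2, axis=1)[:, :k]                   # (N, k) neighbor ids
    diff = X[nbr] - X[:, None, :]                         # (N, k, C) offsets
    w = np.exp(-np.abs(diff))                             # closer channels -> larger weight
    w /= w.sum(axis=1, keepdims=True)                     # softmax over neighbors, per channel
    return (w * X[nbr]).sum(axis=1)                       # (N, C)

def cluster_context(X, n_clusters, iters=10, seed=0):
    """Assign nodes to clusters (plain k-means) and return each node's
    cluster mean as local context (a stand-in for the paper's
    spatial-aware aggregation around centroid nodes)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), n_clusters, replace=False)].copy()
    for _ in range(iters):
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(axis=1)
        for c in range(n_clusters):
            mask = assign == c
            if mask.any():
                centroids[c] = X[mask].mean(axis=0)
    return centroids[assign]                              # (N, C) per-node context

# Toy usage: 64 patch features of dimension 16.
X = np.random.default_rng(1).standard_normal((64, 16))
Z = channel_aware_aggregate(X, k=8) + cluster_context(X, n_clusters=4)
```

The combined feature `Z` mixes neighbor information weighted per channel with a cluster-level context term; in the actual FViG both parts are learned end-to-end rather than fixed heuristics as here.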


Paper Structure

This paper contains 23 sections, 11 equations, 11 figures, 6 tables.

Figures (11)

  • Figure 1: (a) rigid objects. (b) flexible objects images from our proposed FDA. (c) fire images from FireNet dataset.
  • Figure 2: The top section describes the workflow of the proposed FViG, encompassing graph embedding, relation metrics, graph attention, graph generation and clustering, and graph reasoning learning. The bottom section details the graph construction process, with red blocks indicating central nodes and blue blocks indicating adjacent nodes. By selecting and clustering central nodes and their adjacent nodes, the model captures discriminative features and manifold structures within the image, thus improving the accuracy of flexible objects recognition.
  • Figure 3: Visualization of the constructed graph structure. For the images of smoke and water, we selected two central nodes from both the foreground and background. The patches represented by these chosen nodes are marked in red, and the nodes that eventually form neighboring relationships with them are marked in blue.
  • Figure 4: A comparative study of t-SNE visualizations is conducted for our FViG and ViG.
  • Figure 5: Classification performance metrics of nine categories in the dataset.
  • ...and 6 more figures