A Semantic-Enhanced Heterogeneous Graph Learning Method for Flexible Objects Recognition
Kunshan Yang, Wenwei Luo, Yuguo Hu, Jiafu Yan, Mengmeng Jing, Lin Zuo
TL;DR
The paper tackles the challenge of recognizing flexible objects, whose diverse shapes, translucency, and subtle inter-class differences complicate traditional recognition paradigms. It introduces a semantic-enhanced heterogeneous graph learning framework that fuses global visual cues and local semantic information through an adaptive scanning module and a heterogeneous graph generator, enabling better cross-modal alignment. The method is validated on FDA, FSCW, CIFAR-100, and ImageNet-Hard, achieving competitive or superior accuracies (e.g., 80.50% on FDA and 89.98% on FSCW), and hyperparameter and ablation studies indicate that it is robust. By integrating state-space-inspired semantic extraction with dynamic graph-based reasoning, the approach advances flexible object recognition and offers improved generalization and practical impact for complex visual understanding.
Abstract
Flexible object recognition remains a significant challenge due to the objects' inherently diverse shapes and sizes, translucent attributes, and subtle inter-class differences. Graph-based models, such as graph convolutional networks and graph vision models, are promising for flexible object recognition because they can capture variable relations within flexible objects. These methods, however, often focus on global visual relationships or fail to align semantic and visual information. To alleviate these limitations, we propose a semantic-enhanced heterogeneous graph learning method. First, an adaptive scanning module extracts discriminative semantic context, which facilitates matching flexible objects of varying shapes and sizes and aligns semantic and visual nodes to enhance cross-modal feature correlation. Second, a heterogeneous graph generation module aggregates global visual and local semantic node features, improving the recognition of flexible objects. Additionally, we introduce FSCW, a large-scale flexible-object dataset curated from existing sources. We validate our method through extensive experiments on flexible-object datasets (FDA and FSCW) and challenging benchmarks (CIFAR-100 and ImageNet-Hard), demonstrating competitive performance.
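To make the idea of aggregating "global visual and local semantic node features" over a heterogeneous graph more concrete, here is a minimal sketch of one generic message-passing step with two node types. It is not the authors' module: the k-nearest-neighbour cosine edges (in the spirit of graph vision models), the function names `hetero_aggregate` and `top_k_neighbors`, and the scalar per-edge-type weights `w_self`, `w_vis`, `w_sem` (stand-ins for learned projections) are all hypothetical assumptions chosen for illustration.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k_neighbors(query: Vector, pool: List[Vector], k: int, exclude: int = -1) -> List[int]:
    """Indices of the k most cosine-similar vectors in pool, skipping index `exclude`."""
    scored = [(cosine(query, p), j) for j, p in enumerate(pool) if j != exclude]
    scored.sort(key=lambda t: (-t[0], t[1]))  # highest similarity first, ties by index
    return [j for _, j in scored[:k]]


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    if not vectors:
        raise ValueError("mean() needs at least one vector (use k >= 1)")
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def hetero_aggregate(visual: List[Vector], semantic: List[Vector],
                     k_vis: int = 2, k_sem: int = 1,
                     w_self: float = 1.0, w_vis: float = 0.5, w_sem: float = 0.5) -> List[Vector]:
    """Hypothetical single update step over a two-type graph.

    Each visual node receives the mean of its k_vis most similar visual nodes
    (intra-modal edges) and the mean of its k_sem most similar semantic nodes
    (cross-modal edges); each edge type has its own scalar weight.
    """
    updated = []
    for i, v in enumerate(visual):
        vis_msg = mean([visual[j] for j in top_k_neighbors(v, visual, k_vis, exclude=i)])
        sem_msg = mean([semantic[j] for j in top_k_neighbors(v, semantic, k_sem)])
        updated.append([w_self * a + w_vis * b + w_sem * c
                        for a, b, c in zip(v, vis_msg, sem_msg)])
    return updated


if __name__ == "__main__":
    visual_nodes = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]   # toy patch features
    semantic_nodes = [[1.0, 0.0], [0.0, 1.0]]              # toy semantic embeddings
    print(hetero_aggregate(visual_nodes, semantic_nodes, k_vis=1, k_sem=1))
```

In this toy run, each visual node is pulled toward the semantic node it most resembles, which is the intuition behind cross-modal alignment; a real model would replace the scalar weights with learned, type-specific transformations and nonlinearities.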
