
Square Superpixel Generation and Representation Learning via Granular Ball Computing

Shuyin Xia, Meng Yang, Dawei Dai, Fan Chen, Shilin Zhao, Junwei Han, Xinbo Gao, Guoyin Wang, Wen Lu

Abstract

Superpixels provide a compact region-based representation that preserves object boundaries and local structures, and have therefore been widely used in a variety of vision tasks to reduce computational cost. However, most existing superpixel algorithms produce irregularly shaped regions, which are not well aligned with regular operators such as convolutions. Consequently, superpixels are often treated as an offline preprocessing step, limiting parallel implementation and hindering end-to-end optimization within deep learning pipelines. Motivated by the adaptive representation and coverage property of granular-ball computing, we develop a square superpixel generation approach. Specifically, we approximate superpixels using multi-scale square blocks to avoid the computational and implementation difficulties induced by irregular shapes, enabling efficient parallel processing and learnable feature extraction. For each block, a purity score is computed based on pixel-intensity similarity, and high-quality blocks are selected accordingly. The resulting square superpixels can be readily integrated as graph nodes in graph neural networks (GNNs) or as tokens in Vision Transformers (ViTs), facilitating multi-scale information aggregation and structured visual representation. Experimental results on downstream tasks demonstrate consistent performance improvements, validating the effectiveness of the proposed method.
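The abstract's pipeline — score each square block by a purity measure over pixel intensities, keep pure blocks, and subdivide impure ones into smaller scales — can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the specific purity definition (fraction of pixels within a tolerance of the block's mean intensity), the quadtree-style recursive splitting, and all parameter names and default values are assumptions made for illustration.

```python
import numpy as np

def block_purity(block, tol=0.1):
    """Assumed purity score: fraction of pixels whose intensity lies
    within `tol` of the block's mean (a pixel-intensity similarity)."""
    return np.mean(np.abs(block - block.mean()) <= tol)

def square_superpixels(img, min_size=1, threshold=0.9):
    """Recursively split a square image into square blocks (quadtree
    style) until each block's purity exceeds `threshold` or the block
    reaches `min_size`. Returns a list of (row, col, size) blocks that
    together tile the image with multi-scale square superpixels."""
    blocks = []

    def split(r, c, size):
        block = img[r:r + size, c:c + size]
        if size <= min_size or block_purity(block) >= threshold:
            blocks.append((r, c, size))  # keep this high-purity block
            return
        half = size // 2
        for dr in (0, half):             # otherwise split into 4 quadrants
            for dc in (0, half):
                split(r + dr, c + dc, half)

    split(0, 0, img.shape[0])  # assumes a square, power-of-two image
    return blocks

# Toy example: uniform left half (one large pure block per quadrant),
# checkerboard right half (forced down to single-pixel blocks).
img = np.zeros((4, 4))
img[:, 2:] = np.array([[0.0, 1.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
print(square_superpixels(img))
```

Running the sketch on the toy image yields two 2x2 blocks covering the uniform left half and eight 1x1 blocks covering the checkerboard right half, illustrating how block size adapts to local homogeneity. The resulting (row, col, size) tuples are what would then serve as graph nodes or transformer tokens downstream.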

Paper Structure

This paper contains 16 sections, 9 equations, 4 figures, 6 tables, 1 algorithm.

Figures (4)

  • Figure 1: Comparison with SLIC. The first row shows the input images, the second row the superpixel blocks generated by our method, and the third row those generated by SLIC.
  • Figure 2: Overview of our superpixel generation approach: (a) Module for multi-scale superpixels; (b) Integration of the module into ViG for image classification; (c) Integration of the module into ViT for image-text retrieval; (d) Integration of the module into ViT for object detection.
  • Figure 3: Our Method vs. SLIC on MNIST and CIFAR-10. Top: Our Method; Bottom: SLIC Method.
  • Figure 4: Visualization of multi-granularity token selection on a COCO image. Left: original image with objects at various scales (cows and pedestrians). Right: selected tokens represented by square blocks of varying sizes. Our method adaptively assigns larger blocks to homogeneous background regions and smaller blocks to object-dense areas, effectively reducing the token count while preserving the information critical for detection.