An Efficient MLP-based Point-guided Segmentation Network for Ore Images with Ambiguous Boundary

Guodong Sun, Yuting Peng, Le Cheng, Mengya Xu, An Wang, Bo Wu, Hongliang Ren, Yang Zhang

TL;DR

OreNeXt introduces an efficient, MLP-based two-stage segmentation framework for ore images plagued by edge blur and constrained hardware. It combines a StoneMLP backbone for local edge features, a SparseFPN neck for balanced global-local fusion, and an Edge Guidance Loss to align prediction points with true instance edges. The approach achieves competitive accuracy (e.g., $AP_{50}^{box}=60.4$, $AP_{50}^{mask}=48.9$) at high speed (over $27$ FPS) with a small footprint ($\approx 73$ MB) on the ORE dataset, surpassing many CNN/Transformer baselines and SAM variants in efficiency. This work demonstrates practical potential for real-time ore segmentation in industrial settings and offers code for reproducibility.

Abstract

The precise segmentation of ore images is critical to the successful execution of the beneficiation process. The homogeneous appearance of the ores leads to low contrast and unclear boundaries, which makes accurate segmentation and recognition challenging. This paper proposes a lightweight framework based on the Multi-Layer Perceptron (MLP) that focuses on solving the problem of edge blurring. Specifically, we introduce a lightweight backbone better suited for efficiently extracting low-level features. In addition, we design a feature pyramid network consisting of two MLP structures that balance local and global information, thus enhancing detection accuracy. Furthermore, we propose a novel loss function that guides the prediction points to match the instance edge points to achieve clear object boundaries. We have conducted extensive experiments to validate the efficacy of our proposed method. Our approach achieves a processing speed of over 27 frames per second (FPS) with a model size of only 73 MB. Moreover, when tested on the ore image dataset, our method achieves scores of 60.4 and 48.9 in $AP_{50}^{box}$ and $AP_{50}^{mask}$, respectively, which is competitive with currently available state-of-the-art techniques. The source code will be released at https://github.com/MVME-HBUT/ORENEXT.

Paper Structure

This paper contains 26 sections, 6 equations, 8 figures, and 8 tables.

Figures (8)

  • Figure 1: Visualizations of the feature maps. The ore stacking in the input image makes the boundaries difficult to distinguish. The edge features in the feature maps obtained by the baseline are blurred. Our network produces consistent instances with clearer edges.
  • Figure 2: A schematic overview of OreNeXt. The input image is fed into the lightweight backbone StoneMLP (Fig. 3) to produce feature maps. StoneMLP captures local dependencies and extracts edge information through horizontal and vertical shift operations (Fig. 4). The feature maps then enter the SparseFPN, which generates multi-scale, information-integrated feature maps. Our improved FPN structure adds two SparseMLP modules (Fig. 5), each divided into three parallel branches whose outputs are fused by weighted summation. The two sparsely connected SparseMLP modules allow both local and global features to be taken into account. Next, each layer's feature map is processed by the region proposal network (RPN) to obtain the fused feature map $S_i$ (the $i$-th feature map). Finally, the point head uses interpolated features computed from the fine-grained features of the CNN feature maps ($S_i$) and the coarse prediction mask for subdivision prediction.
  • Figure 3: Architecture of StoneMLP. The proposed StoneMLP block mainly includes Norm, Pixel Shift operation, MLP, channel projection, and residual connection.
  • Figure 4: The horizontal shift and vertical shift, where the arrows indicate the steps, and the number in each box is the index of the feature.
  • Figure 5: (a) The SparseMLP block consists of three branches: two mix information along the horizontal and vertical directions, respectively, and the third is an identity mapping. (b) The receptive fields generated by two consecutive SparseMLPs.
  • ...and 3 more figures
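
The horizontal and vertical shift operations named in the captions of Figures 3 and 4 can be illustrated with a minimal NumPy sketch. It assumes an axial-shift scheme in which the channels are split into groups and each group is displaced by a different offset along one spatial axis, with zeros filling the vacated border. The paper's exact step sizes, channel grouping, and padding are not stated in this summary, so `shift_zero_pad`, `axial_shift`, and the `shift_size` parameter are hypothetical names chosen for illustration, not the authors' implementation.

```python
import numpy as np


def shift_zero_pad(a, offset, axis):
    """Shift array `a` by `offset` positions along `axis`, filling vacated cells with zeros."""
    n = a.shape[axis]
    if offset == 0:
        return a.copy()
    out = np.zeros_like(a)
    if abs(offset) >= n:
        return out  # everything is shifted out of the frame
    src = [slice(None)] * a.ndim
    dst = [slice(None)] * a.ndim
    if offset > 0:
        # Content moves toward higher indices.
        src[axis] = slice(0, n - offset)
        dst[axis] = slice(offset, n)
    else:
        # Content moves toward lower indices.
        src[axis] = slice(-offset, n)
        dst[axis] = slice(0, n + offset)
    out[tuple(dst)] = a[tuple(src)]
    return out


def axial_shift(x, shift_size=3, axis=2):
    """Assumed shift scheme for a (C, H, W) feature map.

    Channels are split into `shift_size` groups; group k is shifted by
    offset k - shift_size // 2 along `axis` (1 = vertical, 2 = horizontal).
    """
    half = shift_size // 2
    offsets = range(-half, shift_size - half)
    groups = np.array_split(np.arange(x.shape[0]), shift_size)
    out = np.zeros_like(x)
    for idx, off in zip(groups, offsets):
        if idx.size:
            out[idx] = shift_zero_pad(x[idx], off, axis)
    return out


# Toy feature map: 3 channels, 1 row, 4 columns.
x = np.arange(12).reshape(3, 1, 4)
horizontal = axial_shift(x, shift_size=3, axis=2)
vertical = axial_shift(x, shift_size=3, axis=1)
print(horizontal[:, 0, :])
```

After the shift, each spatial position holds features that originally belonged to its neighbours along the chosen axis, so a subsequent per-pixel MLP (channel projection) can mix local context without any convolution. Applying the horizontal and vertical variants side by side gives a cross-shaped local neighbourhood under this assumed scheme.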