Fused attention mechanism-based ore sorting network

Junjiang Zhen, Bojun Xie

TL;DR

OreYOLO addresses the challenge of accurate, real‑time ore sorting in complex mineral environments by integrating Efficient Multi‑Scale Attention (EMA) into a lightweight YOLOv5 backbone and a progressive AFPN for robust multi‑level feature fusion. The approach delivers 3.458M parameters, 6.3 GFLOPs, and 79.07 FPS while achieving 86.6% mAP50‑95 on a gold/sulfide ore dataset, outperforming several high‑performing detectors. Ablation studies show EMA and AFPN provide meaningful accuracy gains with minimal parameter overhead, validating the design choices. The work enables efficient, edge‑deployable ore sorting with strong practical impact for mineral processing and mining operations, while noting the need for larger, cleaner datasets and further deployment optimizations for broader robustness.

Abstract

Deep learning has had a significant impact on the identification and classification of mineral resources, playing a key role in identifying different minerals efficiently and accurately, which is important for improving the efficiency and accuracy of mining. However, traditional ore sorting methods often suffer from inefficiency and a lack of accuracy, especially in complex mineral environments. To address these challenges, this study proposes OreYOLO, a method that incorporates an attention mechanism and a multi-scale feature fusion strategy and is developed on data from gold and sulfide ores. Introducing a progressive feature pyramid structure into YOLOv5 and embedding the attention mechanism in the feature extraction module greatly improves the detection performance and accuracy of the model. To adapt to diverse ore sorting scenarios and the deployment requirements of edge devices, the network is designed to be lightweight, achieving a low parameter count (3.458M) and low computational complexity (6.3 GFLOPs) while maintaining high accuracy (99.3% and 99.2%, respectively). For the experiments, an object detection dataset of 6000 images of gold and sulfuric iron ore is constructed for classification training, and several sets of comparison experiments are run against the YOLO series, EfficientDet, Faster-RCNN, and CenterNet, among others. The experiments show that OreYOLO outperforms these commonly used high-performance object detection architectures.

Paper Structure (19 sections, 11 equations, 9 figures, 7 tables)

Figures (9)

  • Figure 1: (a) The CBS block consists of a convolutional layer, a normalization layer, and a SiLU activation function. (b) The CSP3 block consists of a CBS block with residual paths and a Bottleneck structure
  • Figure 2: The structure of the Path Aggregation Network (PAN), consisting of bottom-up feature fusion and top-down feature fusion, where 20×20, 40×40, and 80×80 are feature map sizes
  • Figure 3: The network structure of YOLOv5 is divided into four main parts: the Input part for image input, the Backbone part for extracting ore features, the Neck part for fusing features from different layers, and the YOLO Head part for prediction
  • Figure 4: EMA structure diagram, which consists of three parallel paths to extract the attention weight descriptors of the feature map group, where X Avg POOL denotes global pooling in the horizontal direction and Y Avg POOL denotes global pooling in the vertical direction
  • Figure 5: The architecture of the Asymptotic Feature Pyramid Network (AFPN). AFPN fuses two low-level features in the initial stage. Subsequent stages fuse high-level features, while the final stage adds top-level features during feature fusion. Black arrows indicate convolution and orange arrows indicate adaptive spatial fusion
  • ...and 4 more figures
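
As a rough illustration of two details in the captions for Figures 1 and 2, the sketch below computes the SiLU activation used in the CBS block and the three feature map sizes quoted for PAN. It assumes a 640×640 input and downsampling strides of 8, 16, and 32, which are common YOLOv5 defaults but are not stated in this summary; the function names `silu` and `feature_map_sizes` are hypothetical helpers, not part of the paper's code.

```python
import math


def silu(x: float) -> float:
    """SiLU activation: x * sigmoid(x), written as x / (1 + exp(-x))."""
    return x / (1.0 + math.exp(-x))


def feature_map_sizes(input_size: int, strides=(8, 16, 32)) -> list[int]:
    """Side length of each feature map for a square input.

    The strides are an assumption (typical YOLOv5 values), not taken
    from the paper summary.
    """
    return [input_size // s for s in strides]


if __name__ == "__main__":
    # With the assumed 640x640 input, these match the 80x80, 40x40,
    # and 20x20 maps named in the Figure 2 caption.
    print(feature_map_sizes(640))
    print(silu(0.0), silu(1.0))
```

Under these assumptions the 80×80 map is the highest-resolution level and the 20×20 map the coarsest, which is the range of scales the Neck fuses.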