A Two-Stage Detection-Tracking Framework for Stable Apple Quality Inspection in Dense Conveyor-Belt Environments

Keonvin Park, Aditya Pal, Jin Hong Mok

TL;DR

This work addresses the challenge of temporal stability in automated apple quality inspection under dense conveyor-belt conditions. It proposes a two-stage framework that combines an orchard-trained YOLOv8 detector, ByteTrack for persistent object identities, and a ResNet18 defect classifier with track-level aggregation to stabilize per-object decisions. Two video-level metrics, $\mathrm{DefectRatio} = \frac{N_{\text{defect tracks}}}{N_{\text{total tracks}}}$ and $\mathrm{TemporalStability} = 1 - \frac{\text{number of label changes per track}}{\text{track length}}$, quantify robustness and enable practical, track-aware evaluation. Results suggest that integrating tracking improves decision stability over frame-wise inference, supporting the practical deployment of automated fruit grading in industrial settings.
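The two metrics above can be illustrated with a minimal Python sketch. It assumes that each track is represented as a list of per-frame string labels and that each track has already been assigned one final label; the function names, the `"defect"` label string, and the data layout are hypothetical choices for illustration, not taken from the paper.

```python
from typing import Dict, List


def defect_ratio(track_labels: Dict[int, str], defect_label: str = "defect") -> float:
    """DefectRatio = N_defect_tracks / N_total_tracks.

    `track_labels` maps a track ID to its final track-level label
    (an assumed representation, not the paper's).
    """
    if not track_labels:
        return 0.0  # assumption: no tracks means no defects to report
    n_defect = sum(1 for label in track_labels.values() if label == defect_label)
    return n_defect / len(track_labels)


def temporal_stability(frame_labels: List[str]) -> float:
    """TemporalStability = 1 - (label changes in the track) / (track length)."""
    if not frame_labels:
        raise ValueError("a track must contain at least one frame")
    # Count positions where consecutive per-frame labels differ.
    changes = sum(1 for prev, cur in zip(frame_labels, frame_labels[1:]) if prev != cur)
    return 1.0 - changes / len(frame_labels)


if __name__ == "__main__":
    final_labels = {1: "defect", 2: "healthy", 3: "healthy", 4: "defect"}
    print(defect_ratio(final_labels))  # 2 defective tracks out of 4

    flickering = ["healthy", "defect", "healthy", "healthy"]
    print(temporal_stability(flickering))  # 2 label changes over 4 frames
```

Under this formulation a track whose label never changes scores 1.0, and every additional label flip lowers the score in proportion to the track length.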

Abstract

Industrial fruit inspection systems must operate reliably under dense multi-object interactions and continuous motion, yet most existing work evaluates detection or classification at the image level without ensuring temporal stability in video streams. We present a two-stage detection-tracking framework for stable multi-apple quality inspection in conveyor-belt environments. An orchard-trained YOLOv8 model performs apple localization, followed by ByteTrack multi-object tracking to maintain persistent identities. A ResNet18 defect classifier, fine-tuned on a healthy-defective fruit dataset, is applied to cropped apple regions. Track-level aggregation is introduced to enforce temporal consistency and reduce prediction oscillation across frames. We define video-level industrial metrics, such as track-level defect ratio and temporal consistency, to evaluate system robustness under realistic processing conditions. Results demonstrate improved stability compared to frame-wise inference, suggesting that integrating tracking is essential for practical automated fruit grading systems.
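The track-level aggregation step can be sketched as follows. The abstract does not state the aggregation rule, so this example assumes a simple majority vote over the per-frame classifier labels of each track; the function name and the `(track_id, label)` input format are hypothetical, and the detector, tracker, and classifier are assumed to have already produced those pairs.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def aggregate_by_track(frame_predictions: Iterable[Tuple[int, str]]) -> Dict[int, str]:
    """Collapse per-frame labels into one label per track by majority vote.

    Each input item is (track_id, per_frame_label). Majority voting is an
    assumed rule for illustration; the paper may use a different one.
    """
    labels_per_track: Dict[int, List[str]] = defaultdict(list)
    for track_id, label in frame_predictions:
        labels_per_track[track_id].append(label)

    # On a tie, Counter.most_common returns the label seen first in the track.
    return {
        track_id: Counter(labels).most_common(1)[0][0]
        for track_id, labels in labels_per_track.items()
    }


if __name__ == "__main__":
    predictions = [
        (1, "healthy"), (1, "defect"), (1, "healthy"),  # one-frame flicker on track 1
        (2, "defect"), (2, "defect"),
    ]
    print(aggregate_by_track(predictions))
```

In this toy input the single `"defect"` frame on track 1 is outvoted, which is the kind of prediction oscillation that frame-wise inference would pass through unchanged.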

Paper Structure (32 sections, 7 equations, 1 algorithm)