From Images to Decisions: Assistive Computer Vision for Non-Metallic Content Estimation in Scrap Metal

Daniil Storonkin; Ilia Dziub; Maksim Golyadkin; Ilya Makarov

TL;DR

This work tackles the challenge of non-metallic inclusions in scrap metal, which impact energy use, emissions, and safety, by delivering an assistive computer vision pipeline that estimates railcar-level contamination and classifies scrap types from unloading images. It introduces two modeling strategies: multi-instance learning (MIL) to predict contamination from temporally ordered per-layer frames, and multi-task learning (MTL) to jointly perform contamination regression and scrap grade classification, with Swin Transformer backbones outperforming CNNs. The best MIL result achieves $MAE=0.27$ and $R^2=0.83$, while the Swin-MTL setup reaches $MAE=0.36$ with $F1=0.79$, demonstrating high accuracy and practical viability for near real-time deployment. The system integrates into production via a double-blind annotation workflow, an active-learning loop, and a versioned inference service, reducing subjective variability and enabling safer, more reliable melt planning and scrap acceptance.
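The sketch below shows one way the two strategies could share a single network. A bag of per-layer embeddings (one bag per railcar) is pooled into a railcar-level vector, which then feeds a regression head for contamination and a classification head for scrap grade. It assumes a simplified attention-based MIL pooling, one common choice; the paper's exact pooling operator is not specified here. The function names (`attention_pool`, `mtl_heads`), the tanh scoring, the toy weights, and the clamping to 0-100 % are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: simplified attention-based MIL pooling feeding two MTL heads.
# Plain Python with toy weights; names and shapes are illustrative assumptions.
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(bag: Sequence[Vector], w_attn: Vector) -> Tuple[Vector, List[float]]:
    """Pool per-layer embeddings (one bag = one railcar) into a single vector.

    Each instance h_i gets the score tanh(w_attn . h_i); scores are
    softmax-normalised into weights a_i, and the result is sum_i a_i * h_i.
    """
    weights = softmax([math.tanh(dot(w_attn, h)) for h in bag])
    dim = len(bag[0])
    pooled = [sum(a * h[d] for a, h in zip(weights, bag)) for d in range(dim)]
    return pooled, weights


def mtl_heads(z: Vector, w_reg: Vector, b_reg: float,
              w_cls: Sequence[Vector], b_cls: Sequence[float]) -> Tuple[float, List[float]]:
    """Shared embedding -> (contamination %, scrap-grade probabilities)."""
    # Regression head: linear output clamped to the valid 0-100 % range.
    contamination = min(100.0, max(0.0, dot(w_reg, z) + b_reg))
    # Classification head: one logit per grade, normalised with softmax.
    logits = [dot(w, z) + b for w, b in zip(w_cls, b_cls)]
    return contamination, softmax(logits)


# Toy bag: 3 temporally ordered layers, each a 3-dim embedding.
bag = [[0.2, 1.0, 0.0], [0.4, 0.8, 0.1], [0.0, 0.5, 0.9]]
z, weights = attention_pool(bag, w_attn=[0.5, -0.2, 1.0])
pct, grade_probs = mtl_heads(z, w_reg=[2.0, 1.0, 3.0], b_reg=0.5,
                             w_cls=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], b_cls=[0.0, 0.0])
print(f"contamination={pct:.2f}%  grade_probs={grade_probs}  layer_weights={weights}")
```

In a trained model the per-layer embeddings would come from an image backbone (the paper reports Swin Transformer backbones), and the attention weights would indicate which unloading layers contributed most to the railcar-level estimate.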

Abstract

Scrap quality directly affects energy use, emissions, and safety in steelmaking. Today, the share of non-metallic inclusions (contamination) is judged visually by inspectors, an approach that is subjective and hazardous because of dust and moving machinery. We present an assistive computer vision pipeline that estimates contamination (as a percentage) from images captured during railcar unloading and also classifies the scrap type. The method formulates contamination assessment as a regression task at the railcar level and exploits the sequential nature of the data through multi-instance learning (MIL) and multi-task learning (MTL). The best MIL model achieves MAE 0.27 and R² 0.83, and an MTL setup reaches MAE 0.36 with F1 0.79 for scrap class. We also deploy the system in near real time within the acceptance workflow: magnet/railcar detection segments the unloading into temporal layers, a versioned inference service produces railcar-level estimates with confidence scores, and operators review the results with structured overrides; corrections and uncertain cases feed an active-learning loop for continual improvement. The pipeline reduces subjective variability, improves human safety, and enables integration into acceptance and melt-planning workflows.
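The layer segmentation step mentioned above can be sketched as follows, assuming per-frame detector outputs with one railcar box and an optional magnet box in (x1, y1, x2, y2) format. A frame counts as part of a grab when the magnet overlaps the railcar with IoU at or above a threshold, and each completed grab exposes a new layer. The function `segment_layers`, the threshold of 0.05, and the rule that magnet-occluded frames are skipped are hypothetical choices for illustration, not the paper's published algorithm.

```python
# Hypothetical sketch: IoU-based grab detection to split unloading frames into layers.
from typing import Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2); assumed box format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def segment_layers(frames: List[Tuple[Optional[Box], Box]],
                   thr: float = 0.05) -> Dict[int, List[int]]:
    """Group frame indices by scrap layer.

    Frames where the magnet overlaps the railcar (IoU >= thr) belong to a grab
    and are skipped; each contiguous run of grab frames removes one layer, and
    frames with the magnet clear are assigned to the currently exposed layer.
    """
    layers: Dict[int, List[int]] = {}
    layer_idx = 0
    in_grab = False
    for i, (magnet, railcar) in enumerate(frames):
        grabbing = magnet is not None and iou(magnet, railcar) >= thr
        if grabbing:
            in_grab = True
            continue
        if in_grab:
            # The grab has just ended, so a new layer is now exposed.
            layer_idx += 1
            in_grab = False
        layers.setdefault(layer_idx, []).append(i)
    return layers


car = (0.0, 0.0, 100.0, 40.0)
mag_over = (30.0, 0.0, 60.0, 30.0)    # magnet above the railcar
mag_away = (200.0, 0.0, 230.0, 30.0)  # magnet off to the side
frames = [(None, car), (mag_away, car), (mag_over, car), (mag_over, car),
          (mag_away, car), (None, car), (mag_over, car), (None, car)]
layers = segment_layers(frames)
print(layers)  # {0: [0, 1], 1: [4, 5], 2: [7]}
```

A low threshold is used here because a magnet box is much smaller than a railcar box, so even a magnet fully inside the railcar region yields a modest IoU (0.225 in this toy example).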

Paper Structure (13 sections, 5 figures, 4 tables, 2 algorithms)


Introduction
Related work
Methods
Multi-Instance Learning for percentage of scrap metal contamination at railcar level
Multi-Task Learning (percent of contamination, scrap metal classification) for scrap metal
Experimental setup
Implementation details
Human-in-the-Loop Model Integration
Double-blind annotation system
Application
Discussion
Conclusion
Acknowledgments

Figures (5)

Figure 1: Background: The determination of non-metallic inclusions in scrap metal depends on the inspector's subjective judgment, which reduces the quality of production planning. The chart shows that assessments of the same railcar by different inspectors vary significantly. Class & Contamination: During a scrap metal assessment, the inspector handles several tasks at once, determining both the percentage of contamination in the scrap and its grade. Overall approach: A railcar with scrap metal arrives at the unloading point under the camera, and the detector analyzes the position of the magnet relative to the railcar. Previously, an inspector worked in the unloading area, exposed to airborne dust and other hazards.
Figure 2: Differences between variants of deep learning architectures. The multi-instance learning approach takes temporal data (a bag) and, based on multiple instances, assigns a score to the railcar. With multi-task learning, several scrap metal assessment tasks are solved in a single inference pass using multiple heads.
Figure 3: Grad-CAM visualisations for CNN vs transformer backbones. Transformers (right) focus tightly on scrap pieces, ignoring dust clouds and background, which explains their superior regression accuracy.
Figure 4: End-to-end human-in-the-loop system. Double-blind annotation pipeline with automated quality checks (blur, extraneous objects, incomplete unloading), triple labeling per railcar, aggregation with dispersion-based adjudication, and audit logging. Production application architecture: magnet/railcar detection and IoU-based layer (grab) segmentation; MIL pooling with MTL heads for contamination regression and grade classification, served via a versioned ML service; results persisted to DB/S3, surfaced to operators for review and correction, and fed back through an active-learning loop with a model registry and experiment tracking.
Figure 5: After inspectors began working in the annotation system, they accepted scrap metal more carefully, with minimal discrepancies between assessments.
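Figure 4 describes triple labeling per railcar followed by dispersion-based adjudication. A minimal sketch of that aggregation step follows. It assumes contamination labels in percent, the population standard deviation as the dispersion measure, the median as the consensus value, and a hypothetical cutoff of 1.0 percentage point (`max_std`). The names `Aggregate` and `aggregate_labels` are illustrative, and the paper's actual statistic and threshold may differ.

```python
# Hypothetical sketch: aggregate blind labels and flag high-dispersion railcars.
import statistics
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Aggregate:
    value: Optional[float]   # consensus contamination %, None if unresolved
    dispersion: float        # population std dev of the labels
    needs_adjudication: bool


def aggregate_labels(labels: Sequence[float], max_std: float = 1.0) -> Aggregate:
    """Combine independent (blind) contamination labels for one railcar."""
    if len(labels) < 2:
        raise ValueError("need at least two independent labels")
    spread = statistics.pstdev(labels)
    if spread <= max_std:
        # Annotators broadly agree: use the median as a robust consensus.
        return Aggregate(statistics.median(labels), spread, False)
    # Disagreement too large: route the railcar to an adjudicator.
    return Aggregate(None, spread, True)


agreed = aggregate_labels([2.0, 2.5, 3.0])    # spread ~0.41, accepted
disputed = aggregate_labels([1.0, 2.0, 6.0])  # spread ~2.16, flagged
print(agreed)
print(disputed)
```

The median is chosen in this sketch because, with three labels, a single outlying annotator cannot shift it, whereas the flagged cases are exactly the ones where human adjudication adds the most value.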
