From Images to Decisions: Assistive Computer Vision for Non-Metallic Content Estimation in Scrap Metal
Daniil Storonkin, Ilia Dziub, Maksim Golyadkin, Ilya Makarov
TL;DR
This work tackles the challenge of non-metallic inclusions in scrap metal, which impact energy use, emissions, and safety, by delivering an assistive computer vision pipeline that estimates railcar-level contamination and classifies scrap types from unloading images. It introduces two modeling strategies: multi-instance learning (MIL) to predict contamination from temporally ordered per-layer frames, and multi-task learning (MTL) to jointly perform contamination regression and scrap grade classification, with Swin Transformer backbones outperforming CNNs. The best MIL result achieves $MAE=0.27$ and $R^2=0.83$, while the Swin-MTL setup reaches $MAE=0.36$ with $F1=0.79$, demonstrating high accuracy and practical viability for near real-time deployment. The system integrates into production via a double-blind annotation workflow, an active-learning loop, and a versioned inference service, reducing subjective variability and enabling safer, more reliable melt planning and scrap acceptance.
Abstract
Scrap quality directly affects energy use, emissions, and safety in steelmaking. Today, the share of non-metallic inclusions (contamination) is judged visually by inspectors, an approach that is subjective and hazardous because of dust and moving machinery. We present an assistive computer vision pipeline that estimates contamination (in percent) from images captured during railcar unloading and also classifies scrap type. The method formulates contamination assessment as a regression task at the railcar level and exploits the sequential data through multi-instance learning (MIL) and multi-task learning (MTL). The best MIL model achieves MAE 0.27 and $R^2$ 0.83, and an MTL setup reaches MAE 0.36 with F1 0.79 for scrap class. The system also runs in near real time within the acceptance workflow: magnet/railcar detection segments the unloading sequence into temporal layers, a versioned inference service produces railcar-level estimates with confidence scores, and operators review the results with structured overrides; corrections and uncertain cases feed an active-learning loop for continual improvement. The pipeline reduces subjective variability, improves human safety, and enables integration into acceptance and melt-planning workflows.
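To make the railcar-level formulation and the reported metrics concrete, the following minimal sketch treats each railcar as a bag of per-layer contamination predictions, pools them into one estimate, and scores the estimates with MAE and $R^2$. The mean pooling, the `RailcarBag` type, the function names, and the toy values are all assumptions made for illustration; the abstract does not specify the aggregation the MIL model uses, and the numbers below are not results from the paper.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class RailcarBag:
    """Hypothetical container: one railcar with time-ordered per-layer predictions (percent)."""
    railcar_id: str
    layer_predictions: List[float]


def aggregate_bag(bag: RailcarBag) -> float:
    """Pool per-layer predictions into one railcar-level estimate.

    Mean pooling is an assumption; a learned (e.g. attention-based)
    aggregator could be substituted here.
    """
    if not bag.layer_predictions:
        raise ValueError(f"railcar {bag.railcar_id} has no layer predictions")
    return sum(bag.layer_predictions) / len(bag.layer_predictions)


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """MAE: average absolute difference between labels and estimates."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    y_bar = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all labels are identical")
    return 1.0 - ss_res / ss_tot


# Toy values for illustration only (not data from the paper).
bags = [
    RailcarBag("car-A", [1.0, 2.0, 3.0]),
    RailcarBag("car-B", [0.5, 1.5]),
    RailcarBag("car-C", [4.0, 4.0, 4.0, 4.0]),
]
labels = [2.5, 1.0, 3.5]  # hypothetical inspector-assigned contamination, percent

estimates = [aggregate_bag(b) for b in bags]
mae = mean_absolute_error(labels, estimates)
r2 = r2_score(labels, estimates)
print(f"MAE={mae:.2f}  R2={r2:.2f}")
```

Scoring at the railcar level rather than per frame mirrors how the labels are defined: one contamination value per railcar, with the individual layers serving only as evidence for that single estimate.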
