CarcassFormer: An End-to-end Transformer-based Framework for Simultaneous Localization, Segmentation and Classification of Poultry Carcass Defect
Minh Tran, Sang Truong, Arthur F. A. Fernandes, Michael T. Kidd, Ngan Le
TL;DR
CarcassFormer addresses automated quality assessment of poultry carcasses by unifying localization, segmentation, and defect classification in an end-to-end Transformer framework. The method employs a four-component design (Backbone, Pixel Decoder, Mask-Attention Transformer Decoder, and Instance Mask/Classification Predictor) with multi-scale features and deformable attention to produce high-fidelity masks and defect labels. On the CarcassDefect dataset, CarcassFormer consistently surpasses CNN-based and Transformer-based baselines across detection, segmentation, and defect classification metrics (AP, AP@50, AP@75, AP@95) for both single- and multi-carcass frames, while maintaining competitive computational efficiency. The work demonstrates practical impact for real-world poultry processing by enabling accurate, scalable carcass quality assessment and highlights avenues for future enhancements such as video-based tracking and finer-grained defect taxonomy.
Abstract
In the food industry, assessing the quality of poultry carcasses during processing is a crucial step. This study proposes an effective approach for automating the assessment of carcass quality without requiring skilled labor or inspector involvement. The proposed system is based on machine learning (ML) and computer vision (CV) techniques, enabling automated defect detection and carcass quality assessment. To this end, an end-to-end framework called CarcassFormer is introduced. It is built upon a Transformer-based architecture designed to extract visual representations effectively while simultaneously detecting, segmenting, and classifying poultry carcass defects. The framework can analyze imperfections resulting from production and transport welfare issues, as well as from malfunctions of the processing plant stunner, scalder, picker, and other equipment. To benchmark the framework, a dataset of 7,321 images was initially acquired, containing both single and multiple carcasses per image. The performance of CarcassFormer is compared with other state-of-the-art (SOTA) approaches on classification, detection, and segmentation tasks. In extensive quantitative experiments, the framework consistently outperforms existing methods, with improvements across evaluation metrics such as AP, AP@50, and AP@75. Furthermore, the qualitative results highlight the strengths of CarcassFormer in capturing fine details, including feathers, and in accurately localizing and segmenting carcasses. To facilitate further research and collaboration, the pre-trained model and source code of CarcassFormer are available for research purposes at: https://github.com/UARK-AICV/CarcassFormer
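The AP@50 and AP@75 metrics named above are assumed here to follow the usual COCO-style convention, in which a predicted mask counts as a match when its intersection over union (IoU) with a ground-truth mask reaches 0.5 or 0.75, respectively. The minimal Python sketch below illustrates only that thresholding step on a toy pair of masks. The helper names (`mask_iou`, `is_true_positive`) and the toy data are hypothetical and are not taken from the CarcassFormer repository; a full AP computation would additionally rank predictions by confidence, require matching class labels, and average precision over recall levels.

```python
from typing import List

# A binary mask as rows of 0/1 values (hypothetical toy representation).
Mask = List[List[int]]


def mask_iou(pred: Mask, truth: Mask) -> float:
    """Intersection over union of two same-sized binary masks."""
    inter = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    # Two empty masks have no union; treat that case as IoU 0.
    return inter / union if union else 0.0


def is_true_positive(pred: Mask, truth: Mask, iou_threshold: float) -> bool:
    """A prediction matches the ground truth if IoU reaches the threshold."""
    return mask_iou(pred, truth) >= iou_threshold


# Toy ground-truth mask: a 3x2 block of foreground pixels.
truth = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]

# Toy prediction: the top two rows are correct, the bottom row is wrong.
pred = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
]

iou = mask_iou(pred, truth)  # 4 shared pixels / 7 pixels in the union
print(f"IoU = {iou:.3f}")
print("match at IoU 0.50:", is_true_positive(pred, truth, 0.50))
print("match at IoU 0.75:", is_true_positive(pred, truth, 0.75))
```

In this toy case the same prediction counts as correct under the 0.5 threshold but not under the 0.75 threshold, which is why AP@75 rewards tighter masks than AP@50.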
