AttriGen: Automated Multi-Attribute Annotation for Blood Cell Datasets
Walid Houmaidi, Youssef Sabiri, Fatima Zahra Iguenfer, Amine Abouaomar
TL;DR
AttriGen tackles the bottleneck of fine-grained multi-attribute annotation in blood cell imaging by coupling a CNN-based cell-type classifier with a Vision Transformer-based attribute recognizer. The framework leverages two complementary datasets (PBC for cell types and WBCAtt for attributes) to produce a 12-head output that mirrors clinical assessment. It reports 98.83% accuracy on PBC and a 94.62% global average accuracy on WBCAtt, while enabling scalable, automated annotation that reduces labeling time relative to full manual annotation. The work offers a generalizable paradigm for automated, interpretable attribute annotation in medical imaging and points toward extensions with active learning and broader applicability to pathological morphologies.
Abstract
We introduce AttriGen, a novel framework for automated, fine-grained multi-attribute annotation in computer vision, with a particular focus on cell microscopy, where multi-attribute classification remains underrepresented compared to traditional cell type categorization. We use two complementary datasets: the Peripheral Blood Cell (PBC) dataset, containing eight distinct cell types, and the WBC Attribute Dataset (WBCAtt), which contains their corresponding 11 morphological attributes. We propose a dual-model architecture that combines a CNN for cell type classification with a Vision Transformer (ViT) for multi-attribute classification, achieving a new benchmark of 94.62% accuracy. Our experiments demonstrate that AttriGen significantly enhances model interpretability and offers substantial time and cost efficiency relative to conventional full-scale human annotation. Our framework thus establishes a new paradigm that can be extended to other computer vision classification tasks by effectively automating the expansion of multi-attribute labels.
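To make the "global average accuracy" figure concrete, the sketch below shows one plausible way such a score could be computed for a multi-head attribute classifier: accuracy is measured separately for each attribute head and then averaged without weighting. This definition is an assumption, not one confirmed by the text above. The attribute names (`cell_shape`, `nucleus_shape`), the labels, and the toy data are likewise hypothetical and stand in for the 11 WBCAtt attributes.

```python
from typing import Dict, List

# One annotation record: attribute name -> class label (names are hypothetical).
Labels = Dict[str, str]


def per_head_accuracy(y_true: List[Labels], y_pred: List[Labels]) -> Dict[str, float]:
    """Fraction of samples labelled correctly, computed separately per attribute head."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    heads = y_true[0].keys()
    return {
        head: sum(t[head] == p[head] for t, p in zip(y_true, y_pred)) / len(y_true)
        for head in heads
    }


def global_average_accuracy(y_true: List[Labels], y_pred: List[Labels]) -> float:
    """Unweighted mean of the per-head accuracies (assumed definition)."""
    scores = per_head_accuracy(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Toy data invented for illustration only; not taken from PBC or WBCAtt.
truth = [
    {"cell_shape": "round", "nucleus_shape": "segmented"},
    {"cell_shape": "irregular", "nucleus_shape": "unsegmented"},
    {"cell_shape": "round", "nucleus_shape": "unsegmented"},
    {"cell_shape": "round", "nucleus_shape": "segmented"},
]
preds = [
    {"cell_shape": "round", "nucleus_shape": "segmented"},
    {"cell_shape": "round", "nucleus_shape": "unsegmented"},  # cell_shape is wrong here
    {"cell_shape": "round", "nucleus_shape": "unsegmented"},
    {"cell_shape": "round", "nucleus_shape": "segmented"},
]

head_scores = per_head_accuracy(truth, preds)
overall = global_average_accuracy(truth, preds)
```

With this toy data, `head_scores` is 0.75 for `cell_shape` and 1.0 for `nucleus_shape`, so `overall` is 0.875. An unweighted mean treats every attribute head equally regardless of how many classes it has; a sample-weighted or class-balanced average would be a different design choice and could yield a different headline number.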
