Reproducibility Study of CDUL: CLIP-Driven Unsupervised Learning for Multi-Label Image Classification
Manan Shah, Yash Bhalgat
TL;DR
This reproducibility study examines CDUL, a CLIP-driven unsupervised method for multi-label image classification. It implements a well-documented PyTorch pipeline and evaluates the two core claims: (i) that a global-local CLIP-based aggregation produces effective pseudo labels, and (ii) that a gradient-alignment training scheme, which updates both the network and the pseudo labels, is effective. The authors find that the global CLIP signal alone often outperforms the aggregation-based pseudo labels, and that gradient alignment offers only modest gains when hyperparameters and update schedules are varied. Significant computational costs prevented a full reproduction of the original results. On PASCAL VOC 2012, the best reproduced results show limited improvement in pseudo-label quality and a validation mAP that falls short of the original report, highlighting the practical challenges of CLIP-based pseudo-labeling at scale. The study emphasizes the need for public code, caching strategies, and detailed hyperparameter disclosures to enable robust reproducibility and broader adoption of CLIP-driven unsupervised multi-label learning.
Abstract
This report is a reproducibility study of the paper "CDUL: CLIP-Driven Unsupervised Learning for Multi-Label Image Classification" (Abdelfattah et al., ICCV 2023). Our report makes the following contributions: (1) We provide a reproducible, well-commented, open-source implementation of the entire method specified in the original paper. (2) We test the effectiveness of the novel aggregation strategy, which uses the CLIP model to initialize the pseudo labels for the subsequent unsupervised multi-label image classification task. (3) We test the effectiveness of the gradient-alignment training method specified in the original paper, which is used to update the network parameters and the pseudo labels. The code can be found at https://github.com/cs-mshah/CDUL
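
As a rough illustration of the aggregation strategy referred to above, the following sketch combines a global CLIP similarity vector with per-snippet local similarity vectors to form initial pseudo labels. It reflects our reading of the original paper and is not taken from the released code: the function name `aggregate_global_local`, the threshold default of 0.5, and the assumption that all scores already lie in [0, 1] are illustrative choices, and the CLIP forward passes that would produce these scores are omitted.

```python
import numpy as np

def aggregate_global_local(global_sim, local_sims, threshold=0.5):
    """Sketch of a global-local pseudo-label aggregation (assumed form).

    global_sim : shape (C,), similarity of the whole image to each class.
    local_sims : shape (M, C), similarity of each of M snippets to each class.
    threshold  : hypothetical cutoff deciding whether a class counts as
                 confidently present in at least one snippet.
    """
    global_sim = np.asarray(global_sim, dtype=float)
    local_sims = np.asarray(local_sims, dtype=float)

    # Per-class maximum and minimum over all snippets.
    local_max = local_sims.max(axis=0)
    local_min = local_sims.min(axis=0)

    # If any snippet is confident about a class, keep the maximum score;
    # otherwise fall back to the minimum to suppress the class.
    local_agg = np.where(local_max >= threshold, local_max, local_min)

    # Average the global and aggregated local scores.
    return 0.5 * (global_sim + local_agg)


# Toy example with two classes and two snippets.
pseudo = aggregate_global_local([0.2, 0.6], [[0.9, 0.1], [0.3, 0.4]])
```

In this toy input the first class is boosted because one snippet scores it above the threshold, while the second class is pulled down because no snippet does. Comparing such aggregated scores against the global vector alone is the kind of check contribution (2) refers to.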
