Mitigating Long-Tail Bias in HOI Detection via Adaptive Diversity Cache
Yuqiu Jiang, Xiaozhen Qiao, Tianyu Mei, Haojian Huang, Yifan Chen, Ye Zheng, Zhe Sun
TL;DR
The paper tackles long-tail bias in HOI detection by introducing the Adaptive Diversity Cache (ADC), a training-free, plug-in module that builds class-specific caches of high-confidence, diverse features during inference and uses frequency-aware capacity allocation to boost rare-interaction predictions without retraining. ADC comprises Confidence-Diversity Joint Cache Selection (CJCS) and Frequency-Aware Cache Adaptation (FACA), which together expand the reference representations and augment predictions through an affinity-based retrieval mechanism. Experiments on HICO-DET and V-COCO show gains of up to +8.57% mAP on rare categories and +4.39% on the full set of HICO-DET, positive transfer across multiple baselines (including zero-shot-capable models), and competitive results on V-COCO. Overall, ADC demonstrates that test-time caching and adaptive augmentation can calibrate HOI predictions under long-tail distributions, offering a scalable, training-free approach that may also apply to other long-tailed structured prediction tasks.
Abstract
Human-Object Interaction (HOI) detection is a fundamental task in computer vision, enabling machines to understand human-object relationships in diverse real-world scenarios. Recent advances in vision-language models (VLMs) have significantly improved HOI detection by leveraging rich cross-modal representations. However, most existing VLM-based approaches rely heavily on additional training or prompt tuning, which incurs substantial computational overhead and limits scalability, particularly in long-tailed scenarios where rare interactions are severely underrepresented. In this paper, we propose the Adaptive Diversity Cache (ADC) module, a novel training-free, plug-and-play mechanism designed to mitigate long-tail bias in HOI detection. ADC constructs class-specific caches that accumulate high-confidence and diverse feature representations during inference. The method incorporates frequency-aware cache adaptation that favors rare categories and is designed to enable robust prediction calibration without additional training or fine-tuning. Extensive experiments on the HICO-DET and V-COCO datasets show that ADC consistently improves existing HOI detectors, achieving gains of up to +8.57% mAP on rare categories and +4.39% on the full dataset, demonstrating its effectiveness in mitigating long-tail bias while preserving overall performance.
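To make the mechanism concrete, the following is a minimal sketch of a test-time cache in the spirit of ADC, not the authors' implementation. The abstract does not give the exact formulas, so everything specific here is an assumption made for illustration: the linear rarity-to-capacity rule in `capacities`, the joint score (confidence plus a weighted dissimilarity term) used for eviction, the exponential affinity in `logits`, and all names and hyperparameters (`lam`, `beta`, `alpha`, `min_cap`, `max_cap`).

```python
import numpy as np

def l2_normalize(x):
    """Scale a vector (or rows of a matrix) to unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-12)

def capacities(class_counts, min_cap=2, max_cap=8):
    """Hypothetical frequency-aware rule: rarer classes get larger caches."""
    counts = np.asarray(class_counts, dtype=float)
    spread = counts.max() - counts.min() + 1e-12
    rarity = 1.0 - (counts - counts.min()) / spread  # 1 = rarest, 0 = most frequent
    return np.rint(min_cap + rarity * (max_cap - min_cap)).astype(int)

class DiversityCache:
    """Per-class feature cache filled during inference (illustrative only)."""

    def __init__(self, caps, lam=0.5):
        self.caps = caps                      # capacity per class
        self.lam = lam                        # assumed weight of the diversity term
        self.feats = [[] for _ in caps]       # unit-norm features per class
        self.confs = [[] for _ in caps]       # detector confidence per entry

    def _joint_scores(self, cls):
        # Assumed joint score: confidence + lam * (1 - mean cosine similarity
        # to the other entries of the same class).
        F = np.stack(self.feats[cls])
        conf = np.array(self.confs[cls])
        if len(F) == 1:
            return conf
        sim = F @ F.T
        mean_sim = (sim.sum(axis=1) - np.diag(sim)) / (len(F) - 1)
        return conf + self.lam * (1.0 - mean_sim)

    def update(self, feat, cls, conf):
        """Add an entry; if the class is over capacity, evict the lowest-scoring one."""
        self.feats[cls].append(l2_normalize(np.asarray(feat, dtype=float)))
        self.confs[cls].append(float(conf))
        if len(self.feats[cls]) > self.caps[cls]:
            drop = int(np.argmin(self._joint_scores(cls)))
            del self.feats[cls][drop]
            del self.confs[cls][drop]

    def logits(self, feat, beta=5.0):
        """Mean affinity exp(-beta * (1 - cos)) between the query and each class cache."""
        q = l2_normalize(np.asarray(feat, dtype=float))
        out = np.zeros(len(self.caps))
        for cls, entries in enumerate(self.feats):
            if entries:
                cos = np.stack(entries) @ q
                out[cls] = np.exp(-beta * (1.0 - cos)).mean()
        return out

def adjust(base_logits, cache, feat, alpha=1.0):
    """Augment the detector's logits with cache-retrieved evidence."""
    return np.asarray(base_logits, dtype=float) + alpha * cache.logits(feat)
```

In use, a frozen detector would call `update` with the feature, predicted class, and confidence of each test-time prediction it trusts, and `adjust` to refine its interaction logits. No gradient step is involved, which is what makes this style of calibration training-free.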
