HydraMix: Multi-Image Feature Mixing for Small Data Image Classification
Christoph Reinders, Frederik Schubert, Bodo Rosenhahn
TL;DR
HydraMix addresses data scarcity in image classification by learning to compose new images from multiple images of the same class. An encoder–decoder architecture mixes the images in feature space, guided by a segmentation-based mixing mask, and is trained with reconstruction, perceptual, and adversarial losses, so it generates diverse samples without pretraining. On ciFAIR-10, STL-10, and ciFAIR-100 in small-data settings, it achieves state-of-the-art classification performance and outperforms competing mixing methods. The paper also introduces CLIP Synset Entropy, a text-image metric for quantifying the diversity of augmented datasets. The approach targets applications where data is scarce or restricted by privacy and legal constraints, and may serve as a basis for cross-domain generalization and for combination with automated augmentation.
Abstract
Training deep neural networks requires datasets with a large number of annotated examples. The collection and annotation of these datasets is not only extremely expensive but also faces legal and privacy problems. These factors are a significant limitation for many real-world applications. To address this, we introduce HydraMix, a novel architecture that generates new image compositions by mixing multiple different images from the same class. HydraMix learns the fusion of the content of various images guided by a segmentation-based mixing mask in feature space and is optimized via a combination of unsupervised and adversarial training. Our data augmentation scheme allows the creation of models trained from scratch on very small datasets. We conduct extensive experiments on ciFAIR-10, STL-10, and ciFAIR-100. Additionally, we introduce a novel text-image metric to assess the generality of the augmented datasets. Our results show that HydraMix outperforms existing state-of-the-art methods for image classification on small datasets.
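As a rough illustration of what mixing several same-class images in feature space with a segmentation-based mask can look like, consider the sketch below. It is not the HydraMix implementation: the function name `mix_features`, the random assignment of each segment to one source image, the hard one-hot masks, and the toy grid segmentation are all assumptions made for illustration. The learned encoder, decoder, and mixing module described in the abstract are not modeled here.

```python
import numpy as np


def mix_features(features, segments, rng):
    """Mix N feature maps with a segment-wise assignment mask (illustrative only).

    features: float array of shape (N, C, H, W), one feature map per source image.
    segments: int array of shape (H, W), segment id (0..K-1) per spatial location.
    Returns (mixed, masks) with shapes (C, H, W) and (N, H, W).
    """
    n = features.shape[0]
    num_segments = int(segments.max()) + 1

    # Assumption: every segment takes its features from one randomly chosen source image.
    source_of_segment = rng.integers(0, n, size=num_segments)
    source_map = source_of_segment[segments]  # (H, W), source index per location

    # One-hot masks over the N sources; they sum to 1 at every location.
    masks = (source_map[None, :, :] == np.arange(n)[:, None, None]).astype(features.dtype)

    # Weighted sum over sources, broadcasting each mask across the channel axis.
    mixed = (masks[:, None, :, :] * features).sum(axis=0)
    return mixed, masks


rng = np.random.default_rng(0)
features = rng.normal(size=(4, 8, 6, 6))  # 4 source images, 8 channels, 6x6 grid

# Toy segmentation: a 2x2 layout of 3x3 blocks, giving segment ids 0..3.
segments = (np.arange(6)[:, None] // 3) * 2 + (np.arange(6)[None, :] // 3)

mixed, masks = mix_features(features, segments, rng)
print(mixed.shape, masks.shape)
```

In this sketch each spatial location of the mixed feature map is copied from exactly one source image. A learned mixing module could instead produce soft masks, in which case the same weighted sum would blend sources rather than select one.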
