Analyzing the Benefits of Prototypes for Semi-Supervised Category Learning
Liyi Zhang, Logan Nelson, Thomas L. Griffiths
TL;DR
The paper examines how prototype-based representations can benefit semi-supervised category learning by implementing a prototype-informed prior (the VampPrior) in a variational auto-encoder. Treating prototypes as latent cluster centers, the approach yields more clustered embeddings and improves downstream categorization on MNIST and CIFAR-10, even without labels during training. The authors show that the learned prototypes organize latent space into interpretable clusters and that prototype-based classification can approach supervised performance under certain conditions, especially when category boundaries are simpler or perturbed data is used. By linking cognitive models of categorization with deep generative models, the work highlights when abstract prototypes can enhance unsupervised representation learning and downstream tasks in naturalistic image domains.
Abstract
Categories can be represented at different levels of abstraction, from prototypes that capture the most typical members to memories of every observed exemplar of the category. These representations have been explored in the context of supervised learning, where stimuli are presented with known category labels. We examine the benefits of prototype-based representations in a less-studied domain: semi-supervised learning, where agents must form unsupervised representations of stimuli before receiving category labels. We study this problem in a Bayesian unsupervised learning model called a variational auto-encoder, and we draw on recent advances in machine learning to implement a prior that encourages the model to use abstract prototypes to represent data. We apply this approach to image datasets and show that forming prototypes can improve semi-supervised category learning. Additionally, we study the latent embeddings of the models and show that these prototypes allow the models to form clustered representations without supervision, contributing to their success in downstream categorization.
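As a rough illustration of the kind of prior described above, the sketch below evaluates the log density of a VampPrior-style mixture, in which the prior over latents is an equal-weight mixture of the encoder's posteriors at K learnable pseudo-inputs (the "prototypes"). The linear encoder, the weight names `W_mu` and `W_logvar`, and all dimensions are hypothetical and chosen only for illustration; they are not taken from the paper, whose encoder would typically be a neural network.

```python
import numpy as np

def encode(x, W_mu, W_logvar):
    """Hypothetical linear encoder: maps inputs to a diagonal Gaussian posterior."""
    return x @ W_mu, x @ W_logvar

def log_gaussian_diag(z, mu, logvar):
    """Log density of z under diagonal Gaussians; broadcasts over leading axes."""
    return -0.5 * np.sum(
        np.log(2.0 * np.pi) + logvar + (z - mu) ** 2 / np.exp(logvar), axis=-1
    )

def vamp_prior_log_prob(z, pseudo_inputs, W_mu, W_logvar):
    """log p(z) = log (1/K) sum_k q(z | u_k), with u_k the pseudo-inputs."""
    mu, logvar = encode(pseudo_inputs, W_mu, W_logvar)  # each has shape (K, D)
    # (N, 1, D) against (1, K, D) gives (N, K) per-component log densities.
    log_comp = log_gaussian_diag(z[:, None, :], mu[None], logvar[None])
    # Numerically stable log-sum-exp over the K components.
    m = log_comp.max(axis=1, keepdims=True)
    log_mix = m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))
    return log_mix - np.log(pseudo_inputs.shape[0])

# Toy usage with random values (assumed sizes, not the paper's settings).
rng = np.random.default_rng(0)
K, input_dim, latent_dim = 3, 8, 2
pseudo_inputs = rng.normal(size=(K, input_dim))
W_mu = rng.normal(size=(input_dim, latent_dim))
W_logvar = 0.1 * rng.normal(size=(input_dim, latent_dim))
z = rng.normal(size=(5, latent_dim))
log_p = vamp_prior_log_prob(z, pseudo_inputs, W_mu, W_logvar)
```

In a full model, this term would replace the standard-normal prior in the variational objective, and the pseudo-inputs would be trained jointly with the encoder and decoder. Each mixture component then acts as a candidate cluster center in latent space.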
