AEye: A Visualization Tool for Image Datasets
Florian Grötschla, Luca A. Lanzendörfer, Marco Calzavara, Roger Wattenhofer
TL;DR
AEye addresses the challenge of understanding large image datasets with a scalable visualization that preserves semantic structure through CLIP-based embeddings. The approach combines a 2D projection via UMAP, a layered tiling scheme that displays representative images, semantic search over text and image queries powered by CLIP, and AI-generated captions from LLaVA, all backed by a vector database such as Milvus. Key contributions include the layered tiling method for a scalable overview, the integration of text and image semantic search, and AI-generated captions that give images context. The tool helps detect biases and imbalances and supports data-driven dataset curation; the code is open source, and a demonstration site is available.
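The layered tiling idea can be illustrated with a minimal sketch. It assumes the images have already been projected to 2D coordinates (for example by UMAP) and applies a hypothetical selection rule: layer L divides the unit square into `2**L x 2**L` tiles and keeps, for each occupied tile, the image closest to the tile center. The functions `normalize` and `build_layers` and the selection rule are illustrative assumptions, not AEye's actual implementation.

```python
from math import dist


def normalize(points):
    """Rescale 2D points (e.g. UMAP output) into the unit square."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    min_x, min_y = min(xs), min(ys)
    span_x = (max(xs) - min_x) or 1.0  # avoid division by zero
    span_y = (max(ys) - min_y) or 1.0
    return [((x - min_x) / span_x, (y - min_y) / span_y) for x, y in points]


def build_layers(points, num_layers):
    """Map each layer to {(col, row): image_index}, one image per occupied tile.

    Layer L splits the unit square into 2**L x 2**L tiles. The representative
    of a tile is the image nearest the tile center (hypothetical rule).
    """
    unit = normalize(points)
    layers = {}
    for layer in range(num_layers):
        n = 2 ** layer
        size = 1.0 / n
        best = {}  # tile -> (distance to tile center, image index)
        for idx, (x, y) in enumerate(unit):
            # Clamp so points on the right/top edge land in the last tile.
            tile = (min(int(x * n), n - 1), min(int(y * n), n - 1))
            center = ((tile[0] + 0.5) * size, (tile[1] + 0.5) * size)
            d = dist((x, y), center)
            if tile not in best or d < best[tile][0]:
                best[tile] = (d, idx)
        layers[layer] = {tile: idx for tile, (_, idx) in best.items()}
    return layers


# Four hypothetical images already projected to 2D.
points = [(0.0, 0.0), (1.0, 1.0), (0.4, 0.6), (0.9, 0.1)]
layers = build_layers(points, num_layers=2)
print(layers[0])  # one representative for the whole map
print(layers[1])  # up to four representatives, one per quadrant
```

In this sketch, coarse layers keep only a handful of representatives, so the overview stays uncluttered regardless of dataset size, while deeper layers reveal progressively more images as the user zooms in. Precomputing every layer trades memory for fast interactive rendering.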
Abstract
Image datasets serve as the foundation for machine learning models in computer vision and, alongside architectural choices, significantly influence model capabilities, performance, and biases. Understanding the composition and distribution of these datasets has therefore become increasingly crucial. To support intuitive exploration of such datasets, we propose AEye, an extensible and scalable visualization tool tailored to image datasets. AEye uses a contrastively trained model to embed images into semantically meaningful high-dimensional representations, facilitating data clustering and organization. To visualize these representations, we project them onto a two-dimensional plane and arrange the images in layers so that users can seamlessly navigate and explore them interactively. AEye also supports semantic search with both text and image queries, enabling users to find images by their content. We open-source the AEye codebase and provide a simple configuration for adding datasets.
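Because a contrastively trained model such as CLIP maps text and images into the same embedding space, text and image queries can share one search routine. The sketch below assumes the embeddings have already been computed and uses toy 3-dimensional vectors in place of real CLIP outputs. It ranks images by brute-force cosine similarity, whereas a vector database such as Milvus would answer the same query from an approximate nearest-neighbor index at scale. The names `search` and `l2_normalize` are hypothetical.

```python
import heapq
from math import sqrt


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = sqrt(sum(v * v for v in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [v / norm for v in vec]


def search(query, index, k=3):
    """Return the k (similarity, image_id) pairs closest to `query`.

    `index` maps image ids to embeddings. The query may come from a text or
    an image encoder, since a CLIP-style model shares one embedding space.
    Brute force: a vector database would use an approximate index instead.
    """
    q = l2_normalize(query)
    scored = (
        (sum(a * b for a, b in zip(q, l2_normalize(emb))), image_id)
        for image_id, emb in index.items()
    )
    return heapq.nlargest(k, scored)


# Toy 3-d vectors standing in for real CLIP embeddings (hypothetical values).
index = {
    "cat.jpg": [0.9, 0.1, 0.0],
    "dog.jpg": [0.7, 0.7, 0.0],
    "car.jpg": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]  # e.g. a made-up embedding of the text "a cat"
for score, image_id in search(query, index, k=2):
    print(f"{image_id}: {score:.3f}")
```

Storing unit-length vectors at indexing time would avoid recomputing norms on every query; the sketch normalizes on the fly only for brevity.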
