MozzaVID: Mozzarella Volumetric Image Dataset
Pawel Tomasz Pieta, Peter Winkel Rasmussen, Anders Bjorholm Dahl, Jeppe Revall Frisvad, Siavash Arjomand Bigdeli, Carsten Gundlach, Anders Nymark Christensen
TL;DR
MozzaVID addresses the scarcity of volumetric benchmarks, and the resulting difficulty of comparing models, by introducing a large, well-curated X-ray CT dataset of mozzarella microstructure. The dataset supports classification of 25 cheese types and 149 cheese samples and is provided at three downsampled resolutions. The authors benchmark multiple 3D CNN and transformer architectures, compare 2D slices against full 3D volumes, and run ablations on classification granularity and input dimensionality. They complement these experiments with UMAP analyses of the learned representations, which they link to sample metadata. The results show that 3D representations yield higher accuracy: the Large configuration achieves near-perfect coarse-grained ($0.993$) and strong fine-grained ($0.935$) performance. The latent space also shows meaningful structure that reflects cheese recipes and processing. MozzaVID thus provides a scalable, domain-specific benchmark for volumetric deep learning, highlights the need for models tailored to 3D data, and enables structural analysis of food microstructure at scale.
Abstract
Owing to the complexity of volumetric imaging, there is a shortage of established datasets suitable for benchmarking volumetric deep-learning models. As a consequence, new and existing models are not easily comparable, which limits the development of architectures optimized specifically for volumetric data. To counteract this trend, we introduce MozzaVID, a large, clean, and versatile volumetric classification dataset. Our dataset contains X-ray computed tomography (CT) images of mozzarella microstructure and enables the classification of 25 cheese types and 149 cheese samples. We provide the data in three different resolutions, resulting in three dataset instances containing from 591 to 37,824 images. While general-purpose, the dataset also facilitates the investigation of mozzarella's structural properties. The structure of food directly affects its functional properties and thus the experience of consuming it. Understanding food structure helps tune production, and mimicking it enables sustainable alternatives to animal-derived food products. The complex and disordered nature of food structures poses a unique challenge, in which the choice of an appropriate imaging method, scale, and sample size is not trivial. With this dataset we aim to address these complexities, contributing to more robust structural analysis models. The dataset can be downloaded from: https://archive.compute.dtu.dk/files/public/projects/MozzaVID/.
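The abstract gives only the smallest and largest image counts of the three dataset instances. The short sketch below works through that arithmetic under one assumption that the abstract does not state: that consecutive instances differ by a constant integer factor. The function name `infer_instance_counts` is hypothetical and used only for illustration.

```python
# Image counts stated in the abstract for the smallest and largest instances.
SMALLEST = 591
LARGEST = 37_824


def infer_instance_counts(smallest: int, largest: int, steps: int):
    """Infer per-instance image counts.

    ASSUMPTION (not stated in the abstract): consecutive instances differ
    by the same integer factor. Raises ValueError if no such factor fits.
    """
    # Constant ratio r satisfies smallest * r**(steps - 1) == largest.
    ratio = round((largest / smallest) ** (1 / (steps - 1)))
    counts = [smallest * ratio**i for i in range(steps)]
    if counts[-1] != largest:
        raise ValueError("no constant integer ratio fits the stated counts")
    return ratio, counts


ratio, counts = infer_instance_counts(SMALLEST, LARGEST, steps=3)
print(ratio, counts)  # 8 [591, 4728, 37824]
```

Under this assumption the factor between instances is 8, which equals 2 cubed and would be consistent with each volume being divided in two along each of its three axes at the next resolution. The intermediate count of 4,728 is inferred here rather than quoted from the abstract.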
