MozzaVID: Mozzarella Volumetric Image Dataset

Pawel Tomasz Pieta, Peter Winkel Rasmussen, Anders Bjorholm Dahl, Jeppe Revall Frisvad, Siavash Arjomand Bigdeli, Carsten Gundlach, Anders Nymark Christensen

TL;DR

MozzaVID addresses the scarcity of volumetric benchmarks, and the resulting difficulty of comparing models, by introducing a large, well-curated X-ray CT dataset of mozzarella microstructure. The dataset supports classification of 25 cheese types (coarse-grained) and 149 cheese samples (fine-grained) and is provided at three downsampled resolutions. The authors benchmark multiple 3D CNN and transformer architectures, compare 2D slices against full 3D volumes, and run ablations on granularity and dimensionality, complemented by analyses of the learned representations (UMAP) linked to metadata. They show that 3D representations yield higher accuracy: the Large configuration reaches $0.993$ on the coarse-grained task and $0.935$ on the fine-grained task, and its latent space reveals structure that reflects cheese recipes and processing. MozzaVID thus provides a scalable, domain-specific benchmark for volumetric deep learning, highlighting the need for models tailored to 3D data and enabling structural analysis of food microstructure at scale.

Abstract

Owing to the complexity of volumetric imaging, there is a shortage of established datasets useful for benchmarking volumetric deep-learning models. As a consequence, new and existing models are not easily comparable, limiting the development of architectures optimized specifically for volumetric data. To counteract this trend, we introduce MozzaVID, a large, clean, and versatile volumetric classification dataset. Our dataset contains X-ray computed tomography (CT) images of mozzarella microstructure and enables the classification of 25 cheese types and 149 cheese samples. We provide data in three different resolutions, resulting in three dataset instances containing from 591 to 37,824 images. While being general-purpose, the dataset also facilitates investigating mozzarella structure properties. The structure of food directly affects its functional properties and thus its consumption experience. Understanding food structure helps tune production, and mimicking it enables sustainable alternatives to animal-derived food products. The complex and disordered nature of food structures brings a unique challenge, where the choice of an appropriate imaging method, scale, and sample size is not trivial. With this dataset we aim to address these complexities, contributing to more robust structural analysis models. The dataset can be downloaded from: https://archive.compute.dtu.dk/files/public/projects/MozzaVID/.

Paper Structure

This paper contains 28 sections, 9 figures, 5 tables.

Figures (9)

  • Figure 1: Comparison of typical volumetric and 2D dataset sizes. The three instances of the proposed MozzaVID dataset form a bridge between the two groups while maintaining the volume sizes known from other volumetric datasets. A complete overview of the visualized datasets can be found in the dataset tables (labels tab:datasets and tab:datasets_2D).
  • Figure 2: Mozzarella samples wrapped in parafilm and mounted for scanning (left). Structure variability demonstrated by 2D slices from four different cheese types (right). Light areas represent the protein matrix, while the dark areas are the fat globules/domains.
  • Figure 3: Sketch of the three proposed dataset configurations. The raw volume is downscaled and split, ensuring that in each case the final volumes have a shape of $192 \times 192 \times 192$ voxels.
  • Figure 4: UMAP generated from second-to-last layer feature representations of the best-performing model in the coarse-grained classification task (ResNet50 trained on the Large dataset). Reduction parameters: n_neighbors=15, min_dist=0.5.
  • Figure 5: Overview of the variation in the normalized experimental design parameters for the first 24 cheese types (coarse-grained classes). Data for class 25 (Cagliata) is not available. Underlines highlight three pairs of cheeses produced with the same set of parameters.
  • ...and 4 more figures
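
The Figure 3 caption describes each configuration as "downscale, then split into 192-cubed volumes". The following is a minimal sketch of that idea, not the authors' preprocessing code. The raw side length (768), the downscale factors (4, 2, 1), the configuration labels, and the function name split_plan are all hypothetical, chosen only so that the coarsest setting yields a single cube. One detail that does come from the source: the abstract's instance sizes, 591 and 37,824, differ by exactly a factor of 64 = 4^3, which is what splitting each axis into four parts would produce.

```python
from itertools import product

TARGET = 192  # final cube side, as stated in the Figure 3 caption


def split_plan(raw_side: int, downscale: int, target: int = TARGET):
    """Return the origins (in downscaled voxel coordinates) of the
    non-overlapping target-sized cubes that fit in a cubic volume."""
    side = raw_side // downscale   # side length after downscaling
    per_axis = side // target      # whole cubes that fit along one axis
    starts = [i * target for i in range(per_axis)]
    return list(product(starts, repeat=3))


# Hypothetical raw side length and downscale factors (assumptions, not
# values taken from the paper).
RAW_SIDE = 768
for name, factor in [("coarse", 4), ("medium", 2), ("fine", 1)]:
    origins = split_plan(RAW_SIDE, factor)
    print(name, len(origins))

# Ratio between the largest and smallest instance sizes in the abstract.
print(37824 // 591)
```

Under these assumed values, each halving of the downscale factor multiplies the number of sub-volumes by eight, so the three configurations trade physical field of view per volume against the number of training examples while keeping the input shape fixed.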