A consensus-constrained parsimonious Gaussian mixture model for clustering hyperspectral images

Ganesh Babu, Aoife Gowen, Michael Fop, Isobel Claire Gormley

TL;DR

This work introduces ccPGMM, a consensus-constrained parsimonious Gaussian mixture model, to jointly cluster and label pixels across multiple hyperspectral food images while incorporating partial constraints from known regions. It fuses constrained-PGMM inference with a divide-and-conquer consensus scheme that operates on variable subsets, yielding a scalable approach for high-dimensional spectra where later-stage clustering is performed on a consolidated similarity matrix. The method demonstrates competitive clustering performance with substantially reduced computation compared to full PGMM/GMM and thresholding, and provides per-pixel clustering uncertainty. The framework is validated on simulated data and real puffed cereal images, and extended to classify a multi-grain image by aggregating subset posteriors, offering a practical tool for robust hyperspectral image labeling with uncertainty quantification.

Abstract

The use of hyperspectral imaging to investigate food samples has grown due to the improved performance and lower cost of instrumentation. Food engineers use hyperspectral images to classify the type and quality of a food sample, typically using classification methods. To train these methods, every pixel in each training image needs to be labelled. Typically, computationally cheap threshold-based approaches are used to label the pixels, and classification methods are trained based on those labels. However, threshold-based approaches are subjective and cannot be generalized across hyperspectral images taken in different conditions and of different foods. Here a consensus-constrained parsimonious Gaussian mixture model (ccPGMM) is proposed to label pixels in hyperspectral images using a model-based clustering approach. The ccPGMM utilizes information that is available on some pixels and specifies constraints on those pixels belonging to the same or different clusters while clustering the rest of the pixels in the image. A latent variable model is used to represent the high-dimensional data in terms of a small number of underlying latent factors. To ensure computational feasibility, a consensus clustering approach is employed, where the data are divided into multiple randomly selected subsets of variables and constrained clustering is applied to each data subset; the clustering results are then consolidated across all data subsets to provide a consensus clustering solution. The ccPGMM approach is applied to simulated datasets and to real hyperspectral images of three types of puffed cereal: corn, rice, and wheat. Improved clustering performance and computational efficiency are demonstrated when compared to other current state-of-the-art approaches.
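
The consensus step described in the abstract (cluster random subsets of variables, then consolidate the results) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: plain k-means with a deterministic farthest-point start stands in for the constrained PGMM fit, so no constraints or latent factors are modelled; the final clustering of the similarity matrix simply reuses the same k-means; and all function names, parameters, and toy data are hypothetical.

```python
import random


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(points, k, n_iter=10):
    """Plain k-means; a stand-in (assumption) for the constrained PGMM fit."""
    # Farthest-point initialisation: start at the first point, then repeatedly
    # add the point that is farthest from all centres chosen so far.
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assign every point to its nearest centre.
        labels = [min(range(k), key=lambda g: sq_dist(p, centers[g]))
                  for p in points]
        # Move each centre to the mean of its members (empty clusters stay put).
        for g in range(k):
            members = [p for p, lab in zip(points, labels) if lab == g]
            if members:
                centers[g] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def consensus_similarity(data, k, n_subsets, subset_size, seed=0):
    """Fraction of variable subsets in which each pair of rows co-clusters."""
    rng = random.Random(seed)
    n, p = len(data), len(data[0])
    counts = [[0] * n for _ in range(n)]
    for _ in range(n_subsets):
        cols = rng.sample(range(p), subset_size)         # random variable subset
        sub = [[row[j] for j in cols] for row in data]   # data restricted to it
        labels = kmeans(sub, k)
        for i in range(n):
            for j in range(n):
                counts[i][j] += int(labels[i] == labels[j])
    return [[c / n_subsets for c in row] for row in counts]


# Toy data (hypothetical): 10 "pixels" with 6 variables, two separated groups.
rng = random.Random(1)
group_a = [[rng.gauss(0.0, 0.3) for _ in range(6)] for _ in range(5)]
group_b = [[rng.gauss(10.0, 0.3) for _ in range(6)] for _ in range(5)]
data = group_a + group_b

sim = consensus_similarity(data, k=2, n_subsets=4, subset_size=3)
labels = kmeans(sim, k=2)  # final clustering on the consolidated matrix
```

Each entry `sim[i][j]` is the proportion of subsets in which pixels `i` and `j` received the same label, so values strictly between 0 and 1 give a rough indication of how uncertain a pairing is. Because each base fit sees only `subset_size` variables, the cost of every individual fit is lower than fitting all variables at once.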

Paper Structure (27 sections, 29 equations, 21 figures, 2 tables)

Figures (21)

  • Figure 1: Greyscale images of one hyperspectral image of each puffed cereal type. For each pixel, the intensity of the colour corresponds to the average of the reflectance information captured in the NIR spectrum.
  • Figure 2: An illustration of positive ($+$) and negative ($-$) constraints. Four blocks of three types of pixels are highlighted. Pixels in the blue blocks must be clustered together, as must the pixels in the yellow block and the pixels in the green block. However, pixels in the blue blocks must not be clustered with those in the yellow block or with those in the green block, and vice versa.
  • Figure 3: Greyscale images of three simulated hyperspectral images for one of the five well-separated cluster datasets. The pixels in the blue blocks must be clustered together, as must the pixels in the yellow, green, and red blocks. The pixels in one coloured block must not be clustered with the pixels in other coloured blocks.
  • Figure 4: For each of the five simulated datasets with low overlap clusters, the ARI between the known labels and the clustering solutions of DBSCAN, GMM, and PGMM fitted on $p = 101$ variables (first three panels from the left) and cPGMM and ccPGMM (last two panels on the right) fitted with different settings of $M$ and $d$.
  • Figure 5: For each of the five simulated datasets with low overlap clusters, the time taken to fit DBSCAN, GMM, and PGMM on $p = 101$ variables (first three panels from the left) and cPGMM and ccPGMM (last two panels on the right) fitted with different settings of $M$ and $d$.
  • ...and 16 more figures
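
Figure 4 compares methods using the ARI (adjusted Rand index) between the known labels and each clustering solution. As a reminder of what that score measures, the sketch below computes it from pair counts using the standard formula; the function name and the toy label vectors are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(truth, pred):
    """Adjusted Rand index between two labelings of the same items."""
    n = len(truth)
    # Pairs of items placed together in both partitions.
    together = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    # Pairs placed together within each partition separately.
    pairs_truth = sum(comb(c, 2) for c in Counter(truth).values())
    pairs_pred = sum(comb(c, 2) for c in Counter(pred).values())
    expected = pairs_truth * pairs_pred / comb(n, 2)  # agreement under chance
    max_index = (pairs_truth + pairs_pred) / 2
    if max_index == expected:  # degenerate case, e.g. a single cluster in both
        return 1.0
    return (together - expected) / (max_index - expected)


same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])     # same partition, renamed
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])  # every pair disagrees
```

The score is 1 when two partitions agree up to a renaming of the cluster labels, is close to 0 for agreement at chance level, and can be negative when agreement is worse than chance.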