Constrained non-negative matrix factorization enabling real-time insights of $\textit{in situ}$ and high-throughput experiments

Phillip M. Maffettone, Aidan C. Daly, Daniel Olds

TL;DR

The paper addresses the challenge of real-time interpretation of streaming diffraction data where canonical NMF can yield nonphysical components. It introduces constrained non-negative matrix factorization with user or algorithmic priors, solved via alternating non-negative least squares and implemented in PyTorch, to produce physically meaningful weights and components during in situ analyses. Demonstrations on synthetic datasets and on variable-temperature BaTiO$_3$ and molten-salt NaCl:CrCl$_3$ data show that constraints yield interpretable phase evolution and enable adaptive experimental decisions. This approach enables rapid, human-in-the-loop insights at beamlines and is extensible to other spectral modalities, providing a practical path toward real-time discovery in high-throughput experiments.

Abstract

Non-negative Matrix Factorization (NMF) methods offer an appealing unsupervised learning approach for real-time analysis of streaming spectral data in time-sensitive data collection, such as $\textit{in situ}$ characterization of materials. However, canonical NMF methods are optimized to reconstruct a full dataset as closely as possible, with no underlying requirement that the reconstruction produce components or weights representative of the true physical processes. In this work, we demonstrate how constraining NMF weights or components, provided as known or assumed priors, can significantly improve how well the decomposition reveals the true underlying phenomena. We present a PyTorch-based method for efficiently applying constrained NMF and demonstrate it on several synthetic examples. When the method is applied to streaming experimentally measured spectral data, an expert researcher-in-the-loop can provide and dynamically adjust the constraints. This set of interactive priors to the NMF model can, for example, contain known or identified independent components, as well as functional expectations about the mixing of components. We demonstrate this application on measured X-ray diffraction and pair distribution function data from $\textit{in situ}$ beamline experiments. Details of the method are described, and general guidance is provided for employing constrained NMF to extract critical information and insights during $\textit{in situ}$ and high-throughput experiments.
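As a rough illustration of the idea in the abstract, the sketch below holds one known component fixed as a prior while the remaining component and all weights are learned. It is not the authors' PyTorch implementation: it uses NumPy and Lee-Seung multiplicative updates in place of alternating non-negative least squares, and the function name `constrained_nmf`, its parameters, and the toy Gaussian data are all assumptions made for this example.

```python
import numpy as np


def constrained_nmf(X, n_components, fixed_H=None, fixed_W=None,
                    n_iter=500, eps=1e-12, seed=0):
    """Approximate X (n_samples x n_features) by W @ H with W, H >= 0.

    fixed_H: optional dict {component index: 1-D array}; those rows of H
             are treated as known priors and never updated.
    fixed_W: optional full weight matrix; if given, W is never updated.
    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    W = rng.random((n_samples, n_components))
    H = rng.random((n_components, n_features))

    # Insert the prior components and remember which rows stay free.
    free_H = np.ones(n_components, dtype=bool)
    if fixed_H:
        for k, component in fixed_H.items():
            H[k] = component
            free_H[k] = False
    if fixed_W is not None:
        W = np.array(fixed_W, dtype=float)

    for _ in range(n_iter):
        if fixed_W is None:
            # Multiplicative update for the weights (Frobenius loss).
            W *= (X @ H.T) / (W @ H @ H.T + eps)
        # Multiplicative update for the components, applied to free rows only.
        H_new = H * (W.T @ X) / (W.T @ W @ H + eps)
        H[free_H] = H_new[free_H]
    return W, H


# Toy data (assumed): two Gaussians mixed with linearly varying weights.
x = np.linspace(-10.0, 10.0, 200)
g1 = np.exp(-0.5 * (x - 2.5) ** 2)
g2 = np.exp(-0.5 * (x + 2.5) ** 2)
t = np.linspace(0.0, 1.0, 20)
X = np.outer(t, g1) + np.outer(1.0 - t, g2)

# Treat g1 as a known component; learn the second component and all weights.
W, H = constrained_nmf(X, n_components=2, fixed_H={0: g1})
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_err:.3e}")
```

Passing `fixed_W` instead of `fixed_H` would express the other kind of prior mentioned in the abstract, a functional expectation about how the components mix.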

Paper Structure

This paper contains 12 sections, 6 equations, 11 figures, 1 algorithm.

Figures (11)

  • Figure 1: Example datasets used to demonstrate the capabilities of canonical and constrained NMF. (a) Two noisy Gaussian curves centered at 2.5 and -2.5 are mixed in linear combinations with coefficients that vary approximately, but not exactly, linearly across the dataset. (b) A Lorentzian centered at 3.0, a box function of width two centered at 5.0, and a Gaussian centered at 6.0 are mixed as linear combinations with functionally varying weights across the dataset.
  • Figure 2: Constrained NMF was used to analyze two variable-temperature diffraction datasets. (a) BaTiO$_3$ undergoes three phase transitions across the temperature range 150--450 K that are imperceptible even to traditional refinement techniques. (b) NaCl:CrCl$_3$ diffraction was measured over a temperature range of 300--963 K that included a melting transition and an anomalous high-temperature solid phase (shown by the remaining sharp peaks above 650 K).
  • Figure 3: The reconstruction of the (a) first, (b) median, and (c) last pattern of a dataset of mixed Gaussian functions using canonical NMF. The full dataset is shown in Figure 1(a). (d) Both the ordering and amplitude of the learned components fail to match the ground truth functions. (e) The learned weights compensate for this false magnitude and fit accurately to the non-linearity in the true weights.
  • Figure 4: The reconstruction of the (a) first, (b) median, and (c) last pattern of a dataset of mixed Gaussian functions using constrained NMF. The weights of the decomposition were constrained to be linear with respect to the dataset index. The full dataset is shown in Figure 1(a). (d) Both the ordering and amplitude of the learned components successfully match the ground truth functions. (e) Because the weights are fixed, they cannot match the non-linearity in the ground truth, but they serve as a physically meaningful approximation.
  • Figure 5: The reconstruction of the (a) first, (b) median, and (c) last pattern of a dataset of mixed Gaussian, Lorentzian, and box functions using canonical NMF. The full dataset is shown in Figure 1(b). (d) Canonical NMF fails to successfully decompose this dataset and produce an accurate reconstruction. The learned components are mixtures of the underlying functions. (e) The learned weights compensate for the inaccurate components, although the decomposition is trapped in a local optimum.
  • ...and 6 more figures
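
The kind of dataset described for Figure 1(a), together with the linear-weight constraint described for Figure 4, can be mimicked in a few lines. This is only a sketch: the peak widths, noise level, number of patterns, and the sinusoidal deviation from linearity are assumed values that do not come from the paper, and the component solve uses ordinary least squares followed by clipping at zero as a simplified stand-in for a non-negative least-squares step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed grid and ground-truth components: unit-width Gaussians at +2.5 and -2.5.
x = np.linspace(-10.0, 10.0, 201)
g_pos = np.exp(-0.5 * (x - 2.5) ** 2)
g_neg = np.exp(-0.5 * (x + 2.5) ** 2)
G = np.vstack([g_pos, g_neg])                      # shape (2, n_features)

# Weights that vary approximately, but not exactly, linearly (assumed deviation).
n_patterns = 50
t = np.linspace(0.0, 1.0, n_patterns)
w_true = t + 0.1 * np.sin(np.pi * t)
W_true = np.column_stack([w_true, 1.0 - w_true])   # shape (n_patterns, 2)

# Noisy mixtures (assumed noise level).
X = W_true @ G + rng.normal(scale=0.01, size=(n_patterns, x.size))

# Constraint: weights are held fixed and exactly linear in the dataset index.
W_lin = np.column_stack([t, 1.0 - t])

# With the weights fixed, solve for the components by least squares,
# then clip negatives to zero (simplification, not a true NNLS solve).
H_ls, *_ = np.linalg.lstsq(W_lin, X, rcond=None)
H = np.clip(H_ls, 0.0, None)

# Location of the maximum of each recovered component.
peak_positions = x[np.argmax(H, axis=1)]
print("recovered peak positions:", peak_positions)
```

Because the assumed weights are only approximately right, the recovered components are approximations of the ground truth rather than exact copies, which mirrors the trade-off noted in the Figure 4 caption.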