Identification in source apportionment using geometry
Bora Jin, Abhirup Datta
TL;DR
This work addresses identifiability in source apportionment modeled by $Y=WH$ by defining a population-level source attribution percentage matrix $\Phi$ that is scale-invariant and identifiable under weak probabilistic separability with stationary ergodic emissions. It develops a geometric, convex-hull-based estimator: as the sample size $n$ grows, the convex hull of the row-normalized data $Y^*$ converges to the hull of the true factor rows $\mathcal{H}^*$, enabling consistent recovery of $H^*$ via a maximum-volume $K$-vertex approach. A consistent estimator $\widetilde{\mu}$ of the factor means is then constructed and used to obtain $\widehat{\Phi}$, which is consistent up to permutation of the sources and requires neither sparsity nor a fixed scaling. Numerical experiments show that the proposed estimator converges to the truth as $n$ increases, for both stationary ergodic and iid emission processes, which supports the practical feasibility of the geometric approach for policy-relevant attribution tasks.
Abstract
Source apportionment analysis, which aims to quantify the attribution of observed concentrations of multiple air pollutants to specific sources, can be formulated as a non-negative matrix factorization (NMF) problem. However, NMF is non-unique, and identification typically relies on unverifiable assumptions such as sparsity and on uninterpretable scalings. In this manuscript, we establish identifiability of the source attribution percentage matrix under much weaker and more realistic conditions. We introduce the population-level estimand for this matrix and show that it is scale-invariant and identifiable even when the NMF factors are not. Viewing the data as a point cloud in a conical hull, we show that a geometric estimator of the source attribution percentage matrix is consistent without any sparsity or parametric distributional assumptions, while accommodating spatio-temporal dependence. Numerical experiments corroborate the theory.
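
The geometric idea summarized above (row-normalize the data, then pick the $K$ points of the resulting cloud that span the largest volume) can be illustrated with a small simulation. The following is a minimal sketch, not the authors' implementation. It assumes $K=3$ sources and $3$ pollutants so that the cloud lives in a triangle, assumes $H^*$ denotes the factor matrix with rows rescaled to sum to one, uses a made-up factor matrix `H`, and draws iid weights as `random.random() ** 6` so that one source occasionally dominates (a stand-in for the separability condition). All function and variable names are hypothetical.

```python
import itertools
import random


def row_normalize(row):
    """Rescale a non-negative row so that its entries sum to one."""
    total = sum(row)
    return [v / total for v in row]


def cross(o, a, b):
    """Twice the signed area of the triangle (o, a, b) in the plane."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain: hull vertices of a set of 2-D points."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def max_area_triangle(vertices):
    """Brute-force the three hull vertices spanning the largest area."""
    return max(itertools.combinations(vertices, 3),
               key=lambda t: abs(cross(*t)))


random.seed(0)
K, J, n = 3, 3, 3000
# Hypothetical factor matrix: one row per source, one column per pollutant.
H = [[5.0, 1.0, 1.0], [1.0, 4.0, 2.0], [2.0, 1.0, 6.0]]

# Simulate Y = WH one row at a time and row-normalize each observation.
Y_star = []
for _ in range(n):
    w = [random.random() ** 6 for _ in range(K)]  # assumed emission weights
    y = [sum(w[k] * H[k][j] for k in range(K)) for j in range(J)]
    Y_star.append(row_normalize(y))

# Rows sum to one, so the first two coordinates determine each point.
points_2d = [(y[0], y[1]) for y in Y_star]
estimate = max_area_triangle(convex_hull(points_2d))
H_star_hat = [[a, b, 1.0 - a - b] for a, b in estimate]
H_star = [row_normalize(r) for r in H]


def max_abs_error(est_rows, true_rows):
    """Smallest entrywise max error over all orderings of the sources."""
    best = float("inf")
    for perm in itertools.permutations(range(len(true_rows))):
        err = max(abs(est_rows[i][j] - true_rows[p][j])
                  for i, p in enumerate(perm) for j in range(J))
        best = min(best, err)
    return best


error = max_abs_error(H_star_hat, H_star)
print("max abs error up to permutation:", error)
```

Each row-normalized observation is a convex combination of the rows of the row-normalized `H`, which is why the largest triangle inside the cloud approaches the true one as `n` grows. The error is computed over all orderings of the rows because the sources are only recoverable up to permutation. The sketch stops at the normalized factor rows and does not attempt the estimation of the factor means or of $\Phi$.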
