Identification in source apportionment using geometry

Bora Jin; Abhirup Datta

TL;DR

This work addresses identifiability in source apportionment modeled by $Y=WH$. It defines a population-level source attribution percentage matrix $\Phi$ that is scale-invariant and identifiable under weak probabilistic separability with stationary ergodic emissions. It then develops a geometric, convex-hull-based estimator: as the sample size grows, the convex hull of the row-normalized data $Y^*$ converges to the convex hull $\mathcal{H}^*$ of the rows of the true row-normalized factor matrix $H^*$, enabling consistent recovery of $H^*$ via a maximum-volume $K$-vertex approach. A consistent estimator of the factor means $\widetilde{\mu}$ is constructed and used to obtain $\widehat{\Phi}$ up to permutation of sources, without requiring sparsity or fixed scaling. Numerical experiments show that the proposed estimator converges to the truth as $n$ increases, for both stationary ergodic and iid emission processes, underscoring the practical feasibility of geometric identifiability for policy-relevant attribution tasks.
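The geometric idea in the TL;DR can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the number of sources $K$ equals the number of pollutants (so the row-normalized points live in a $(K-1)$-dimensional simplex and a determinant gives the volume), it draws iid Gamma emissions with a small shape parameter so that near-pure observations occur, and it finds the maximum-volume $K$-vertex set by brute force over convex-hull vertices. The function names `row_normalize`, `max_volume_vertices`, `simulate`, and `max_row_error` are hypothetical.

```python
import itertools

import numpy as np
from scipy.spatial import ConvexHull


def row_normalize(M):
    """Scale each row of a non-negative matrix so that it sums to one."""
    return M / M.sum(axis=1, keepdims=True)


def max_volume_vertices(Y, K):
    """Return the K rows of row-normalized Y spanning the largest simplex.

    Assumption of this sketch: Y has exactly K columns, so dropping one
    coordinate embeds the points in R^(K-1) and the simplex volume is
    proportional to a (K-1) x (K-1) determinant.
    """
    Y_star = row_normalize(Y)
    if Y_star.shape[1] != K:
        raise ValueError("this sketch assumes K equals the number of columns")
    P = Y_star[:, :-1]  # rows sum to one, so the last coordinate is redundant
    candidates = ConvexHull(P).vertices  # only hull vertices can be optimal
    best_idx, best_vol = None, -1.0
    for idx in itertools.combinations(candidates, K):
        V = P[list(idx)]
        vol = abs(np.linalg.det(V[1:] - V[0]))
        if vol > best_vol:
            best_idx, best_vol = idx, vol
    return Y_star[list(best_idx)]


def simulate(n, H, rng):
    """Simulate Y = W H with iid Gamma(0.3) emissions (an assumed design)."""
    W = rng.gamma(shape=0.3, scale=1.0, size=(n, H.shape[0]))
    return W @ H


def max_row_error(H_star, H_hat):
    """Largest sup-norm distance from a true row to its nearest estimated row."""
    d = np.abs(H_star[:, None, :] - H_hat[None, :, :]).max(axis=2)
    return d.min(axis=1).max()


# Illustrative factor matrix with deliberately unequal row scales.
H = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.2, 0.2, 0.6]]) * np.array([[2.0], [1.0], [3.0]])
rng = np.random.default_rng(0)
for n in (100, 1000, 5000):
    H_hat = max_volume_vertices(simulate(n, H, rng), K=3)
    print(n, max_row_error(row_normalize(H), H_hat))
```

The printed error compares the estimated vertices with the row-normalized true factors up to permutation. The brute-force search is only practical for small $K$; it is used here for transparency rather than efficiency.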

Abstract

Source apportionment analysis, which aims to quantify the attribution of observed concentrations of multiple air pollutants to specific sources, can be formulated as a non-negative matrix factorization (NMF) problem. However, NMF is non-unique and typically relies on unverifiable assumptions such as sparsity and uninterpretable scalings. In this manuscript, we establish identifiability of the source attribution percentage matrix under much weaker and more realistic conditions. We introduce the population-level estimand for this matrix, and show that it is scale-invariant and identifiable even when the NMF factors are not. Viewing the data as a point cloud in a conical hull, we show that a geometric estimator of the source attribution percentage matrix is consistent without any sparsity or parametric distributional assumptions, and while accommodating spatio-temporal dependence. Numerical experiments corroborate the theory.
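The scale ambiguity mentioned in the abstract, and why a percentage-type estimand can escape it, can be checked numerically. The sketch below is illustrative only. It assumes one plausible form of the attribution percentage, $\Phi_{kj} = 100\,\mu_k H_{kj} / \sum_{l} \mu_l H_{lj}$ with $\mu_k$ the mean emission of source $k$, which is not necessarily the paper's exact definition. The helper `attribution_percentages` is a hypothetical name, and sample means stand in for population means.

```python
import numpy as np


def attribution_percentages(mu, H):
    """Percentage of each pollutant (column) attributed to each source (row).

    Assumed form: Phi[k, j] = 100 * mu[k] * H[k, j] / sum_l mu[l] * H[l, j].
    """
    contrib = mu[:, None] * H
    return 100.0 * contrib / contrib.sum(axis=0, keepdims=True)


rng = np.random.default_rng(0)
W = rng.gamma(shape=2.0, scale=1.0, size=(500, 3))  # emissions, n x K
H = rng.uniform(0.1, 1.0, size=(3, 4))              # source profiles, K x J

# Rescale the factors by a positive diagonal matrix D: Y is unchanged.
D = np.diag([0.5, 2.0, 10.0])
W2 = W @ np.linalg.inv(D)
H2 = D @ H

phi1 = attribution_percentages(W.mean(axis=0), H)
phi2 = attribution_percentages(W2.mean(axis=0), H2)

print(np.allclose(W @ H, W2 @ H2))  # same data from different factors
print(np.allclose(phi1, phi2))      # same attribution percentages
```

The rescaling divides each source's mean emission by the same constant that multiplies its profile row, so the products $\mu_k H_{kj}$, and hence the percentages, are unchanged even though $W$ and $H$ individually are not identified.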
