
Compositional regression using principal nested spheres

Mymuna Monem, Ian L. Dryden, Florence George, Natalia Soares Quinete

Abstract

Regression with compositional responses is challenging due to the nonlinear geometry of the simplex and the limitations of Euclidean methods. We propose a regression framework for manifold-valued data based on mappings to statistically tractable intermediate spaces. For compositional data, responses are embedded in the positive orthant of the sphere and analysed using Principal Nested Spheres (PNS), yielding a cylindrical intermediate space with a circular leading score and Euclidean higher-order scores. Regression is performed in this intermediate space and fitted values are mapped back to the simplex. A simulation study demonstrates good performance of PNS-based regression. An application to environmental chemical exposure data illustrates the interpretability and practical utility of the method.
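The embedding step described in the abstract can be illustrated with a minimal sketch. It assumes the square-root map, which is consistent with the power $\alpha = 1/2$ mentioned in the figure captions; the abstract itself does not spell out the transformation. The function names `simplex_to_sphere` and `sphere_to_simplex` and the example compositions are hypothetical and chosen only for illustration. The sketch covers only the mapping to and from the sphere, not the PNS fit or the regression.

```python
import numpy as np


def simplex_to_sphere(x):
    """Map compositions (rows summing to 1) to the positive orthant of the unit sphere.

    Assumption: square-root embedding (power alpha = 1/2).
    """
    x = np.asarray(x, dtype=float)
    x = x / x.sum(axis=-1, keepdims=True)  # re-close to guard against rounding
    return np.sqrt(x)


def sphere_to_simplex(z):
    """Map points on the unit sphere back to the simplex by squaring the coordinates."""
    z = np.asarray(z, dtype=float)
    x = z ** 2
    return x / x.sum(axis=-1, keepdims=True)  # renormalise so rows sum to 1


# Hypothetical three-part compositions, one per row.
compositions = np.array([[0.2, 0.3, 0.5],
                         [0.6, 0.1, 0.3]])

points = simplex_to_sphere(compositions)   # unit vectors with non-negative entries
recovered = sphere_to_simplex(points)      # back on the simplex
```

Because the components of a composition sum to one, the square roots have unit Euclidean norm, so each row of `points` lies on the sphere, and squaring recovers the original composition.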

Paper Structure (14 sections, 8 equations, 8 figures, 1 table)

Figures (8)

  • Figure 1: The great circle fit (left, in green) and the small circle fit (right) for the 3D geochemical data using power $\alpha=1/2$. The green solid line is the fitted subsphere, and each point is projected onto the subsphere (in white). The PNS mean is shown in yellow.
  • Figure 2: A ternary diagram with fitted PNS subspheres. The first principal component from the compositions package is shown in a black solid line, and corresponds to $\alpha \to 0$. The great circle fits for $\alpha \in \{ 0.25, 0.5, 1\}$ are given by green, red and blue dashed lines, and the small circle fits for $\alpha \in \{ 0.25, 0.5, 1\}$ are given by purple, cyan and gold lines. The great and small circle PNS means are given by the respective coloured symbols.
  • Figure 3: Simulated compositional data with five components. The 'PNS score 1' and 'PNS all' methods provide particularly close fits, the circular package is the next best, and the remaining methods all fit rather poorly.
  • Figure 4: The total number of chemicals detected across the matrices (water, dust, food and soil) in the nine grouped geographical areas in the study. The distribution of the number of detected chemicals is displayed as a violin plot for each of the nine areas, with the mean in red, an underlying boxplot and jittered observed values.
  • Figure 5: A map of the zip codes in Region A and Region B. Region A locations are mainly along the coast or near Biscayne Bay.
  • ...and 3 more figures