Beyond Vintage Rotation: Bias-Free Sparse Representation Learning with Oracle Inference

Chengyu Cui; Yunxiao Chen; Jing Ouyang; Gongjun Xu

TL;DR

This paper proposes a novel bias-free rotation method within a general representation learning framework based on latent variables, and establishes an oracle inference property for the learned sparse representations: the estimators achieve the same asymptotic variance as in the ideal setting where the latent variables are observed.

Abstract

Learning low-dimensional latent representations is a central topic in statistics and machine learning, and rotation methods have long been used to obtain sparse and interpretable representations. Despite nearly a century of widespread use across many fields, rigorous guarantees for valid inference for the learned representation remain lacking. In this paper, we identify a surprisingly prevalent phenomenon that suggests a reason for this gap: for a broad class of vintage rotations, the resulting estimators exhibit a non-estimable bias. Because this bias is independent of the data, it fundamentally precludes the development of valid inferential procedures, including the construction of confidence intervals and hypothesis testing. To address this challenge, we propose a novel bias-free rotation method within a general representation learning framework based on latent variables. We establish an oracle inference property for the learned sparse representations: the estimators achieve the same asymptotic variance as in the ideal setting where the latent variables are observed. To bridge the gap between theory and computation, we develop an efficient computational framework and prove that its output estimators retain the same oracle property. Our results provide a rigorous inference procedure for the rotated estimators, yielding statistically valid and interpretable representation learning.

Paper Structure (15 sections, 5 theorems, 24 equations, 6 figures, 2 algorithms)

Introduction
Limitations of Vintage Rotations
Our Contributions
Problem Setup
Sparse Representation
Bias in Existing Rotation Methods
Proposed Method
Oracle Inference Properties
Computational Guarantee
Initialisation
Local Quadratic Approximation
Empirical Studies
Simulation Studies
Real Data Analysis
Discussion

Key Result

Proposition 1

Consider any orthogonal/oblique rotation method with criterion $Q(\cdot)$. If $(i)$ $Q(\cdot)$ is entry-wise differentiable, i.e., $\partial_{jl}Q(\bm{A}) := \partial_{a_{jl}}Q(\bm{A})$ exists for $j\in[q]$ and $l\in[r]$; and $(ii)$ $\partial_{jl}Q(\bm{A})|_{a_{jl} = 0} = 0$ for any $j\in[q]$ and $l\in[r]$ ...
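
As an illustration of conditions $(i)$ and $(ii)$ only, consider the classical raw varimax criterion. The normalization below is one common convention and is an assumption here, not taken from the paper; the sketch shows that the entry-wise derivative exists and vanishes whenever $a_{jl} = 0$.

```latex
% Raw varimax criterion for a q x r matrix A = (a_{jl}) (normalization assumed)
Q(\bm{A}) = \sum_{l=1}^{r}\left[\frac{1}{q}\sum_{j=1}^{q} a_{jl}^{4}
            - \left(\frac{1}{q}\sum_{j=1}^{q} a_{jl}^{2}\right)^{2}\right]

% Entry-wise derivative: condition (i) holds since Q is a polynomial in the entries
\partial_{jl}Q(\bm{A})
  = \frac{4}{q}\,a_{jl}^{3} - \frac{4}{q}\,a_{jl}\left(\frac{1}{q}\sum_{k=1}^{q} a_{kl}^{2}\right)
  = \frac{4}{q}\,a_{jl}\left(a_{jl}^{2} - \frac{1}{q}\sum_{k=1}^{q} a_{kl}^{2}\right)

% Condition (ii): the common factor a_{jl} forces the derivative to zero at a_{jl} = 0
\partial_{jl}Q(\bm{A})\big|_{a_{jl}=0} = 0
```

Under this convention, the varimax criterion would fall within the class of rotation methods that the proposition addresses.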

Figures (6)

Figure 1: Histogram of estimates for the first row of $\bm{A}^*$, given as $(1,0,0,0,0)$, from different methods, with each component shown in a separate panel. The estimates for the remaining rows exhibit a similar pattern, as reported in Section \ref{subsec_simu}. Here, "varimax" denotes the varimax estimator; "ours" denotes the Folomin estimator with the MCP loss (see details in Section \ref{sec_main}); and "oracle" denotes the estimator computed with the latent variables observed. The full simulation study is given in Section \ref{subsec_simu}.
Figure 2: Bias of the estimation for $\bm{A}^*_{1:5,}$ under $n=2000$, $q=2000$, $\lambda = 0.1$ and different $\tau$. Each panel displays the distribution of $\widehat{\bm{A}}_{j,} - \bm{A}^*_{j,}$ for $j=1,2,\dots,5$, with methods indicated above (varimax/promax, Folomin with MCP, and the oracle benchmark).
Figure 3: Scaled mean squared errors of estimation for $\bm{A}^*$, defined entry-wise as $n\times \sum_{t=1}^{200}(\widehat{\bm{A}}_{j,l}^{(t)} - \bm{A}_{j,l}^*)^2/200$ over 200 replications, where $\widehat{\bm{A}}^{(t)}$ is the estimator produced by different methods at replication $t$, across different settings of $n$, $q$, $\lambda$, and $\tau$.
Figure 4: Empirical coverage across 200 replications for $\bm{A}^*$ under different settings of $n$, $q$, $\lambda$ and $\tau$.
Figure 5: Heatmap for the (transposed) estimated representation matrix for the IPIP-NEO dataset. Each row reflects the dependence of 300 items on a certain personality trait. The columns are grouped and labelled by the item pools in each personality domain.
...and 1 more figure
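
The summary metrics named in the captions of Figures 3 and 4 can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `scaled_mse` and `empirical_coverage` are hypothetical, and the coverage computation assumes Wald-type intervals at a nominal 95% level, which the captions do not specify.

```python
import numpy as np


def scaled_mse(estimates, truth, n):
    """Entry-wise scaled MSE, n * sum_t (A_hat^(t) - A*)^2 / T, as in Figure 3.

    estimates: array of shape (T, q, r), one estimated matrix per replication.
    truth:     array of shape (q, r), the true matrix A*.
    n:         sample size used to scale the error.
    """
    estimates = np.asarray(estimates, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return n * np.mean((estimates - truth) ** 2, axis=0)


def empirical_coverage(estimates, std_errors, truth, z=1.96):
    """Per-entry share of replications whose interval covers the truth.

    Assumes Wald intervals estimate +/- z * std_error (an assumption,
    not stated in the source).
    """
    estimates = np.asarray(estimates, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    truth = np.asarray(truth, dtype=float)
    lower = estimates - z * std_errors
    upper = estimates + z * std_errors
    return np.mean((lower <= truth) & (truth <= upper), axis=0)


if __name__ == "__main__":
    # Synthetic stand-in for T = 200 replications; not data from the paper.
    rng = np.random.default_rng(0)
    n, T = 2000, 200
    truth = np.array([[1.0, 0.0, 0.0, 0.0, 0.0]])
    sd = 1.0 / np.sqrt(n)
    estimates = truth + rng.normal(scale=sd, size=(T, *truth.shape))
    std_errors = np.full_like(estimates, sd)
    print(scaled_mse(estimates, truth, n))
    print(empirical_coverage(estimates, std_errors, truth))
```

Both functions return a matrix with the same shape as $\bm{A}^*$, so results can be reported per entry as in the figures.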

Theorems & Definitions (20)

Example 1
Example 2
Example 3
Remark 1
Definition 1
Remark 2
Definition 2
Remark 3
Proposition 1
Example 4
...and 10 more
