
Predictive Low Rank Matrix Learning under Partial Observations: Mixed-Projection ADMM

Dimitris Bertsimas, Nicholas A. G. Johnson

TL;DR

This work addresses learning a partially observed matrix under a low-rank assumption when a fully observed side information matrix depends linearly on the true matrix. It introduces a mixed-projection reformulation that couples the matrix to a projection variable, includes a nuclear-norm regularization, and derives a strong SDP relaxation that yields a convex surrogate. A scalable Mixed-Projection ADMM algorithm is developed, with closed-form subproblem solutions for U, V, P, and Z and a provable per-iteration complexity that scales favorably with problem size. Empirical results on synthetic and large-scale real data show that the proposed method achieves lower objective values and reconstruction errors in the small-rank regime (k ≤ 10), and lower out-of-sample error in much less execution time on Netflix-like data, often outperforming Fast-Impute variants and other baselines. The framework provides a practical and scalable approach to predictive low-rank matrix learning under partial observations with side information, and opens avenues for global optimality certificates via the SDP relaxation.
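The rank constraint behind a mixed-projection reformulation can be checked numerically. A standard characterization used in mixed-projection formulations is that rank(X) ≤ k exactly when X = PX for some orthogonal projection matrix P (P² = P, P = Pᵀ) with tr(P) ≤ k. The minimal NumPy sketch below is an illustration under that characterization, not the authors' code; the helper name `projection_onto_column_space` and all dimensions are hypothetical.

```python
import numpy as np

def projection_onto_column_space(X, k):
    """Hypothetical helper: P = U_k U_k^T, the orthogonal projection onto
    the span of the top-k left singular vectors of X."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    Uk = U[:, :k]
    return Uk @ Uk.T

rng = np.random.default_rng(0)
n, m, k = 50, 30, 5

# Synthetic matrix of rank exactly k, built as a product of thin factors.
X = rng.standard_normal((n, k)) @ rng.standard_normal((k, m))
P = projection_onto_column_space(X, k)

print("idempotent:", np.allclose(P @ P, P))
print("symmetric: ", np.allclose(P, P.T))
print("trace:     ", round(float(np.trace(P)), 6))
print("X = P X:   ", np.allclose(P @ X, X))

# Eigenvalues of a projection are 0 or 1; exactly k of them equal 1 here.
eigs = np.linalg.eigvalsh(P)
print("unit eigenvalues:", int(np.sum(eigs > 0.5)))

# A generic full-rank matrix (rank 30 > k) is not fixed by a trace-k projection.
Z = rng.standard_normal((n, m))
print("Z = P_Z Z:", np.allclose(projection_onto_column_space(Z, k) @ Z, Z))
```

Relaxing P² = P to 0 ⪯ P ⪯ I (keeping tr(P) ≤ k) gives the convex hull of the trace-bounded projection matrices, which is the usual route to a semidefinite relaxation; the paper's relaxation may add further constraints.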

Abstract

We study the problem of learning a partially observed matrix under the low-rank assumption in the presence of fully observed side information that depends linearly on the true underlying matrix. This problem is an important generalization of the Matrix Completion problem, a central problem in Statistics, Operations Research and Machine Learning that arises in applications such as recommendation systems, signal processing, system identification and image denoising. We formalize this problem as an optimization problem with an objective that balances the fit of the reconstruction to the observed entries against the ability of the reconstruction to predict the side information. We derive a mixed-projection reformulation of the resulting optimization problem and present a strong semidefinite cone relaxation. We design an efficient, scalable alternating direction method of multipliers algorithm that produces high-quality feasible solutions to the problem of interest. Our numerical results demonstrate that in the small-rank regime ($k \leq 10$), our algorithm outputs solutions that achieve on average $2.3\%$ lower objective value and $41\%$ lower $\ell_2$ reconstruction error than the solutions returned by the best-performing benchmark method on synthetic data. The runtime of our algorithm is competitive with and often superior to that of the benchmark methods. Our algorithm is able to solve problems with $n = 10000$ rows and $m = 10000$ columns in less than a minute. On large-scale real-world data, our algorithm produces solutions that achieve $67\%$ lower out-of-sample error than benchmark methods in $97\%$ less execution time.
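The abstract describes the objective only in words. One hypothetical way to write a function of that shape, assuming a squared loss on the observed set Ω and a least-squares fit of the side information Y ∈ ℝ^{n×d} by a linear map of X, is sketched below. The weight `lam`, the function name `predictive_objective`, and the absence of any further regularization or the rank constraint are assumptions; the paper's exact objective may differ.

```python
import numpy as np

def predictive_objective(X, A, mask, Y, lam):
    """Hypothetical objective: squared error on the observed entries of A
    plus lam times the best least-squares fit of Y from the columns of X."""
    fit = np.sum((mask * (X - A)) ** 2)             # only entries in Omega count
    alpha, *_ = np.linalg.lstsq(X, Y, rcond=None)   # alpha minimizes ||Y - X alpha||_F
    side = np.sum((Y - X @ alpha) ** 2)             # residual of the side-information fit
    return fit + lam * side, alpha

rng = np.random.default_rng(1)
n, m, k, d = 200, 40, 5, 10

A = rng.standard_normal((n, k)) @ rng.standard_normal((k, m))   # true rank-k matrix
beta = rng.standard_normal((m, d))
Y = A @ beta + 0.01 * rng.standard_normal((n, d))               # side info linear in A
mask = rng.random((n, m)) < 0.3                                 # about 30% of entries observed

# The true matrix should score far better than the all-zeros reconstruction.
obj_true, alpha_true = predictive_objective(A, A, mask, Y, lam=1.0)
obj_zero, _ = predictive_objective(np.zeros_like(A), A, mask, Y, lam=1.0)
print(obj_true < obj_zero)
```

Because α enters only through a least-squares fit, it can be eliminated in closed form for any fixed X, leaving an objective in X alone.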

Paper Structure

This paper contains 49 sections, 13 theorems, 68 equations, 16 figures, 18 tables, 1 algorithm.

Key Result

Proposition 1

Problem (opt_lrml4:MC_primal) is equivalent to a robust optimization problem with uncertainty set $\mathcal{U}=\{\bm{\Delta} \in \mathbb{R}^{n \times m}: \Vert \bm{\Delta} \Vert_\sigma \leq \gamma\}$, the ball of radius $\gamma$ in the spectral norm.
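Robust formulations with a spectral-norm uncertainty set are typically linked to nuclear-norm penalties through norm duality: for any matrix X, the maximum of ⟨Δ, X⟩ over ‖Δ‖_σ ≤ γ equals γ‖X‖_*, attained at Δ* = γUVᵀ from a thin SVD X = UΣVᵀ. The equation of Proposition 1 is not reproduced here, so the NumPy check below only illustrates this standard identity on random data; it is not the proposition's statement or proof, and the dimensions and γ are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, gamma = 30, 20, 0.5
X = rng.standard_normal((n, m))

U, s, Vt = np.linalg.svd(X, full_matrices=False)
nuclear = s.sum()                                  # ||X||_*, the sum of singular values

# Worst-case perturbation in the spectral-norm ball: Delta* = gamma * U V^T.
delta_star = gamma * U @ Vt
worst = np.sum(delta_star * X)                     # <Delta*, X> = trace(Delta*^T X)
print(np.isclose(np.linalg.norm(delta_star, 2), gamma))   # Delta* lies on the ball's boundary
print(np.isclose(worst, gamma * nuclear))                 # the maximum equals gamma * ||X||_*

# Random feasible perturbations, rescaled to spectral norm gamma, never do better.
for _ in range(1000):
    D = rng.standard_normal((n, m))
    D *= gamma / np.linalg.norm(D, 2)              # ord=2 gives the largest singular value
    assert np.sum(D * X) <= worst + 1e-9
```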

Figures (16)

  • Figure 1: Algorithm 1 primal and dual residual evolution versus iteration number for a single synthetic data run with $n=1000, m=100, k=5$ and $d=150$. Due to the logarithmic scale, the phi and psi residual lines overlap.
  • Figure 2: Objective value (top left), $\ell_2$ reconstruction error (top right), side information $R^2$ (bottom left) and execution time (bottom right) versus $n$ with $m=100, k=5$ and $d=150$. Averaged over $20$ trials for each parameter configuration.
  • Figure 3: Objective value (top left), $\ell_2$ reconstruction error (top right), side information $R^2$ (bottom left) and execution time (bottom right) versus $n$ with $m=100, k=5$ and $d=150$. Averaged over $20$ trials for each parameter configuration.
  • Figure 4: Cumulative time spent solving each subproblem of Algorithm 1 versus $n$ with $m=100, k=5$ and $d=150$. Averaged over $20$ trials for each parameter configuration.
  • Figure 5: Objective value (top left), $\ell_2$ reconstruction error (top right), fitted rank (bottom left) and execution time (bottom right) versus $m$ with $n=1000, k=5$ and $d=150$. Averaged over $20$ trials for each parameter configuration.
  • ...and 11 more figures

Theorems & Definitions (25)

  • Proposition 1
  • proof
  • Proposition 2
  • proof
  • Proposition 3
  • proof
  • Proposition 4
  • proof
  • Proposition 5
  • proof
  • ...and 15 more