Partitioned Least Squares

Roberto Esposito, Mattia Cerrato, Marco Locatelli

TL;DR

This work introduces Partitioned Least Squares (Partitioned-LS), a non-convex, group-structured regression framework that partitions features into interpretable groups with nonnegative, unit-sum within-group allocations. It presents two solving strategies: an approach based on alternating convex search (PartLS-alt) and an exact reformulation (PartLS-opt) that reduces the problem to an exponential number of convex subproblems, with a branch-and-bound variant for larger partitions. Theoretical results establish non-convexity and NP-completeness, and the two algorithms are shown to trade off accuracy and scalability, with PartLS-opt typically delivering superior accuracy and interpretability when a meaningful partition is available. Empirically, the method yields interpretable group-level insights (e.g., on the Ames House Prices dataset) and competitive generalization when the partition aligns with the data distribution, while highlighting challenges with collinearity and the need for interactive partition design in practice.

Abstract

In this paper we propose a variant of the linear least squares model allowing practitioners to partition the input features into groups of variables that they require to contribute similarly to the final result. The output allows practitioners to assess the importance of each group and of each variable in the group. We formally show that the new formulation is not convex and provide two alternative methods to deal with the problem: one non-exact method based on an alternating least squares approach; and one exact method based on a reformulation of the problem using an exponential number of sub-problems whose minimum is guaranteed to be the optimal solution. We formally show the correctness of the exact method and also compare the two solutions showing that the exact solution provides better results in a fraction of the time required by the alternating least squares solution (assuming that the number of partitions is small). For the sake of completeness, we also provide an alternative branch and bound algorithm that can be used in place of the exact method when the number of partitions is too large, and a proof of NP-completeness of the optimization problem introduced in this paper.
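
To make the parameterization concrete, the following is a minimal sketch in plain Python. It assumes the model predicts $\sum_k \beta_k \sum_{j \in \text{group } k} \alpha_j x_j$, with $\alpha_j \ge 0$ and the $\alpha$ values of each group summing to one, which is how the summary above describes the allocations. The function names `predict` and `to_alpha_beta`, the toy numbers, and the uniform fallback for an all-zero group are illustrative assumptions, not the authors' code. The sketch shows that flat weights sharing one sign per group can be split into allocations and group weights without changing the prediction, and that a partition with $K$ groups admits $2^K$ sign patterns, which is presumably the source of the exponential number of sub-problems mentioned in the abstract.

```python
from itertools import product


def predict(x, groups, alpha, beta):
    """Partitioned prediction: sum_k beta[k] * sum_{j in groups[k]} alpha[j] * x[j]."""
    return sum(
        beta[k] * sum(alpha[j] * x[j] for j in group)
        for k, group in enumerate(groups)
    )


def to_alpha_beta(w, groups):
    """Split flat weights (one shared sign per group) into (alpha, beta).

    Hypothetical helper: alpha[j] is the share of feature j inside its group,
    beta[k] is the signed total weight of group k.
    """
    alpha = [0.0] * len(w)
    beta = []
    for group in groups:
        total = sum(abs(w[j]) for j in group)
        sign = -1.0 if any(w[j] < 0 for j in group) else 1.0
        beta.append(sign * total)
        for j in group:
            # Assumption: an all-zero group gets a uniform allocation,
            # so that alpha still sums to one within the group.
            alpha[j] = abs(w[j]) / total if total > 0 else 1.0 / len(group)
    return alpha, beta


# Toy example: five features split into two groups.
groups = [[0, 1], [2, 3, 4]]
w = [0.5, 1.5, -1.0, -3.0, 0.0]  # same sign within each group
x = [1.0, 2.0, 3.0, 4.0, 5.0]

alpha, beta = to_alpha_beta(w, groups)
flat_prediction = sum(wj * xj for wj, xj in zip(w, x))
partitioned_prediction = predict(x, groups, alpha, beta)

# One candidate sign per group: 2**K patterns for K groups.
sign_patterns = list(product([-1, 1], repeat=len(groups)))
```

In this reading, `beta` carries the importance and direction of each group, while `alpha` carries the relative importance of each variable inside its group, matching the two levels of interpretation the abstract mentions.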

Paper Structure

This paper contains 13 sections, 4 theorems, 36 equations, 4 figures, 5 tables.

Key Result

Theorem 1

The Partitioned-LS problem is not convex.

Figures (4)

  • Figure 1: Plot of the behavior of the two proposed algorithms on four datasets. PartLS-alt has been repeated 100 times following a multi-start strategy and in two settings ($T$=20 and $T$=100). Each point on the orange and blue lines reports the cumulative time and best objective found during these 100 restarts. PartLS-opt outputs a single solution, drawn in green.
  • Figure 2: Feature groups and associated $\beta$ values as found by PartLS-opt on the Ames House Prices dataset (Kaggle).
  • Figure 3: $\alpha$ values for group "Outside Facilities" found by PartLS-opt on the Ames House Prices dataset (Kaggle).
  • Figure 4: Feature weights for group "Outside Facilities" of a regularized linear regression on the Ames House Prices dataset (Kaggle).

Theorems & Definitions (13)

  • Definition 1
  • Theorem 1
  • proof
  • Remark 1
  • Theorem 2
  • proof
  • Definition 2
  • Theorem 3
  • proof
  • Remark 2
  • ...and 3 more