Efficient Pareto Manifold Learning with Low-Rank Structure

Weiyu Chen, James T. Kwok

TL;DR

This work tackles the challenge of producing a scalable, continuous Pareto front (PF) for multi-task learning (MTL) with many tasks. It introduces LORPMAN, which decomposes per-layer parameters into a shared main network plus multiple low-rank matrices, enabling efficient parameter sharing and task-specific adaptation, reinforced by orthogonal regularization. The approach is supported theoretically by a universal-approximation-style theorem and validated empirically on datasets with varying task counts, where it improves hypervolume and parameter efficiency over state-of-the-art baselines. The gains are largest as the number of tasks grows, which points to practical benefits for large-scale multi-objective learning. The findings present LORPMAN as a flexible, efficient tool for continuous PF learning in complex, real-world MTL settings.

Abstract

Multi-task learning, which optimizes performance across multiple tasks, is inherently a multi-objective optimization problem. Various algorithms have been developed to provide discrete trade-off solutions on the Pareto front. Recently, continuous Pareto front approximations using a linear combination of base networks have emerged as a compelling strategy. However, this strategy suffers from scalability issues when the number of tasks is large. To address this issue, we propose a novel approach that integrates a main network with several low-rank matrices to efficiently learn the Pareto manifold. It significantly reduces the number of parameters and facilitates the extraction of shared features. We also introduce orthogonal regularization to further bolster performance. Extensive experimental results demonstrate that the proposed approach outperforms state-of-the-art baselines, especially on datasets with a large number of tasks.
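
The idea of a main network plus several low-rank matrices can be illustrated for a single linear layer. The sketch below is a minimal NumPy illustration, not the authors' implementation: it assumes the layer weight for a preference vector $\boldsymbol{\alpha}$ is formed as $\boldsymbol{W}_0 + s \sum_i \alpha_i \boldsymbol{B}_i \boldsymbol{A}_i$, and it uses a pairwise cross-term penalty as one plausible form of orthogonal regularization. The function names, the scaling factor `scale`, the initialization, and the exact penalty are all assumptions made for illustration. The parameter comparison assumes the linear-combination baseline keeps one full base network per task.

```python
import numpy as np


def init_layer(d_out, d_in, m, rank, seed=0):
    """Create a shared weight W0 and m low-rank factor pairs (B_i, A_i).

    Hypothetical helper: shapes and initialization are illustrative only.
    """
    rng = np.random.default_rng(seed)
    W0 = rng.normal(scale=0.1, size=(d_out, d_in))    # shared main weight
    B = rng.normal(scale=0.1, size=(m, d_out, rank))  # B_i: d_out x rank
    A = rng.normal(scale=0.1, size=(m, rank, d_in))   # A_i: rank x d_in
    return W0, B, A


def combined_weight(W0, B, A, alpha, scale=1.0):
    """Return W0 + scale * sum_i alpha_i * (B_i @ A_i) (assumed form)."""
    alpha = np.asarray(alpha, dtype=float)
    low_rank = np.einsum("i,ior,ird->od", alpha, B, A)
    return W0 + scale * low_rank


def orthogonal_penalty(B):
    """Sum of squared Frobenius norms of B_i^T B_j over pairs i < j.

    Assumed regularizer: it is zero exactly when the column spaces of
    different B_i are mutually orthogonal.
    """
    m = B.shape[0]
    total = 0.0
    for i in range(m):
        for j in range(i + 1, m):
            total += np.sum((B[i].T @ B[j]) ** 2)
    return total


def param_counts(d_out, d_in, m, rank):
    """Weights in m full base layers vs. one main layer plus m low-rank pairs."""
    full = m * d_out * d_in
    low_rank = d_out * d_in + m * rank * (d_out + d_in)
    return full, low_rank


# Example: a 64x64 layer, 3 tasks, rank 4, and one preference vector.
W0, B, A = init_layer(d_out=64, d_in=64, m=3, rank=4)
alpha = np.array([0.2, 0.3, 0.5])  # a point on the simplex
W = combined_weight(W0, B, A, alpha)
```

With these illustrative sizes, three full 64x64 base layers hold 12288 weights, while the main layer plus three rank-4 pairs holds 5632, and the gap widens as the number of tasks grows.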

Paper Structure

This paper contains 32 sections, 4 theorems, 22 equations, 9 figures, 9 tables, and 1 algorithm.

Key Result

Theorem 3.1

Assume that $X \times \Delta^m$ is compact and $t(\boldsymbol{x}, \boldsymbol{\alpha})$ is continuous. For any $\epsilon > 0$, there exists a ReLU MLP $h$ with main network $\boldsymbol{\theta}_0$ and $m$ low-rank matrices $\boldsymbol{B}_1\boldsymbol{A}_1, \ldots, \boldsymbol{B}_m\boldsymbol{A}_m$,

Figures (9)

  • Figure 1: Layer-wise similarities between base networks obtained by PaMaL on MultiMNIST over three random seeds. Shaded areas represent the 95% confidence interval.
  • Figure 2: Illustration of the proposed LORPMAN on an $L$-layer base network with $m$ tasks. For each layer, we aim to learn $m$ low-rank matrices that are orthogonal to each other.
  • Figure 3: Trajectory of $\boldsymbol{\theta}$ obtained by LORPMAN with $\boldsymbol{\alpha} = [0.5, 0.5]$ (red), $\boldsymbol{\alpha} = [1, 0]$ (green), and $\boldsymbol{\alpha} = [0, 1]$ (blue) in objective space (a) and parameter space (b). Circles denote the initial points and squares denote the final points. The gray lines in (a) and (b) denote the PF and PS, respectively.
  • Figure 4: Test performance on MultiMNIST and Census. The PF is shown in bold. We show the results obtained by three different random seeds.
  • Figure 5: Test performance of PaMaL and LORPMAN on UTKFace. Figures (b), (c), (d) are 2D projections of (a) for better illustration of the 3D surface.
  • ...and 4 more figures

Theorems & Definitions (7)

  • Definition 2.1: Pareto Dominance and Pareto-Optimal
  • Theorem 3.1
  • Proposition 3.2: convex
  • Theorem 1.1
  • proof
  • Proposition 2.1: convex
  • proof