Sharper Risk Bound for Multi-Task Learning with Multi-Graph Dependent Data

Xiao Shao, Guoqiang Wu

TL;DR

This work tackles sharp generalization bounds for multi-task learning in which the data of each task exhibit graph-structured dependencies (multi-graph dependent data, MGD), a setting where prior results yield only $O\left(\frac{1}{\sqrt{n}}\right)$ bounds. It introduces a Bennett-type concentration inequality for multi-graph dependent variables and couples it with a new Talagrand-type inequality for empirical processes, forming a local fractional Rademacher complexity (LFRC) framework that yields fast $O\left(\frac{\log n}{n}\right)$ risk bounds. The authors instantiate these tools in the MTL-MGD setting, deriving base and excess risk bounds that depend on graph complexity via the fractional chromatic number $\chi_f(G_k)$, and show that, under favorable kernel eigenvalue decay, the fixed point $r^*$ leads to fast rates. They apply the theory to Macro-AUC optimization, transforming multi-label data into multi-task instances, and report tighter bounds and competitive empirical performance on standard benchmarks. The results advance generalization analysis for dependent data and offer guidance for risk control in multi-task systems with graph-structured dependencies.

Abstract

In multi-task learning (MTL) with each task involving graph-dependent data, existing generalization analyses yield a sub-optimal risk bound of $O(\frac{1}{\sqrt{n}})$, where $n$ is the number of training samples of each task. However, improving the risk bound is technically challenging, owing to the lack of a foundational sharper concentration inequality for multi-graph dependent random variables. To fill this gap, this paper proposes a new Bennett-type inequality, enabling the derivation of a sharper risk bound of $O(\frac{\log n}{n})$. Technically, building on the proposed Bennett-type inequality, we propose a new Talagrand-type inequality for the empirical process, and further develop a new analytical framework of the local fractional Rademacher complexity to enhance generalization analyses in MTL with multi-graph dependent data. Finally, we apply the theoretical advancements to applications such as Macro-AUC optimization, illustrating the superiority of our theoretical results over prior work, which is also verified by experimental results.
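To give a feel for the gap between the two rates quoted in the abstract, the following minimal sketch evaluates $\frac{1}{\sqrt{n}}$ and $\frac{\log n}{n}$ for a few sample sizes. It is only an illustration: the function names `slow_rate` and `fast_rate` are hypothetical, and all constants hidden by the $O(\cdot)$ notation (including the graph-dependent factors) are ignored, so the printed numbers are not actual risk bounds.

```python
import math


def slow_rate(n: int) -> float:
    """Order of the prior bound, 1/sqrt(n), with all constants dropped."""
    return 1.0 / math.sqrt(n)


def fast_rate(n: int) -> float:
    """Order of the sharper bound, log(n)/n, with all constants dropped."""
    return math.log(n) / n


if __name__ == "__main__":
    # n is the number of training samples per task, as in the abstract.
    for n in (100, 10_000, 1_000_000):
        ratio = slow_rate(n) / fast_rate(n)
        print(f"n={n}: 1/sqrt(n)={slow_rate(n):.3e}, "
              f"log(n)/n={fast_rate(n):.3e}, ratio={ratio:.1f}")
```

The ratio between the two orders is $\frac{\sqrt{n}}{\log n}$, which grows with $n$, so the advantage of the faster rate widens as more samples per task become available.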

Paper Structure

This paper contains 48 sections, 24 theorems, 97 equations, 1 figure, 5 tables.

Key Result

Theorem 1

Assume the function class $\mathcal{F}$ is defined in Eq. eq:function_class_f. Assume that there is some $r>0$ such that, for every $f \in \mathcal{F}$, $\mathrm{var}(f_k) \leq r$ and $\max_{k \in {K}} \sup_{{\mathbf{x}}} |f_k({\mathbf{x}})| \leq 1$. Then for every $t > 0$, with probability at least $1-e^{-t}$, where $c=\frac{5^2}{4^2} \sum_k \frac{\chi_f(G_k)}{m_k}$. Moreover, the same results hold for the qu
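The constant $c=\frac{5^2}{4^2} \sum_k \frac{\chi_f(G_k)}{m_k}$ in the theorem excerpt can be evaluated directly once the fractional chromatic numbers are known. The sketch below does only that arithmetic. It assumes that $m_k$ is the number of samples of task $k$ (the excerpt does not define it), and the helper name `theorem1_constant` is hypothetical. It relies on two standard graph facts: an edgeless dependency graph (independent data) has $\chi_f = 1$, and a complete graph on $m$ vertices has $\chi_f = m$.

```python
from typing import Sequence


def theorem1_constant(chi_f: Sequence[float], m: Sequence[int]) -> float:
    """Evaluate c = (5^2 / 4^2) * sum_k chi_f(G_k) / m_k.

    chi_f[k] is the fractional chromatic number of the dependency graph G_k.
    m[k] is ASSUMED to be the sample size of task k (not defined in the excerpt).
    """
    if len(chi_f) != len(m):
        raise ValueError("chi_f and m must have one entry per task")
    if any(mk <= 0 for mk in m):
        raise ValueError("every m_k must be positive")
    return (5**2 / 4**2) * sum(c / mk for c, mk in zip(chi_f, m))


if __name__ == "__main__":
    m = [100, 100]  # two tasks with 100 samples each (illustrative values)

    # Independent data: edgeless graphs, chi_f = 1 for every task.
    print("independent:", theorem1_constant([1, 1], m))

    # Fully dependent data: complete graphs, chi_f = m_k for every task.
    print("fully dependent:", theorem1_constant(m, m))
```

In the independent case $c$ shrinks like $\frac{1}{m_k}$, whereas in the fully dependent case it stays at $\frac{25}{16}K$ for $K$ tasks regardless of sample size, which illustrates how the graph complexity enters the bound through $\chi_f(G_k)$.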

Figures (1)

  • Figure 1: The proof structure diagram of the main results. The blue node denotes previous results (or assumptions) and others are our contributions.

Theorems & Definitions (57)

  • Definition 1: Fractional independent vertex cover, and fractional chromatic number [zhang2022generalization]
  • Definition 2: Dependency graph [ralaivola2015entropy]
  • Definition 3: Fractionally colorable function [ralaivola2015entropy]
  • Definition 4: Multi-fractionally sub-additive function
  • Definition 5: Local fractional Rademacher complexity (LFRC)
  • Theorem 1: The base bound of MGD with small variance, analog of Thm. 2.1 in [Bartlett_2005], proof in Appendix \ref{pro:core2.1_proof}
  • Remark
  • Theorem 2: An improved bound of MGD with sub-root function, proof in Appendix \ref{pro:theorem3.3_proof}
  • Remark
  • Corollary 1: An excess risk bound of $\mathcal{A}$ for MTL-MGD, proof in Appendix \ref{section: A}
  • ...and 47 more