Sharper Risk Bound for Multi-Task Learning with Multi-Graph Dependent Data
Xiao Shao, Guoqiang Wu
TL;DR
This work derives sharper generalization bounds for multi-task learning (MTL) in which each task's data are graph-dependent, a setting where prior analyses yield only $O\left(\frac{1}{\sqrt{n}}\right)$ risk bounds. It introduces a Bennett-type concentration inequality for multi-graph dependent variables and builds on it to obtain a new Talagrand-type inequality for empirical processes. Together these form a local fractional Rademacher complexity (LFRC) framework that yields fast $O\left(\frac{\log n}{n}\right)$ risk bounds. The authors instantiate these tools in the MTL setting with multi-graph dependent data (MTL-MGD), deriving base and excess risk bounds that depend on graph complexity through the fractional chromatic number $\chi_f(G_k)$, and show that, under favorable kernel eigenvalue decay, the fixed point $r^*$ leads to fast rates. They apply the theory to Macro-AUC optimization by transforming multi-label data into multi-task instances, and report tighter bounds and competitive empirical performance on standard benchmarks. The results advance generalization analysis for dependent data and offer practical guidance for risk control in multi-task systems with graph-structured dependencies.
Abstract
In multi-task learning (MTL) with each task involving graph-dependent data, existing generalization analyses yield a \emph{sub-optimal} risk bound of $O(\frac{1}{\sqrt{n}})$, where $n$ is the number of training samples of each task. Improving this bound is technically challenging, however, because a sufficiently sharp foundational concentration inequality for multi-graph dependent random variables has been lacking. To fill this gap, this paper proposes a new Bennett-type inequality, enabling the derivation of a sharper risk bound of $O(\frac{\log n}{n})$. Technically, building on the proposed Bennett-type inequality, we establish a new Talagrand-type inequality for the empirical process, and further develop a new analytical framework based on the local fractional Rademacher complexity to strengthen generalization analyses in MTL with multi-graph dependent data. Finally, we apply these theoretical results to applications such as Macro-AUC optimization, illustrating their superiority over prior work, which is also verified by experimental results.
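To make the Macro-AUC application concrete, the following is a minimal sketch of how Macro-AUC can be computed by treating each label as its own task built from (positive, negative) instance pairs. It uses the standard pairwise definition of AUC and is not taken from the paper; the function names `label_auc` and `macro_auc` and the toy data are illustrative assumptions. Pairs that share an instance are statistically dependent, which is one way to see why the per-task data are graph-dependent rather than i.i.d.

```python
from itertools import product


def label_auc(scores, labels):
    """AUC for a single label (one 'task').

    Computed as the fraction of (positive, negative) pairs ranked
    correctly, with ties counted as one half. Returns None when the
    label has no positives or no negatives, since AUC is undefined.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return None
    total = 0.0
    # Every pair reuses instances from other pairs, so the pairwise
    # losses are dependent (hypothetical illustration of the setting).
    for p, q in product(pos, neg):
        if p > q:
            total += 1.0
        elif p == q:
            total += 0.5
    return total / (len(pos) * len(neg))


def macro_auc(score_matrix, label_matrix):
    """Average the per-label AUCs over all labels where AUC is defined.

    score_matrix[i][k] is the score of instance i for label k;
    label_matrix[i][k] is 1 if instance i carries label k, else 0.
    """
    num_labels = len(label_matrix[0])
    aucs = []
    for k in range(num_labels):
        scores_k = [row[k] for row in score_matrix]
        labels_k = [row[k] for row in label_matrix]
        auc = label_auc(scores_k, labels_k)
        if auc is not None:
            aucs.append(auc)
    if not aucs:
        raise ValueError("AUC is undefined for every label")
    return sum(aucs) / len(aucs)


# Toy data (assumed for illustration): 3 instances, 2 labels.
scores = [[0.9, 0.2], [0.4, 0.7], [0.1, 0.6]]
labels = [[1, 0], [0, 1], [1, 0]]
result = macro_auc(scores, labels)
```

In this toy example the first label ranks one of its two positive-negative pairs correctly and the second label ranks both correctly, so the per-label AUCs are 0.5 and 1.0 and the Macro-AUC is 0.75.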
