A Consistent Lebesgue Measure for Multi-label Learning
Kaan Demir, Bach Nguyen, Bing Xue, Mengjie Zhang
TL;DR
The paper addresses the inconsistent guidance given by multiple non-differentiable multi-label loss functions by introducing CLML, a Consistent Lebesgue Measure-based learner that directly optimises a Lebesgue measure objective defined over several losses. It proves that CLML can achieve consistency under a Bayes risk framework and demonstrates state-of-the-art performance on nine datasets with a simple feedforward architecture, without relying on label graphs or perturbation-based conditioning. The approach uses Monte Carlo estimation of Lebesgue contributions and CMA-ES optimisation to navigate the non-convex, non-differentiable loss landscape, yielding robust improvements and exposing inconsistencies between surrogate and desired losses. The work highlights the importance of optimisation consistency in multi-label learning and offers a scalable, surrogate-free way to balance multiple, potentially conflicting objectives.
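To make "Monte Carlo estimation of Lebesgue contributions" concrete, the sketch below estimates the Lebesgue measure of the region of loss space dominated by a set of loss vectors and bounded by a reference point (the hypervolume indicator), together with each vector's exclusive contribution. This is a minimal illustration under that assumed interpretation, not the authors' implementation; the function names `mc_lebesgue_measure` and `mc_contributions`, the reference point, and the sample count are hypothetical choices.

```python
import numpy as np


def _sample_box(points, reference, n_samples, seed):
    """Draw uniform samples from the box [min(points), reference] that encloses
    the dominated region. Assumes every loss vector is <= reference (minimisation)."""
    rng = np.random.default_rng(seed)
    lower = points.min(axis=0)
    box_volume = float(np.prod(reference - lower))
    samples = rng.uniform(lower, reference, size=(n_samples, reference.shape[0]))
    return samples, box_volume


def _domination_matrix(samples, points):
    """dom[s, i] is True when loss vector i is <= sample s in every objective,
    i.e. sample s lies in the region dominated by point i."""
    return np.all(points[None, :, :] <= samples[:, None, :], axis=2)


def mc_lebesgue_measure(points, reference, n_samples=100_000, seed=0):
    """Hypothetical helper: Monte Carlo estimate of the Lebesgue measure
    (hypervolume) dominated by `points` and bounded by `reference`."""
    points = np.asarray(points, dtype=float)
    reference = np.asarray(reference, dtype=float)
    samples, box_volume = _sample_box(points, reference, n_samples, seed)
    dom = _domination_matrix(samples, points)
    # Fraction of the box covered by at least one point, scaled to a volume.
    return box_volume * float(dom.any(axis=1).mean())


def mc_contributions(points, reference, n_samples=100_000, seed=0):
    """Hypothetical helper: exclusive Lebesgue contribution of each point,
    i.e. the volume dominated by that point and by no other point."""
    points = np.asarray(points, dtype=float)
    reference = np.asarray(reference, dtype=float)
    samples, box_volume = _sample_box(points, reference, n_samples, seed)
    dom = _domination_matrix(samples, points)
    only_one = dom & (dom.sum(axis=1, keepdims=True) == 1)
    return box_volume * only_one.mean(axis=0)


# Two loss vectors (e.g. two losses evaluated for two candidate models).
losses = [[0.2, 0.6], [0.6, 0.2]]
print(mc_lebesgue_measure(losses, reference=[1.0, 1.0]))  # close to the exact 0.48
print(mc_contributions(losses, reference=[1.0, 1.0]))     # each close to 0.16
```

Sampling only inside the bounding box of the dominated region keeps the estimator's variance lower than sampling from the full unit cube. The exclusive contributions are computed from the same samples, so ranking candidates by contribution costs little beyond the measure estimate itself.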
Abstract
Multi-label loss functions are usually non-differentiable, requiring surrogate loss functions for gradient-based optimisation. The consistency of these surrogate loss functions has not been proven, and the problem is exacerbated by the conflicting nature of multi-label loss functions. To learn directly from multiple related, yet potentially conflicting, multi-label loss functions, we propose a Consistent Lebesgue Measure-based Multi-label Learner (CLML) and prove that CLML can achieve theoretical consistency under a Bayes risk framework. Empirical evidence supports our theory by demonstrating that: (1) CLML can consistently achieve state-of-the-art results; (2) the primary performance factor is the Lebesgue measure design, as CLML optimises a simpler feedforward model without an additional label graph, perturbation-based conditioning, or semantic embeddings; and (3) an analysis of the results not only demonstrates CLML's effectiveness but also highlights inconsistencies between the surrogate and the desired loss functions.
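The two difficulties named above, non-differentiability and conflict between losses, can be seen on a toy example. The sketch below uses Hamming loss and subset 0/1 loss, two standard multi-label losses chosen here for illustration (the abstract does not say which losses CLML uses), and hypothetical predictions `pred_a` and `pred_b`. Neither prediction is better on both losses, and the thresholding step makes the losses piecewise constant in the underlying scores, so their gradients are zero almost everywhere.

```python
import numpy as np


def hamming_loss(y_true, y_pred):
    """Fraction of individual label assignments that are wrong."""
    return float(np.mean(y_true != y_pred))


def subset_zero_one_loss(y_true, y_pred):
    """Fraction of instances whose full label vector is not predicted exactly."""
    return float(np.mean(np.any(y_true != y_pred, axis=1)))


def threshold(scores, t=0.5):
    """Map real-valued scores to binary labels. This step function is what
    makes both losses above piecewise constant in the scores."""
    return (scores >= t).astype(int)


y_true = np.array([[1, 1, 0, 0],
                   [0, 0, 1, 1]])

# Hypothetical predictions: A gets one instance exactly right and the other
# completely wrong; B makes a single label error on each instance.
pred_a = np.array([[1, 1, 0, 0],
                   [1, 1, 0, 0]])
pred_b = np.array([[1, 0, 0, 0],
                   [0, 0, 1, 0]])

for name, pred in [("A", pred_a), ("B", pred_b)]:
    print(name, hamming_loss(y_true, pred), subset_zero_one_loss(y_true, pred))
# A 0.5 0.5   -> better on subset 0/1 loss
# B 0.25 1.0  -> better on Hamming loss

# Nudging the scores leaves the loss unchanged, so a gradient carries no signal.
scores = np.array([[0.9, 0.6, 0.2, 0.1],
                   [0.3, 0.4, 0.7, 0.8]])
print(hamming_loss(y_true, threshold(scores)),
      hamming_loss(y_true, threshold(scores + 0.01)))
# 0.0 0.0
```

A gradient-based learner therefore has to replace these losses with differentiable surrogates, and a single surrogate cannot rank A and B the same way both desired losses do.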
