Rethinking Knowledge Transfer in Learning Using Privileged Information
Danil Provodin, Bram van den Akker, Christina Katsimerou, Maurits Kaptein, Mykola Pechenizkiy
TL;DR
This work critically reevaluates learning using privileged information (LUPI) by examining the theoretical underpinnings and empirical claims of knowledge transfer from privileged information (PI). It analyzes two main PI-transfer mechanisms, knowledge distillation and TRAM-like marginalization, and shows that reported gains are often driven by strong assumptions and dataset-specific conditions rather than by PI itself; extensive experiments reveal no robust PI transfer across multiple real-world datasets. The authors demonstrate that improvements frequently arise from training dynamics or architectural changes rather than from PI, and that claims of faster learning rates or better sample efficiency under PI are not generally supported. They call for cautious adoption of PI and urge the development of rigorous theoretical and empirical criteria for demonstrating genuine PI-induced transfer before LUPI is applied in practice.
Abstract
In supervised machine learning, privileged information (PI) is information that is accessible during training but unavailable at inference. Research on learning using privileged information (LUPI) aims to transfer the knowledge captured in PI onto a model that can perform inference without PI. It seems that this extra information ought to make the resulting model better. However, finding conclusive theoretical or empirical evidence that knowledge can be transferred using PI has been challenging. In this paper, we critically examine the assumptions underlying existing theoretical analyses and argue that there is little theoretical justification for when LUPI should work. We analyze LUPI methods and reveal that the apparent improvements in empirical risk reported in existing research may not directly result from PI. Instead, these improvements often stem from dataset anomalies or from modifications in model design that are mistakenly attributed to PI. Our experiments across a wide variety of application domains further demonstrate that state-of-the-art LUPI approaches fail to effectively transfer knowledge from PI. Thus, we advocate for practitioners to exercise caution when working with PI to avoid unintended inductive biases.
