Learning from Complementary Features

Kosuke Sugiyama; Masato Uchida

TL;DR

Complementary Feature Learning (CFL) addresses learning when some inputs are complementary features (CFs), which indicate what a feature value is not. The paper derives an information-theoretic objective that upper-bounds the standard supervised loss through two terms, $J_{KL}$ and $J_{MI}$. The method first estimates the CFs' exact values via a graph-based confidence propagation scheme (under a hypothetical self-referenced setting) and then trains the label predictor on these estimates, using practical approximations such as margin-based confidences and $k$-nearest-neighbor weight estimation to maintain tractability. Empirical results on the Bank Marketing and Adult datasets show that the proposed approach improves CF-value estimation and downstream prediction in many cases, especially for CFs with few unique values, and provide guidance on when to apply soft vs. hard CF estimates and how to select CFs. The work advances learning under privacy or cost constraints by leveraging complementary information, with implications for interpretable decision-making and robust predictive modeling in domains with restricted feature observability.

Abstract

While precise data observation is essential for the learning processes of predictive models, it can be challenging owing to factors such as insufficient observation accuracy, high collection costs, and privacy constraints. In this paper, we examine cases where some qualitative features are unavailable as precise information indicating "what it is," and are instead available only as complementary information indicating "what it is not." We refer to features defined by precise information as ordinary features (OFs) and those defined by complementary information as complementary features (CFs). We then formulate a new learning scenario termed Complementary Feature Learning (CFL), where predictive models are constructed using instances consisting of OFs and CFs. The simplest formalization of CFL applies conventional supervised learning directly using the observed values of CFs. However, this approach does not resolve the ambiguity associated with CFs, making learning challenging and complicating the interpretation of the predictive model's specific predictions. Therefore, we derive an objective function from an information-theoretic perspective to estimate the OF values corresponding to CFs and to predict output labels based on these estimations. Based on this objective function, we propose a theoretically guaranteed graph-based estimation method, along with its practical approximation, for estimating OF values corresponding to CFs. The results of numerical experiments conducted with real-world data demonstrate that our proposed method effectively estimates OF values corresponding to CFs and predicts output labels.
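To make the setting concrete, the following is a minimal sketch of how a CF might be represented and how soft estimates of its exact value could be refined over a $k$-nearest-neighbor graph built from OFs. It is an illustration under stated assumptions, not the authors' algorithm: the function names (`initial_confidence`, `knn_weights`, `propagate`), the uniform initial confidence, the unweighted $k$-NN graph, and the mixing weight `alpha` are all hypothetical choices made for this example.

```python
import numpy as np

def initial_confidence(comp_values, n_categories):
    """Uniform confidence over every category except the observed 'is not' value."""
    n = len(comp_values)
    conf = np.ones((n, n_categories))
    conf[np.arange(n), comp_values] = 0.0  # the complementary value is ruled out
    return conf / conf.sum(axis=1, keepdims=True)

def knn_weights(X_o, k):
    """Row-normalized k-nearest-neighbor adjacency computed from ordinary features."""
    dist = np.linalg.norm(X_o[:, None, :] - X_o[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # an instance is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]
    W = np.zeros(dist.shape)
    W[np.arange(len(X_o))[:, None], neighbors] = 1.0 / k
    return W

def propagate(conf, W, comp_values, T, alpha=0.5):
    """Mix each row with its neighbors' confidences for T rounds (assumed update rule)."""
    rows = np.arange(len(comp_values))
    for _ in range(T):
        conf = alpha * conf + (1.0 - alpha) * (W @ conf)
        conf[rows, comp_values] = 0.0  # never assign mass to the ruled-out value
        conf = conf / conf.sum(axis=1, keepdims=True)
    return conf

# Toy data: one ordinary feature, one CF with three categories (0, 1, 2).
X_o = np.array([[0.0], [0.1], [5.0], [5.1]])
comp = np.array([0, 1, 2, 0])  # observed "is not" values

soft = propagate(initial_confidence(comp, 3), knn_weights(X_o, k=1), comp, T=5)
hard = soft.argmax(axis=1)  # hard estimate: most confident remaining category
```

Here `soft` holds a per-instance distribution over the categories that remain possible, and `hard` collapses it to a single estimate. These correspond loosely to the soft and hard CF estimates mentioned in the TL;DR.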

Paper Structure (20 sections, 5 theorems, 32 equations, 6 figures, 7 tables, 1 algorithm)

Introduction
Related Works
Formulation of CFL
Notation
Derivation of the Objective Function
Proposed Method
Retrieving the Exact Values of CFs Under Hypothetical Setting
Practical Approximation of Weight Matrix Optimization
Practical Approximation of Confidence Propagation
Additional Tactics
Numerical Experiments
Experimental Settings
Evaluation of Estimation Quality for CF
Evaluation of Prediction Quality for the Target Feature
Sensitivity Analysis
...and 5 more sections

Key Result

Theorem 1

Here, $\mathbb{I}(\cdot|\cdot)$ represents the conditional mutual information. $\mathbb{I}(Y,\bm{X}^{c}|\bm{X}^{o})$ and $\mathbb{I}(Y, \widehat{\bm{X}}^c | \bm{X}^{o})$ are defined as follows: Here, $p_{*,\bm{\eta}}(y, \hat{\bm{x}}^{c}|\bm{x}^{o}) = \mathbb{E}_{\bar{p}(\bar{\bm{x}}^{c}|\bm{x}^{c}) p_*(\bm{x}^{c}|\bm{x}^{o})}[ p_*(y|\bm{x}^{c},\bm{x}^{o}, \bar{\bm{x}}^{c}) \, q_{\bm{\eta}}(\cdots) ]$ ...

Figures (6)

Figure 1: Example of the CFL problem setting. The problem involves predicting a binary output label, Default, based on four features: Age, Married, Job, and Income. Due to privacy concerns, the OF values of Job and Income cannot be observed; instead, the CF values of these features are observed.
Figure 2: Graphical model showing dependencies between variables
Figure 3: The process of evaluating our proposed method.
Figure 4: The relationship between $T$ and the estimation quality of CFs' exact values ($\gamma=0$, with Eq. \ref{eq:correct_own_comp}).
Figure 5: The relationship between each of $k$ and $\gamma$ and the estimation quality of CFs' exact values (with Eq. \ref{eq:correct_own_comp}; $T=100$, $k=20$, $\gamma=0$ when their values are not varied).
...and 1 more figure

Theorems & Definitions (5)

Theorem 1
Theorem 2
Theorem 3
Corollary 1
Theorem 4
