Sparse joint shift in multinomial classification

Dirk Tasche

TL;DR

Sparse joint shift (SJS) models dataset shift by allowing the labels and a subset of the features to change jointly while the conditional distribution of the remaining features stays invariant. The paper develops density-based characterizations of SJS, proves identifiability under a rank condition, and clarifies its relationship with covariate shift via conditional distribution invariance (CDI). It then examines two strategies for estimating the shift weights from labeled source data and unlabeled target data, one of them based on minimising the Kullback-Leibler divergence, and treats the case where all features are discrete. It also points out inconsistencies in previously proposed estimation algorithms and suggests improvements, including classifier-augmented approaches to handle high dimensionality. Together, these results support principled domain adaptation under SJS and give practical guidance for estimating the shift and correcting posteriors when target labels are unavailable.

Abstract

Sparse joint shift (SJS) was recently proposed as a tractable model for general dataset shift which may cause changes to the marginal distributions of features and labels as well as the posterior probabilities and the class-conditional feature distributions. Fitting SJS for a target dataset without label observations may produce valid predictions of labels and estimates of class prior probabilities. We present new results on the transmission of SJS from sets of features to larger sets of features, a conditional correction formula for the class posterior probabilities under the target distribution, identifiability of SJS, and the relationship between SJS and covariate shift. In addition, we point out inconsistencies in the algorithms which were proposed for estimating the characteristics of SJS, as they could hamper the search for optimal solutions, and suggest potential improvements.
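As a rough illustration of how a posterior correction under SJS can work, the following Python sketch uses a toy setting in which all features are discrete and the density ratio on the shifted block is known. It is not the paper's formula or algorithm. The function `sjs_corrected_posterior` and the arrays `p`, `q`, and `w` are hypothetical names introduced here. The idea: if the distribution of the remaining features given the shifted features and the label is the same under source and target, then the joint target density equals the source density times a weight w(x_S, y) = q(x_S, y) / p(x_S, y), so the target posterior is the source posterior reweighted by w and renormalised.

```python
import numpy as np

def sjs_corrected_posterior(source_posterior, weights):
    """Reweight a source posterior into a target posterior (illustrative only).

    source_posterior: shape (n_classes,), P(Y = y | x) under the source.
    weights: shape (n_classes,), w(x_S, y) = q(x_S, y) / p(x_S, y),
             evaluated at the shifted feature value x_S of this x.
    """
    unnormalised = weights * source_posterior
    return unnormalised / unnormalised.sum()

# Toy check with binary label y, shifted feature s, and remaining feature r.
rng = np.random.default_rng(0)

p = rng.random((2, 2, 2))
p /= p.sum()                          # source joint p[y, s, r]
p_ys = p.sum(axis=2)                  # source marginal p(y, s)
p_r_given_ys = p / p_ys[:, :, None]   # invariant conditional p(r | y, s)

q_ys = rng.random((2, 2))
q_ys /= q_ys.sum()                    # arbitrary target marginal q(y, s)
q = q_ys[:, :, None] * p_r_given_ys   # target joint built to satisfy SJS

w = q_ys / p_ys                       # density ratio on (y, s)

s, r = 1, 0                           # one feature vector x = (s, r)
source_post = p[:, s, r] / p[:, s, r].sum()
target_post = q[:, s, r] / q[:, s, r].sum()
corrected = sjs_corrected_posterior(source_post, w[:, s])
```

Here `corrected` agrees with `target_post` up to floating-point error, because the target joint was constructed to satisfy the invariance assumption. In practice the weights are unknown and must be estimated from labeled source data and unlabeled target data, which is the estimation problem the paper discusses.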

Paper Structure (13 sections, 12 theorems, 80 equations)

Setting
Analyses of sparse joint shift
Characterising SJS
Identifiability in the presence of sparse joint shift
SJS vs. covariate shift
How to estimate sparse joint shift?
The general approach
Minimising the Kullback-Leibler divergence
Second estimation strategy
Conclusions
Some concepts and notation from probability theory
Proofs
Estimating sparse joint shift when all features are discrete

Key Result

Proposition 1.8

Under Assumption as:cont, suppose that $\mathcal{F}$ is a sub-$\sigma$-algebra of $\mathcal{H}$. Then the following three statements are equivalent:

Theorems & Definitions (38)

Definition 1.2
Definition 1.3
Definition 1.4: Sparse Joint Shift
Remark 1.5
Remark 1.6
Proposition 1.8
Theorem 2.1: Equivalent conditions for SJS
Lemma 2.2
Proof of Lemma 2.2
Proof of Theorem 2.1
...and 28 more
