Adjusted Count Quantification Learning on Graphs

Clemens Damke, Eyke Hüllermeier

TL;DR

This work tackles graph quantification learning under distribution shift, where the prior probability shift assumption often fails. It introduces Structural Importance Sampling (SIS), which accounts for (structural) covariate shift by reweighting training samples with density-ratio estimates on graph vertices, and Neighborhood-aware Adjusted Count (NACC), which improves class identifiability using 1-hop neighbor information. Quantification adjusts the predicted prevalences through a confusion-matrix framework, solved as a constrained optimization on the simplex; on graphs, this is augmented by SIS and NACC. Empirical results on five graph benchmarks and the real-world Twitch Gamers dataset show that SIS (and its combination with NACC) consistently outperforms baselines under both synthetic and real-world shifts. The results highlight the importance of modeling covariate shift in graph quantification and suggest directions toward distribution-matching quantifiers and graph-quantification benchmarks.

Abstract

Quantification learning is the task of predicting the label distribution of a set of instances. We study this problem in the context of graph-structured data, where the instances are vertices. Previously, this problem has only been addressed via node clustering methods. In this paper, we extend the popular Adjusted Classify & Count (ACC) method to graphs. We show that the prior probability shift assumption upon which ACC relies is often not applicable to graph quantification problems. To address this issue, we propose structural importance sampling (SIS), the first graph quantification method that is applicable under (structural) covariate shift. Additionally, we propose Neighborhood-aware ACC, which improves quantification in the presence of non-homophilic edges. We show the effectiveness of our techniques on multiple graph quantification tasks.
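
To make the adjusted-count idea concrete, the following is a minimal sketch of plain (non-graph) ACC, not the authors' implementation. It assumes a squared-error objective and a projected-gradient solver, neither of which is specified in this summary; the function names `project_to_simplex` and `adjusted_count` and the toy confusion matrix are hypothetical. Given a confusion matrix `C` with `C[i][j]` approximating P(predicted = j | true = i) and the classifier's predicted prevalences `p_hat`, it looks for the prevalence vector `q` on the simplex that minimizes `||C^T q - p_hat||^2`.

```python
def project_to_simplex(v):
    """Euclidean projection of a vector onto the probability simplex."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        t = (cumsum - 1.0) / j
        if uj - t > 0:  # holds for a prefix of the sorted entries
            theta = t
    return [max(x - theta, 0.0) for x in v]


def adjusted_count(confusion, predicted_prev, steps=2000):
    """Estimate class prevalences q by solving
    min_q ||C^T q - p_hat||^2  subject to q on the simplex,
    using projected gradient descent (an assumed solver choice).

    confusion[i][j] ~ P(predicted = j | true = i), rows sum to 1.
    predicted_prev[j] = fraction of target instances predicted as class j.
    """
    k = len(predicted_prev)
    lr = 1.0 / (2.0 * k)  # safe step size for a row-stochastic matrix
    q = [1.0 / k] * k     # start from the uniform distribution
    for _ in range(steps):
        # residual r_j = (C^T q)_j - p_hat_j
        r = [sum(confusion[i][j] * q[i] for i in range(k)) - predicted_prev[j]
             for j in range(k)]
        # gradient g_i = 2 * (C r)_i
        g = [2.0 * sum(confusion[i][j] * r[j] for j in range(k))
             for i in range(k)]
        q = project_to_simplex([q[i] - lr * g[i] for i in range(k)])
    return q


# Toy example (made-up numbers): a 3-class classifier with known error rates.
C = [[0.8, 0.1, 0.1],
     [0.2, 0.7, 0.1],
     [0.1, 0.2, 0.7]]
true_prev = [0.5, 0.3, 0.2]
# Prevalences that naive classify-and-count would report on the target set.
p_hat = [sum(C[i][j] * true_prev[i] for i in range(3)) for j in range(3)]
q_est = adjusted_count(C, p_hat)
```

In this toy setting the classify-and-count estimate `p_hat` is biased by the classifier's errors, while the adjusted estimate `q_est` recovers `true_prev` up to numerical tolerance. That recovery relies on the confusion matrix being the same on the training and target data, which is the prior probability shift assumption the abstract says often fails on graphs.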

Paper Structure

This paper contains 29 sections, 20 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: The Amazon Photos co-purchase graph. Colors indicate vertex labels ($K = 8$). The highlighted vertices are misclassifications by an APPNP classifier.
  • Figure 2: Visualization of the "Twitch Gamers" dataset [rozemberczki2021]. Vertices represent Twitch users, edges represent follower relationships. Colors indicate the primary language of each user.
  • Figure 3: Quantification performance of SIS (with NACC) with the PPR kernel for different values of $\lambda$.
  • Figure 4: Quantification performance of SIS (with NACC) with the shortest-path kernel for different values of $\gamma$.