Metric Based Few-Shot Graph Classification

Donato Crisostomi, Simone Antonelli, Valentino Maiorca, Luca Moschella, Riccardo Marin, Emanuele Rodolà

TL;DR

This work tackles few-shot graph classification. It shows that equipping a simple distance metric learning baseline with a state-of-the-art graph embedder yields competitive results on the task, and it proposes a MixUp-based online data augmentation technique that acts in the latent space and proves effective on the task.

Abstract

Many modern deep-learning techniques do not work without enormous datasets. At the same time, several fields demand methods that work when data is scarce. This problem is even more complex when the samples have varying structures, as in the case of graphs. Graph representation learning techniques have recently proven successful in a variety of domains. Nevertheless, the employed architectures perform miserably when faced with data scarcity. Few-shot learning, on the other hand, makes it possible to employ modern deep learning models in scarce-data regimes without sacrificing their effectiveness. In this work, we tackle the problem of few-shot graph classification, showing that equipping a simple distance metric learning baseline with a state-of-the-art graph embedder yields competitive results on the task. While the simplicity of the architecture is enough to outperform more complex ones, it also allows straightforward additions. To this end, we show that additional improvements may be obtained by encouraging a task-conditioned embedding space. Finally, we propose a MixUp-based online data augmentation technique acting in the latent space and show its effectiveness on the task.

Paper Structure (37 sections, 7 equations, 9 figures, 9 tables, 3 algorithms)

Figures (9)

  • Figure 1: An $N$-way $K$-shot episode. In this example, there are $N=3$ classes. Each class has $K=4$ supports, yielding a support set of size $N \cdot K = 12$. The class information provided by the supports is exploited to classify the queries. We test the classification accuracy on all $N$ classes. In this figure there are $Q=2$ queries for each class, so the query set has size $N \cdot Q = 6$.
  • Figure 2: Prototypical Networks architecture. A graph encoder embeds the support graphs, and the embeddings that belong to the same class are averaged to obtain the class prototype $p$. To classify a query graph $q$, it is embedded in the same space as the supports. The distances in the latent space between the query and the prototypes determine the similarities and thus the probability distribution of the query over the different classes, computed as in \ref{eq:proto-class-distr}.
  • Figure 3: Mixup procedure. Each graph is embedded into a latent representation. We generate a random boolean mask $\boldsymbol{\sigma}$ and its complement $\mathbf{1} - \boldsymbol{\sigma}$, which describe the features to select from $\mathbf{s}_1$ and $\mathbf{s}_2$, respectively. The selected features are then recomposed to generate the novel latent vector $\tilde{\mathbf{s}}$.
  • Figure 4: Visualization of latent spaces from the COIL-DEL dataset, through t-SNE dimensionality reduction. Each row is a different episode; the colors represent novel classes, the crosses are the queries, the circles are the supports and the stars are the prototypes. The left column is produced with the base model PN, the middle one with the PN+TAE model, and the right one with the full model PN+TAE+MU. This comparison shows that the TAE and MU regularizations improve the class separation in the latent space, with MU proving essential to obtain accurate latent clusters.
  • Figure 5: Visualization of novel episodes' latent spaces from the Graph-R52 dataset, through t-SNE dimensionality reduction. Each row is a different episode; the colors represent novel classes, the crosses are the queries, the circles are the supports and the stars are the prototypes. The left column is produced with the base model PN, the middle one with the PN+TAE model, and the right one with the full model PN+TAE+MU. This comparison shows that the TAE and MU regularizations improve the class separation in the latent space, although less markedly than in COIL-DEL.
  • ...and 4 more figures
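
The captions of Figures 2 and 3 describe two mechanisms that a small example may make more concrete: averaging support embeddings into class prototypes and classifying a query by its distance to them, and mixing two latent vectors through a boolean mask. The following is only a minimal sketch in plain Python, not the authors' implementation. The function names, the toy 2-D "embeddings", the squared Euclidean distance with a softmax over negative distances (borrowed from the usual Prototypical Networks formulation), and the Bernoulli(0.5) mask sampling are all assumptions made for illustration; the captions above do not specify them.

```python
import math
import random


def prototypes(support_embeddings, support_labels):
    """Average the support embeddings of each class to get its prototype."""
    grouped = {}
    for emb, label in zip(support_embeddings, support_labels):
        grouped.setdefault(label, []).append(emb)
    return {
        label: [sum(dim) / len(embs) for dim in zip(*embs)]
        for label, embs in grouped.items()
    }


def class_distribution(query, protos):
    """Softmax over negative squared Euclidean distances (assumed metric)."""
    logits = {
        label: -sum((q - p) ** 2 for q, p in zip(query, proto))
        for label, proto in protos.items()
    }
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(v - top) for label, v in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def latent_mixup(s1, s2, rng, keep_prob=0.5):
    """Take each feature from s1 where the mask is 1 and from s2 otherwise."""
    # sigma plays the role of the boolean mask; (1 - m) is its complement.
    sigma = [1 if rng.random() < keep_prob else 0 for _ in s1]
    mixed = [m * a + (1 - m) * b for m, a, b in zip(sigma, s1, s2)]
    return mixed, sigma


# Toy 2-way 2-shot episode with hand-written 2-D "embeddings" (hypothetical).
supports = [[0.0, 0.0], [0.0, 2.0], [4.0, 4.0], [6.0, 4.0]]
labels = ["a", "a", "b", "b"]
protos = prototypes(supports, labels)
probs = class_distribution([0.5, 1.0], protos)
mixed, sigma = latent_mixup(supports[0], supports[2], random.Random(0))
```

In this toy episode the query lies much closer to the prototype of class `a`, so `probs["a"]` dominates. Every coordinate of `mixed` is copied from one of the two source vectors, matching the selection-and-recomposition described for Figure 3. In a real pipeline the hand-written lists would be replaced by the output of a graph encoder.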