Table of Contents
Fetching ...

Graph Regularized Encoder Training for Extreme Classification

Anshul Mittal, Shikhar Mohan, Deepak Saini, Siddarth Asokan, Suchith C. Prabhu, Lakshya Kumar, Pankaj Malhotra, Jain jiao, Amit Singh, Sumeet Agarwal, Soumen Chakrabarti, Purushottam Kar, Manik Varma

TL;DR

It is formally established that in several use cases, the steep computational cost of GCNs is entirely avoidable by replacing GCNs with non-GCN architectures and an alternative paradigm RAMEN is presented to utilize graph metadata in XC settings that offers significant performance boosts with zero increase in inference computational costs.

Abstract

Deep extreme classification (XC) aims to train an encoder architecture and an accompanying classifier architecture to tag a data point with the most relevant subset of labels from a very large universe of labels. XC applications in ranking, recommendation and tagging routinely encounter tail labels for which the amount of training data is exceedingly small. Graph convolutional networks (GCN) present a convenient but computationally expensive way to leverage task metadata and enhance model accuracies in these settings. This paper formally establishes that in several use cases, the steep computational cost of GCNs is entirely avoidable by replacing GCNs with non-GCN architectures. The paper notices that in these settings, it is much more effective to use graph data to regularize encoder training than to implement a GCN. Based on these insights, an alternative paradigm RAMEN is presented to utilize graph metadata in XC settings that offers significant performance boosts with zero increase in inference computational costs. RAMEN scales to datasets with up to 1M labels and offers prediction accuracy up to 15% higher on benchmark datasets than state of the art methods, including those that use graph metadata to train GCNs. RAMEN also offers 10% higher accuracy over the best baseline on a proprietary recommendation dataset sourced from click logs of a popular search engine. Code for RAMEN will be released publicly.

Graph Regularized Encoder Training for Extreme Classification

TL;DR

It is formally established that in several use cases, the steep computational cost of GCNs is entirely avoidable by replacing GCNs with non-GCN architectures and an alternative paradigm RAMEN is presented to utilize graph metadata in XC settings that offers significant performance boosts with zero increase in inference computational costs.

Abstract

Deep extreme classification (XC) aims to train an encoder architecture and an accompanying classifier architecture to tag a data point with the most relevant subset of labels from a very large universe of labels. XC applications in ranking, recommendation and tagging routinely encounter tail labels for which the amount of training data is exceedingly small. Graph convolutional networks (GCN) present a convenient but computationally expensive way to leverage task metadata and enhance model accuracies in these settings. This paper formally establishes that in several use cases, the steep computational cost of GCNs is entirely avoidable by replacing GCNs with non-GCN architectures. The paper notices that in these settings, it is much more effective to use graph data to regularize encoder training than to implement a GCN. Based on these insights, an alternative paradigm RAMEN is presented to utilize graph metadata in XC settings that offers significant performance boosts with zero increase in inference computational costs. RAMEN scales to datasets with up to 1M labels and offers prediction accuracy up to 15% higher on benchmark datasets than state of the art methods, including those that use graph metadata to train GCNs. RAMEN also offers 10% higher accuracy over the best baseline on a proprietary recommendation dataset sourced from click logs of a popular search engine. Code for RAMEN will be released publicly.
Paper Structure (10 sections, 3 equations, 3 figures, 10 tables)

This paper contains 10 sections, 3 equations, 3 figures, 10 tables.

Figures (3)

  • Figure 1: A snapshot from LF-WikiSeeAlsoTitles-320K dataset for the article on "Cladistics." The related article "Common descent" is tagged but the ground truth is missing the link to "Crown group". Traversal on the hyperlink edges can help discover missing link but can also lead to irrelevant link such as "Vestigial organs".
  • Figure 2: RAMEN uses graph metadata to regularize encoder during training. Training framework of RAMEN is robust to noise in graph. Here $\cR_x$ and $\cR_y$ are regularization loss terms and $\cL$ is the task based loss term.
  • Figure 3: RAMEN does not require additional information to compute an accurate representation of the test point. In OAK (an instance of a GCN), inference is a computationally expensive two-stage pipeline, where the test point is first embedded into the graph and then, the linked nodes are used to compute the final representation (induced graph). Since RAMEN uses the graph for regularization, and does not traverse the graph at inference time, it can be 2$\times$ faster, and 5% more accurate, than OAK.