Towards Effective Top-N Hamming Search via Bipartite Graph Contrastive Hashing

Yankai Chen, Yixiang Fang, Yifei Zhang, Chenhao Ma, Yang Hong, Irwin King

TL;DR

This work addresses efficient Top-N search on bipartite graphs by learning binary hash codes with graph convolutional hashing. It introduces BGCH+, a framework that combines adaptive graph convolutional hashing, dual feature contrastive learning, and Fourier gradient estimation to produce robust, expressive hash codes suitable for fast Hamming-space retrieval. Empirical results on six real-world datasets show that BGCH+ outperforms hashing baselines and approaches or matches full-precision models, while delivering substantial speedups and memory savings, especially on sparse graphs. The approach thus offers a scalable, practical solution for large-scale bipartite graph retrieval tasks.

Abstract

Searching on bipartite graphs serves as a fundamental task for various real-world applications, such as recommendation systems, database retrieval, and document querying. Conventional approaches rely on similarity matching in the continuous Euclidean space of vectorized node embeddings. To handle intensive similarity computation efficiently, hashing techniques for graph-structured data have emerged as a prominent research direction. However, despite the retrieval efficiency in Hamming space, previous studies have encountered catastrophic performance decay. To address this challenge, we investigate the problem of hashing with Graph Convolutional Networks for effective Top-N search. Our findings indicate that it is more effective to incorporate hashing techniques into the exploration of bipartite graph receptive fields than to simply treat hashing as post-processing applied to output embeddings. To further enhance model performance, we build upon these findings and propose Bipartite Graph Contrastive Hashing (BGCH+). BGCH+ introduces a novel dual augmentation approach to both intermediate information and hash code outputs in the latent feature spaces, thereby producing more expressive and robust hash codes within a dual self-supervised learning paradigm. Comprehensive empirical analyses on six real-world benchmarks validate the effectiveness of our dual feature contrastive learning in boosting the performance of BGCH+ compared to existing approaches.
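
To make the notion of Top-N search in Hamming space concrete, the following is a minimal sketch that assumes hash codes are stored as {-1, +1} vectors and ranks items by Hamming distance using bit packing, XOR, and a popcount lookup table. The helper names `pack_codes` and `hamming_top_n` and the toy data are hypothetical illustrations; this is not the BGCH+ implementation.

```python
import numpy as np

# Lookup table: number of set bits for every byte value 0..255.
POPCOUNT = np.array([bin(i).count("1") for i in range(256)], dtype=np.int64)

def pack_codes(codes):
    """Pack {-1, +1} codes into bytes (8 code bits per uint8) along the last axis."""
    bits = (np.asarray(codes) > 0).astype(np.uint8)
    return np.packbits(bits, axis=-1)

def hamming_top_n(query_packed, items_packed, n):
    """Return (indices, distances) of the n items nearest to the query in Hamming distance."""
    xor = np.bitwise_xor(items_packed, query_packed)  # set bits mark differing positions
    dist = POPCOUNT[xor].sum(axis=1)                  # popcount per item
    order = np.argsort(dist, kind="stable")[:n]
    return order, dist[order]

# Toy example: four 8-bit item codes and one query code.
items = np.array([
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, -1, -1, -1, -1],
    [-1, -1, -1, -1, -1, -1, -1, -1],
    [1, 1, 1, 1, 1, 1, 1, -1],
])
query = np.ones(8, dtype=np.int64)

top_idx, top_dist = hamming_top_n(pack_codes(query), pack_codes(items), n=2)
print(top_idx, top_dist)
```

Packing the codes stores each bit of a code in one bit of memory rather than one float, which is where the memory savings of hashing-based retrieval come from in this sketch.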

Paper Structure

This paper contains 24 sections, 1 theorem, 22 equations, 8 figures, 11 tables, 1 algorithm.

Key Result

Theorem 1

Given two hash codes, we have $(\alpha_x\boldsymbol{Q}_x)^\mathsf{T} \cdot (\alpha_y\boldsymbol{Q}_y) = \alpha_x\alpha_y\,(d - 2D_{H}(\boldsymbol{Q}_x, \boldsymbol{Q}_y))$.
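
The identity can be checked numerically. The sketch below assumes that $\boldsymbol{Q}_x, \boldsymbol{Q}_y \in \{-1, +1\}^d$, that $\alpha_x, \alpha_y$ are scalar scaling factors, and that $D_H$ counts the positions where the two codes differ; the function names and the sample values are hypothetical.

```python
import numpy as np

def hamming_distance(qx, qy):
    """Number of positions where two {-1, +1} codes differ."""
    return int(np.sum(qx != qy))

def scaled_inner_product(alpha_x, qx, alpha_y, qy):
    """Left-hand side: (alpha_x * Q_x)^T (alpha_y * Q_y)."""
    return float(np.dot(alpha_x * qx, alpha_y * qy))

def via_hamming(alpha_x, qx, alpha_y, qy):
    """Right-hand side: alpha_x * alpha_y * (d - 2 * D_H(Q_x, Q_y))."""
    d = qx.shape[0]
    return float(alpha_x * alpha_y * (d - 2 * hamming_distance(qx, qy)))

# Random 64-bit codes and arbitrary scaling factors (illustrative values only).
rng = np.random.default_rng(0)
d = 64
qx = rng.choice([-1.0, 1.0], size=d)
qy = rng.choice([-1.0, 1.0], size=d)
alpha_x, alpha_y = 0.37, 1.25

lhs = scaled_inner_product(alpha_x, qx, alpha_y, qy)
rhs = via_hamming(alpha_x, qx, alpha_y, qy)
print(np.isclose(lhs, rhs))
```

Intuitively, each agreeing position contributes $+1$ and each differing position contributes $-1$ to the unscaled inner product, so it equals $(d - D_H) - D_H = d - 2D_H$.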

Figures (8)

  • Figure 1: Bipartite graph modeling and overall model performance in terms of evaluation metrics and embedding distribution visualization (best view in color).
  • Figure 2: Workflow illustration of BGCH+ framework (best view in color).
  • Figure 3: Dual feature augmentation for contrastive learning.
  • Figure 4: Top-N retrieval quality with N in {20, 50, 100, 200, 500, 1000} (best view in color).
  • Figure 5: Distribution illustration of learned hash codes between BGCH and BGCH+ using Gaussian kernel density estimation (KDE) over six datasets with (1) bandwidth as 0.1 and (2) number of contour levels as 10 (best view in color).
  • ...and 3 more figures

Theorems & Definitions (2)

  • Theorem 1: Hamming Distance Matching
  • proof