The Effectiveness of Graph Contrastive Learning on Mathematical Information Retrieval

Pei-Syuan Wang, Hung-Hsuan Chen

TL;DR

This work reframes mathematical information retrieval as learning structure-aware representations of formulas via graph contrastive learning (GCL). By constructing two graph layouts, Symbol Layout Tree (SLT) and Operator Tree (OPT), and evaluating three GCL methods (InfoGraph, GraphCL, BGRL) in a self-supervised setting, the study demonstrates consistent improvements over the strong TangentCFT baseline on the NTCIR-12 MathIR Wikipedia Formula Browsing Task, using cosine-based retrieval. Key findings include layout-dependent performance of GCL variants and the value of using TangentCFT node embeddings as input to the GCL framework. The work provides a public codebase to promote further research and development in structure-aware MIR for mathematical formulas, with potential extensions to data augmentation and positive-pair generation via templates.
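As a rough illustration of the cosine-based retrieval step mentioned above, the sketch below ranks a handful of formula embeddings against a query embedding. The vectors, the formula identifiers, and the function names `cosine_similarity` and `rank_formulas` are hypothetical and chosen only for this example; in the paper the embeddings would come from a trained GCL encoder, and this is not the authors' implementation.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention for this sketch: a zero vector matches nothing
    return dot / (norm_u * norm_v)


def rank_formulas(query_vec, index, top_k=3):
    """Return the top_k (formula_id, score) pairs, highest similarity first."""
    scored = [(fid, cosine_similarity(query_vec, vec)) for fid, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings (made up for illustration; real ones come from the encoder).
index = {
    "f1": [1.0, 0.0, 0.0],
    "f2": [0.9, 0.1, 0.0],
    "f3": [0.0, 1.0, 0.0],
}
results = rank_formulas([1.0, 0.0, 0.0], index, top_k=2)
```

With these toy vectors, `f1` is identical to the query and ranks first, followed by the nearby `f2`. A practical system would precompute normalized embeddings so that ranking reduces to a dot product.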

Abstract

This paper details an empirical investigation into using Graph Contrastive Learning (GCL) to generate mathematical equation representations, a critical aspect of Mathematical Information Retrieval (MIR). Our findings reveal that this simple approach consistently exceeds the performance of the current leading formula retrieval model, TangentCFT. To support ongoing research and development in this field, we have made our source code accessible to the public at https://github.com/WangPeiSyuan/GCL-Formula-Retrieval/.

Paper Structure

This paper contains 17 sections, 3 equations, 2 figures, 7 tables.

Figures (2)

  • Figure 1: Online and offline processing in the overall framework
  • Figure 2: Examples of the SLT and OPT representations of the formula $a^3 + b^2 = 0$
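
To make the two layouts concrete, here is a minimal sketch of how the formula $a^3 + b^2 = 0$ might be encoded as graphs. The figure itself is not reproduced here, so this is one plausible reading rather than the paper's exact encoding: the relation names `next` and `above`, the `^` operator label, and the helper `flatten_opt` are assumptions made for illustration, and the Tangent tooling uses its own node and edge labels.

```python
# SLT: symbols linked by spatial relations, as (source, relation, target).
# Every symbol in this formula is distinct, so labels double as node ids.
slt_edges = [
    ("a", "above", "3"),  # the exponent sits above its base
    ("a", "next", "+"),
    ("+", "next", "b"),
    ("b", "above", "2"),
    ("b", "next", "="),
    ("=", "next", "0"),
]

# OPT: operators are internal nodes and operands are their children.
opt = ("=", ("+", ("^", "a", "3"), ("^", "b", "2")), "0")


def flatten_opt(tree):
    """Turn a nested-tuple tree into (labels, edges) with integer node ids."""
    labels = []
    edges = []

    def visit(node):
        node_id = len(labels)
        if isinstance(node, tuple):
            labels.append(node[0])  # operator label
            for child in node[1:]:
                child_id = visit(child)
                edges.append((node_id, child_id))
        else:
            labels.append(node)  # leaf: variable or number
        return node_id

    visit(tree)
    return labels, edges


opt_labels, opt_edges = flatten_opt(opt)
```

Integer ids are needed for the OPT because the `^` label occurs twice. The resulting label list and edge list are the kind of input a graph encoder consumes, with node labels mapped to embedding vectors.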