Interleaving Text and Number Embeddings to Solve Mathematics Problems

Marvin Alberts; Gianmarco Gabrieli; Irina Espejo Morales

TL;DR

This paper addresses key shortcomings of an existing continuous number encoding for language models. It eliminates numerical artefacts, handles a wide range of magnitudes without clipping, and introduces a routing layer that differentiates between numerical and text embeddings.

Abstract

Integrating text and numbers effectively is a crucial step towards enhancing the capabilities of Large Language Models (LLMs) in assisting with scientific tasks. Most current approaches rely on discrete tokenization of numbers, for instance conversion to scientific notation or base-10 decomposition. A recent approach instead proposed a continuous numerical encoding as an inductive bias. In this paper, we build upon that approach by introducing more expressive numerical embeddings. Our method addresses key shortcomings of the continuous encoding: it eliminates numerical artefacts and handles a wide range of magnitudes without clipping. Our work presents two key contributions. First, we employ an MLP to assign distinct directions in the embedding space to different numbers. Second, we introduce a routing layer that differentiates between numerical and text embeddings. We hypothesise that this combined approach enables the model to distinguish between text and number distributions while maintaining its capacity for arithmetic operations. Using only a 45M-parameter encoder-decoder architecture, our method achieves $R^2 = 0.9988$ over a wide range of magnitudes ($10^{-3}$ to $10^{8}$). In addition, we empirically observe a reduction in numerical artefacts and biases compared to the baselines.
