Collaborative LLM Numerical Reasoning with Local Data Protection

Min Zhang; Yuzhe Lu; Yun Zhou; Panpan Xu; Lin Lee Cheong; Chang-Tien Lu; Haozhu Wang

TL;DR

This work presents a privacy-preserving framework for numerical reasoning over documents. It combines topic-shifted, pattern-preserving query synthesis with a plug-and-play, tool-based answer reconstruction that reuses code generated by the remote model. Queries are transformed to hide sensitive content while preserving the underlying reasoning skeleton, and the original numerical values are substituted back after the remote model solves the transformed problem, so remote computation is used without exposing local data. On FinQA and MultiHiertt, the method improves reasoning accuracy by up to 43.6% and reduces data leakage by up to 44.6% relative to existing data protection approaches, approaching remote-model performance while protecting data better than full reliance on the remote model. The method generalizes across datasets and local retrievers, offering a practical path toward secure, on-device numerical reasoning in real-world applications.

Abstract

Numerical reasoning over documents, which demands both contextual understanding and logical inference, is challenging for low-capacity local models deployed on computation-constrained devices. Although such complex reasoning queries could be routed to powerful remote models like GPT-4, exposing local data raises significant data leakage concerns. Existing mitigation methods generate problem descriptions or examples for remote assistance. However, the inherent complexity of numerical reasoning hinders the local model from generating logically equivalent queries and accurately inferring answers with remote guidance. In this paper, we present a model collaboration framework with two key innovations: (1) a context-aware synthesis strategy that shifts the query topics while preserving reasoning patterns; and (2) a tool-based answer reconstruction approach that reuses the remote-generated plug-and-play solution with code snippets. Experimental results demonstrate that our method achieves better reasoning accuracy than solely using local models while providing stronger data protection than fully relying on remote models. Furthermore, compared to existing data protection approaches, our method improves accuracy by 16.2% to 43.6% while reducing data leakage by 2.3% to 44.6%.
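
The tool-based answer reconstruction step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the snippet text, the function name `solve`, the helper names, and all numbers are hypothetical and chosen only for illustration. The idea shown is that the remote model writes a reusable function for the topic-shifted query, and the local side re-runs that function on the original values, which never leave the device.

```python
# Hypothetical remote reply: code written for the synthesized (topic-shifted)
# query. The function name and signature are assumptions for this sketch.
REMOTE_SNIPPET = '''
def solve(old_value, new_value):
    """Percentage change from old_value to new_value."""
    return (new_value - old_value) / old_value * 100
'''


def load_remote_tool(snippet: str, name: str = "solve"):
    """Turn the remote-generated snippet into a callable.

    exec() is used only to keep the sketch short; a real system would run
    untrusted code in a sandbox.
    """
    namespace = {}
    exec(snippet, namespace)
    return namespace[name]


def reconstruct_answer(snippet: str, values: dict) -> float:
    """Reuse the remote solution as a plug-and-play tool on the given values."""
    tool = load_remote_tool(snippet)
    return tool(**values)


# Values in the synthesized query that is sent to the remote model.
synthetic_values = {"old_value": 200.0, "new_value": 250.0}
# Original values from the local document; these stay local.
private_values = {"old_value": 1000.0, "new_value": 1125.0}

remote_answer = reconstruct_answer(REMOTE_SNIPPET, synthetic_values)
local_answer = reconstruct_answer(REMOTE_SNIPPET, private_values)
print(remote_answer, local_answer)
```

Because the remote solution is a function of named inputs instead of a single number, the same reasoning pattern applies unchanged once the original values are substituted locally.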
