Refine Thought: A Test-Time Inference Method for Embedding Model Reasoning

Guangzhi Wang; Kai Li; Yinghao Jiao; Zhi Liu

TL;DR

RT (Refine Thought) addresses the limited computational depth of text embedding models on semantic reasoning by performing $T$ test-time forward passes on the query, producing a refined representation without any parameter updates. Unlike generation-based chain-of-thought, RT operates entirely in the hidden space, aggregating intermediate states into a final embedding $h_T$ that is used for similarity scoring. Across BRIGHT and PJBenchmark, RT delivers notable gains on reasoning tasks while preserving performance on general semantic understanding tasks such as C-MTEB STS, with decoder-only architectures benefiting most. The approach highlights the practical value of temporal unrolling at inference time and points to adaptive step counts and, potentially, lightweight training as promising future directions.
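The TL;DR describes RT only at a high level, so the following Python sketch is an illustration under stated assumptions, not the authors' implementation. `toy_encoder` (one fixed linear map followed by tanh) is a hypothetical stand-in for a frozen embedding model. The feedback rule, which appends the previous pooled embedding $h_{t-1}$ to the input as an extra vector before the next pass, and the use of mean pooling as the "aggregation" step are likewise assumptions, chosen as one plausible reading of "aggregating intermediate states". What the sketch does preserve from the text is that the query is encoded $T$ times, no weights change, and the final $h_T$ is what gets compared by similarity.

```python
import math
import random

# Hypothetical stand-in for ONE forward pass of a frozen embedding model:
# each input vector is mapped through a fixed d x d matrix followed by tanh.
def toy_encoder(inputs, weights):
    return [
        [math.tanh(sum(w * x for w, x in zip(row, vec))) for row in weights]
        for vec in inputs
    ]

def mean_pool(states):
    # Average the per-position hidden states into one vector.
    n = len(states)
    return [sum(col) / n for col in zip(*states)]

def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against a zero vector
    return [x / norm for x in v]

def refine_thought(token_vecs, weights, steps):
    """Assumed T-step unrolling: pass 1 is an ordinary encoding; each later
    pass re-runs the same frozen encoder with the previous embedding
    h_{t-1} appended as an extra input vector. Returns h_T."""
    if steps < 1:
        raise ValueError("steps must be >= 1")
    h = l2_normalize(mean_pool(toy_encoder(token_vecs, weights)))  # h_1
    for _ in range(steps - 1):
        states = toy_encoder(token_vecs + [h], weights)  # weights are never updated
        h = l2_normalize(mean_pool(states))              # h_t
    return h

def cosine(a, b):
    # Both inputs are unit-normalized, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))

# Usage with random toy data (d = 8).
rng = random.Random(0)
d = 8
weights = [[rng.gauss(0, 0.5) for _ in range(d)] for _ in range(d)]
query = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(5)]
doc = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(7)]

h_q = refine_thought(query, weights, steps=3)  # refined query embedding h_T, T = 3
h_d = refine_thought(doc, weights, steps=1)    # document encoded with a single pass
score = cosine(h_q, h_d)
print(f"similarity: {score:.4f}")
```

Only the query is unrolled here, following the TL;DR's phrase "forward passes on the query"; encoding documents with a single pass is an additional assumption. With `steps=1` the sketch reduces to an ordinary single-pass embedding, and the weight matrix is never modified, which mirrors the "no parameter updates" property.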

Abstract

We propose RT (Refine Thought), a method that enhances the semantic reasoning ability of text embedding models. The method obtains the final semantic representation by running multiple forward passes of the text embedding model. Experiments show that RT achieves significant improvements on semantic reasoning tasks in BRIGHT and in the person-job matching benchmark PJBenchmark, while maintaining consistent performance on general-purpose semantic understanding tasks such as C-MTEB. Our results indicate that RT is effective because it further activates the semantic reasoning ability that decoder-only text embedding models (e.g., Qwen3-Embedding-8B) learned during pretraining. RT can be seen as a test-time inference method.
