Semantic Voting: A Self-Evaluation-Free Approach for Efficient LLM Self-Improvement on Unverifiable Open-ended Tasks

Chunyang Jiang, Yonggang Zhang, Yiyang Cai, Chi-Min Chan, Yulong Liu, Mingming Chen, Wei Xue, Yike Guo

Abstract

The rising cost of acquiring supervised data has driven significant interest in self-improvement for large language models (LLMs). Straightforward unsupervised signals like majority voting have proven effective in generating pseudo-labels for verifiable tasks, while their applicability to unverifiable tasks (e.g., translation) is limited by the open-ended character of responses. As a result, self-evaluation mechanisms (e.g., self-judging and entropy minimization) are predominantly used to derive pseudo-labels. However, self-evaluation relying on LLMs typically incurs high computational overhead and introduces overconfidence issues due to intrinsic biases. To address these challenges, we propose a novel self-evaluation-free approach for unverifiable tasks, designed for lightweight yet effective self-improvement. Inspired by majority voting commonly employed in verifiable tasks, we propose semantic voting as a novel mechanism that relaxes the principle of hard matching (i.e., exact matching) toward soft matching (i.e., semantic similarity). Soft matching is achieved by leveraging a lightweight sentence embedding model to quantify semantic similarity, thereby mitigating excessive computational burden and intrinsic bias-associated limitations of self-evaluation. Comprehensive experiments demonstrate that our method achieves substantial gains in computational efficiency and overall better performance than self-evaluation methods across diverse model architectures and tasks.

Paper Structure

This paper contains 32 sections, 5 equations, 14 figures, 8 tables, 1 algorithm.

Figures (14)

  • Figure 1: Overview of semantic voting-based self-improvement (SVSI) for LLMs. Given an input question $x_i$, SVSI first generates a candidate answer set $\mathcal{A}_i$, clusters it to identify the most coherent (largest) subset $\mathcal{C}_i^{max}$ (Section \ref{sec:clustering}), and then performs semantic voting within $\mathcal{C}_i^{max}$ using average semantic similarities (Section \ref{sec:sv}). The answers with the highest ($a_i^w$) and lowest ($a_i^l$) voting scores form the preference pair used for DPO training (Section \ref{sec:dpo}).
  • Figure 2: Overall lexical and semantic improvements of SVSI, EM, and SJ across all settings, relative to the base model performance.
  • Figure 3: Computational overhead for building preference pairs via SVSI, EM, and SJ.
  • Figure 4: Improvements of training with Semantic Voting (SV) and flipped-SV (inverted SV preference pairs), relative to the base model on lexical (top row) and semantic (bottom row) metrics.
  • Figure 5: Evaluation of the GRPO-based variant of SVSI (SVSI-G) against EMPO and EMRL-seq.
  • ...and 9 more figures
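
The pipeline summarized in the Figure 1 caption (cluster the candidates, keep the largest cluster, score each member by average similarity, take the top and bottom answers as a preference pair) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the clustering step is replaced by a simple similarity-threshold grouping, the `threshold` value is arbitrary, self-similarity is excluded from the average, and the toy 2-D vectors stand in for real sentence embeddings. All function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def largest_cluster(sim, threshold):
    """Stand-in clustering (assumption): connected components of the graph
    linking answers whose similarity is >= threshold; returns the largest."""
    unvisited = set(range(len(sim)))
    best = []
    while unvisited:
        start = unvisited.pop()
        component, stack = [start], [start]
        while stack:
            i = stack.pop()
            neighbours = [j for j in unvisited if sim[i][j] >= threshold]
            for j in neighbours:
                unvisited.remove(j)
                component.append(j)
                stack.append(j)
        if len(component) > len(best):
            best = component
    return sorted(best)


def semantic_vote(embeddings, threshold=0.8):
    """Return (winner_index, loser_index, scores) or None if no pair exists.

    Each member of the largest cluster is scored by its average similarity
    to the other members (excluding itself, which is an assumption here).
    """
    n = len(embeddings)
    sim = [[cosine(embeddings[i], embeddings[j]) for j in range(n)]
           for i in range(n)]
    cluster = largest_cluster(sim, threshold)
    if len(cluster) < 2:
        return None  # a preference pair needs at least two answers
    scores = {
        i: sum(sim[i][j] for j in cluster if j != i) / (len(cluster) - 1)
        for i in cluster
    }
    winner = max(cluster, key=scores.get)  # plays the role of a_i^w
    loser = min(cluster, key=scores.get)   # plays the role of a_i^l
    return winner, loser, scores


# Toy embeddings: three mutually similar answers and one outlier.
toy_embeddings = [[1.0, 0.0], [0.95, 0.1], [0.8, 0.4], [0.0, 1.0]]
result = semantic_vote(toy_embeddings)
print(result)
```

On the toy vectors, the outlier (index 3) falls outside the largest cluster, index 1 receives the highest voting score because it sits between the other two members, and index 2 receives the lowest. In practice the embeddings would come from a sentence embedding model, and the chosen and rejected answers would be passed to a DPO trainer.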