Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering

Xinyan Guan, Yanjiang Liu, Xinyu Lu, Boxi Cao, Ben He, Xianpei Han, Le Sun, Jie Lou, Bowen Yu, Yaojie Lu, Hongyu Lin

TL;DR

Verifier engineering reframes post-training supervision for foundation models as a three-stage loop (Search, Verify, Feedback) formalized as a goal-conditioned Markov decision process (GC-MDP), enabling scalable, automated supervision via diverse verifiers. The paper formalizes the framework, surveys a taxonomy of verifiers, and outlines implementation strategies for training-based and inference-based feedback, as well as advanced search and verifier design. It connects verifier engineering to existing post-training paradigms, discusses open questions, and highlights challenges in efficiency, evaluation, and verifier fusion. By coordinating search, verification, and feedback, the approach aims to push foundation models toward more general, safer, and more capable AI systems.

Abstract

The evolution of machine learning has increasingly prioritized the development of powerful models and more scalable supervision signals. However, the emergence of foundation models presents significant challenges in providing effective supervision signals necessary for further enhancing their capabilities. Consequently, there is an urgent need to explore novel supervision signals and technical approaches. In this paper, we propose verifier engineering, a novel post-training paradigm specifically designed for the era of foundation models. The core of verifier engineering involves leveraging a suite of automated verifiers to perform verification tasks and deliver meaningful feedback to foundation models. We systematically categorize the verifier engineering process into three essential stages: search, verify, and feedback, and provide a comprehensive review of state-of-the-art research developments within each stage. We believe that verifier engineering constitutes a fundamental pathway toward achieving Artificial General Intelligence.

Paper Structure

This paper contains 43 sections, 4 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Framework of verifier engineering: The fundamental stages of verifier engineering include Search, Verify, and Feedback. Given an instruction, the process begins with generating candidate responses (Search), followed by evaluating these candidates using appropriate verifier combinations (Verify), and concludes with optimizing the model's output distribution (Feedback). This framework can explain various approaches, from training-based methods like RLHF (Ouyang et al., 2022) to inference-based techniques such as OmegaPRM (Luo et al., 2024) and Experiential Co-Learning (Qian et al., 2023). We systematically categorize existing approaches into these three stages in Table \ref{tab:exist-work}.
  • Figure 2: Overview of verifier engineering methodologies, categorized into three main stages: Search, Verify, and Feedback. Each stage is further broken down into specific approaches, with references to notable works in each area.
  • Figure 3: A verifier engineering perspective on SFT, DPO, and RLHF: gray nodes represent sample paths not used in training, while non-gray nodes represent sample paths actively used in the training process.
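
The Search, Verify, and Feedback stages described in the Figure 1 caption can be illustrated with a toy loop. The following is a minimal sketch, not the paper's implementation: the candidate pool, the two rule-based verifiers, the averaging of verifier scores, and the best-of-N selection used as the feedback step are all assumptions chosen for illustration, and every function name is hypothetical.

```python
from typing import Callable, List, Tuple

# A verifier maps (instruction, candidate) to a score in [0, 1].
Verifier = Callable[[str, str], float]

# Hypothetical fixed pool standing in for samples from a policy model.
TOY_POOL = ["4", "six", "5", "5.0"]


def toy_generate(instruction: str, index: int) -> str:
    """Stand-in for model sampling: cycles through a fixed pool."""
    return TOY_POOL[index % len(TOY_POOL)]


def search(generate: Callable[[str, int], str], instruction: str, n: int) -> List[str]:
    """Search: produce n candidate responses for the instruction."""
    return [generate(instruction, i) for i in range(n)]


def format_verifier(instruction: str, candidate: str) -> float:
    """Rule-based check (assumed): the answer must be a plain non-negative integer."""
    return 1.0 if candidate.isdigit() else 0.0


def arithmetic_verifier(instruction: str, candidate: str) -> float:
    """Rule-based check (assumed): the answer must equal the sum in 'a+b'."""
    expected = sum(int(term) for term in instruction.split("+"))
    return 1.0 if candidate.isdigit() and int(candidate) == expected else 0.0


def verify(
    instruction: str, candidates: List[str], verifiers: List[Verifier]
) -> List[Tuple[str, float]]:
    """Verify: score each candidate with the mean of all verifier scores."""
    scored = []
    for candidate in candidates:
        scores = [v(instruction, candidate) for v in verifiers]
        scored.append((candidate, sum(scores) / len(scores)))
    return scored


def feedback_best_of_n(scored: List[Tuple[str, float]]) -> str:
    """Feedback (inference-time, assumed): return the highest-scoring candidate."""
    return max(scored, key=lambda pair: pair[1])[0]


def run_loop(instruction: str, n: int = 4) -> str:
    candidates = search(toy_generate, instruction, n)
    scored = verify(instruction, candidates, [format_verifier, arithmetic_verifier])
    return feedback_best_of_n(scored)


if __name__ == "__main__":
    print(run_loop("2+3"))
```

In this sketch the feedback step only reranks outputs at inference time. A training-based variant would instead use the verifier scores to update the model's parameters, which is outside the scope of this toy example.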