LOGIGEN: Logic-Driven Generation of Verifiable Agentic Tasks

Yucheng Zeng, Weipeng Lu, Linyun Liu, Shupeng Li, Zitian Qu, Chenghao Zhu, Shaofei Li, Zhengdong Tan, Mengyue Liu, Haotian Zhao, Zhe Zhou, Jianmin Wu

TL;DR

LOGIGEN is a logic-driven framework that synthesizes verifiable training data on three core pillars: Hard-Compiled Policy Grounding, Logic-Driven Forward Synthesis, and Deterministic State Verification. It is paired with a verification-based training protocol in which Supervised Fine-Tuning on verifiable trajectories establishes compliance with hard-compiled policy.

Abstract

The evolution of Large Language Models (LLMs) from static instruction-followers to autonomous agents necessitates operating within complex, stateful environments to achieve precise state-transition objectives. However, this paradigm is bottlenecked by data scarcity, as existing tool-centric reverse-synthesis pipelines fail to capture the rigorous logic of real-world applications. We introduce \textbf{LOGIGEN}, a logic-driven framework that synthesizes verifiable training data based on three core pillars: \textbf{Hard-Compiled Policy Grounding}, \textbf{Logic-Driven Forward Synthesis}, and \textbf{Deterministic State Verification}. Specifically, a Triple-Agent Orchestration is employed: the \textbf{Architect} compiles natural-language policy into database constraints to enforce hard rules; the \textbf{Set Designer} initializes boundary-adjacent states to trigger critical policy conflicts; and the \textbf{Explorer} searches this environment to discover causal solution paths. This framework yields a dataset of 20,000 complex tasks across 8 domains, where validity is strictly guaranteed by checking exact state equivalence. Furthermore, we propose a verification-based training protocol where Supervised Fine-Tuning (SFT) on verifiable trajectories establishes compliance with hard-compiled policy, while Reinforcement Learning (RL) guided by dense state-rewards refines long-horizon goal achievement. On $\tau^2$-Bench, LOGIGEN-32B(RL) achieves a \textbf{79.5\% success rate}, substantially outperforming the base model (40.7\%). These results demonstrate that logic-driven synthesis combined with verification-based training effectively constructs the causally valid trajectories needed for next-generation agents.
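
To make the idea of compiling a policy into database constraints concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical policy ("an account may hold at most 3 open tickets"), a hypothetical `accounts` table, and SQLite as the database; none of these appear in the paper. The initial state is seeded one step below the limit, which is one possible reading of a boundary-adjacent state.

```python
import sqlite3


def build_env() -> sqlite3.Connection:
    """Create a toy environment whose policy is enforced by the database itself."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE accounts (
            id INTEGER PRIMARY KEY,
            open_tickets INTEGER NOT NULL,
            -- Hypothetical policy: at most 3 open tickets per account.
            CHECK (open_tickets BETWEEN 0 AND 3)
        )
        """
    )
    # Boundary-adjacent seed (assumed interpretation): one below the limit.
    conn.execute("INSERT INTO accounts (id, open_tickets) VALUES (1, 2)")
    conn.commit()
    return conn


def open_ticket(conn: sqlite3.Connection, account_id: int) -> bool:
    """Hypothetical tool call: returns False if the compiled policy rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "UPDATE accounts SET open_tickets = open_tickets + 1 WHERE id = ?",
                (account_id,),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed, so the state is left unchanged.
        return False


if __name__ == "__main__":
    env = build_env()
    print(open_ticket(env, 1))  # allowed: reaches the limit
    print(open_ticket(env, 1))  # rejected by the CHECK constraint
```

Because the rule lives in the schema rather than in a prompt, an agent cannot violate it by phrasing a tool call differently; the rejected update simply leaves the state untouched.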

Paper Structure (67 sections, 11 equations, 10 figures, 5 tables)

Figures (10)

  • Figure 1: (a) Performance comparisons. LOGIGEN-32B (RL) achieves a 79.5% success rate on $\tau^2$-Bench, significantly outperforming open-weight baselines and remaining competitive with proprietary models. It also surpasses general-purpose baselines with substantially larger parameters. (b) Performance breakdown. The gains stem from our verification-based training protocol: SFT on verifiable trajectories establishes compliance with hard-compiled policy, while RL guided by deterministic state-rewards refines long-horizon goal achievement.
  • Figure 2: LOGIGEN synthesizes verifiable agentic tasks via a Triple-Agent Orchestration: (1) the Architect expands seed domain knowledge into a Wiki Policy and compiles it into a Hard-Compiled Policy Environment; (2) the Set Designer seeds boundary-adjacent initial states ($N{-}1$) to maximize logical friction; and (3) the Explorer performs goal-conditioned exploration to discover executable multi-turn episodes, producing a spoiler-free task description and a deterministic target database snapshot for state-based verification.
  • Figure 3: Task complexity profile of LOGIGEN. (a) Distribution of the top-10 complexity tags assigned to generated tasks, showing that most samples involve conditional logic, multi-step interaction, and attribute-based selection, with a substantial portion requiring state-based reasoning and fallback (waterfall) behaviors. (b) Difficulty-level distribution, where L1 (Simple) tasks are rare (51), while the dataset is dominated by L2 (Intermediate; 6,161) and L3 (Advanced; 3,788) tasks, indicating a strong bias toward non-trivial, policy-constrained problem solving.
  • Figure 4: Training and Evaluation Protocol of LOGIGEN. The protocol operates on a Data Package containing the Wiki Policy ($\mathcal{P}$), Task Description ($\mathcal{I}$), and paired database snapshots. Within the Interaction Loop, a User Simulator (driven by $\mathcal{I}$) interacts with the Agent (governed by $\mathcal{P}$ and tools in $\mathcal{E}$). Each tool invocation induces a deterministic mutation of the database state ($s_\text{origin} \rightarrow s_\text{1} \rightarrow s_\text{2} \rightarrow \dots \rightarrow s_\text{final}$). The Verifier computes a binary success metric $R_\text{final}$ by performing a canonicalized State-Diff between the resulting state $s_\text{final}$ and the ground-truth target $s_\text{target}$.
  • Figure 5: Training reward curves for LOGIGEN-8B (left) and 32B (right) models. Compared to Vanilla GRPO, TA-GRPO exhibits higher sample efficiency and achieves superior asymptotic rewards, particularly on the 8B scale.
  • ...and 5 more figures
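
The canonicalized State-Diff described in the Figure 4 caption can be illustrated with a small sketch. This is an assumption-laden illustration, not the paper's verifier: it models a database snapshot as a hypothetical mapping from table names to lists of row dictionaries, and it treats canonicalization as ordering columns and rows so that insertion order does not affect the comparison. The function names `canonicalize`, `state_diff`, and `final_reward` are invented for this example.

```python
from typing import Any

Row = dict[str, Any]
State = dict[str, list[Row]]  # hypothetical snapshot format: table name -> rows


def canonicalize(state: State) -> dict[str, list[tuple]]:
    """Order columns within each row and rows within each table."""
    canon = {}
    for table, rows in state.items():
        as_tuples = [tuple(sorted(row.items())) for row in rows]
        # Sort by repr so rows with mixed value types still order deterministically.
        canon[table] = sorted(as_tuples, key=repr)
    return canon


def state_diff(final: State, target: State) -> dict[str, dict[str, list[tuple]]]:
    """Return per-table differences; an empty dict means the states are equivalent."""
    a, b = canonicalize(final), canonicalize(target)
    diff = {}
    for table in sorted(set(a) | set(b)):
        rows_a, rows_b = a.get(table, []), b.get(table, [])
        if rows_a != rows_b:
            diff[table] = {
                "unexpected": [r for r in rows_a if r not in rows_b],
                "missing": [r for r in rows_b if r not in rows_a],
            }
    return diff


def final_reward(final: State, target: State) -> int:
    """Binary success signal: 1 only when the canonical states match exactly."""
    return int(not state_diff(final, target))


if __name__ == "__main__":
    target = {"orders": [{"id": 1, "status": "cancelled"}, {"id": 2, "status": "open"}]}
    reordered = {"orders": [{"status": "open", "id": 2}, {"status": "cancelled", "id": 1}]}
    wrong = {"orders": [{"id": 1, "status": "open"}, {"id": 2, "status": "open"}]}
    print(final_reward(reordered, target))
    print(final_reward(wrong, target))
    print(state_diff(wrong, target))
```

Comparing states rather than action sequences means any trajectory that reaches the target snapshot counts as a success, while the per-table diff remains available for inspecting failures.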