Adaptive Problem Generation via Symbolic Representations

Teresa Yeo, Myeongho Jeon, Dulaj Weerakoon, Rui Qiao, Alok Prakash, Armando Solar-Lezama, Archan Misra

TL;DR

We introduce a closed-loop framework that learns problem-modification strategies through prompt optimization in symbolic space. Experiments show that both adaptive problem generation and modifications made in the symbolic representation contribute to improving the model's math solving ability.

Abstract

We present a method for generating training data for reinforcement learning with verifiable rewards to improve small open-weights language models on mathematical tasks. Existing data generation approaches rely on open-loop pipelines and fixed modifications that do not adapt to the model's capabilities. Furthermore, they typically operate directly on word problems, which limits control over problem structure. To address this, we perform modifications in a symbolic problem space, representing each problem as a set of symbolic variables and constraints (e.g., via algebraic frameworks such as SymPy or SMT formulations). This representation enables precise control over problem structure and automatic generation of ground-truth solutions, and it decouples mathematical reasoning from linguistic realization. We also show that it results in more diverse generations. To adapt the problem difficulty to the model, we introduce a closed-loop framework that learns modification strategies through prompt optimization in symbolic space. Experimental results demonstrate that both adaptive problem generation and symbolic representation modifications contribute to improving the model's math solving ability.
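
As a rough illustration of what "a set of symbolic variables and constraints" can look like, the sketch below encodes a toy word problem in SymPy, applies one structural modification, and lets the solver produce the ground-truth answer. The seed problem, the modification, and the helper `solve_for` are hypothetical and chosen only for illustration; this is not the paper's pipeline or its prompts.

```python
import sympy as sp

# Hypothetical seed problem (not from the paper):
# "Ana has 3 times as many apples as Ben. Together they have 24 apples.
#  How many apples does Ana have?"
ana, ben, cara = sp.symbols("ana ben cara")

seed_constraints = [
    sp.Eq(ana, 3 * ben),
    sp.Eq(ana + ben, 24),
]


def solve_for(constraints, unknowns, target):
    """Solve the system and return the value of `target`.

    Raises ValueError when the system does not have exactly one solution,
    which is a simple way to reject ill-posed generated problems.
    """
    solutions = sp.solve(constraints, unknowns, dict=True)
    if len(solutions) != 1:
        raise ValueError(f"expected one solution, got {len(solutions)}")
    return solutions[0][target]


seed_answer = solve_for(seed_constraints, [ana, ben], ana)

# Hypothetical modification in symbolic space: add a third variable that is
# coupled to the others through a fraction, and change the total.
modified_constraints = [
    sp.Eq(ana, 3 * ben),
    sp.Eq(cara, sp.Rational(1, 2) * (ana + ben) - 2),
    sp.Eq(ana + ben + cara, 34),
]

modified_answer = solve_for(modified_constraints, [ana, ben, cara], cara)

print("seed answer (ana):", seed_answer)
print("modified answer (cara):", modified_answer)
```

Because the answer comes from the solver rather than from a second LLM call, the label stays correct whatever story is later wrapped around the constraints when they are translated back into a word problem.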

Paper Structure (37 sections, 9 equations, 5 figures, 5 tables)


Figures (5)

  • Figure 1: Left: Symbolic Data Generation with Prompt Optimization. We generate training data for mathematical tasks by modifying problems in symbolic space rather than directly in natural language. Starting from an original word problem (top left), we convert it to a symbolic representation (top right). An optimized prompt (middle, italics), learned through closed-loop feedback from the model's performance, specifies how to modify the symbolic structure to generate appropriately challenging problems. We call these prompts Opt-Sym. The modified symbolic representation (bottom right) is then translated back to a new word problem (bottom left). The LLM responses guide prompt optimization (green loop) toward problems that are challenging for the model. This symbolic approach enables precise control over problem structure and automatic ground-truth generation. Besides introducing coupled equations and nested substitutions, the new word problem also has a different storyline from the original, i.e., generation through symbolic representations results in more diverse training data. See \ref{sec:add-analysis} for a more detailed analysis. Right: Training on data generated via Opt-Sym improves model performance. We show the performance from training on data from Opt-Sym (green) compared to training on the original seed data (grey). The y-axis shows the performance averaged over several benchmarks, and the x-axis shows the amount of seed data used. Opt-Sym achieves an 8% improvement in performance with as few as 100 seed data points, compared to the baseline's 4% improvement. This demonstrates both the effectiveness and the data efficiency of our approach. See \ref{tab:results_main,tab:results_average_3b} and \ref{fig:results-ablations} for the full results and analysis.
  • Figure 2: Top: Data Generation Pipeline. We generate new math problems from seed examples using two approaches. Left (Natural Language (NL) representation): Given a seed problem $q \in \mathcal{D}_{\text{seed}}$, we prompt the LLM $M_{\text{gen}}$ to create a modified problem $q'$, then prompt it again to generate its solution. Right (Symbolic representation): Given a seed problem $s \in \mathcal{D}_{\text{sym}}$, we prompt $M_{\text{gen}}$ to modify this representation to create $s'$, translate $s'$ back to natural language to obtain $q'$, and use a solver to compute the answer. Both pipelines produce problem-answer pairs for training. Bottom: Examples of optimized prompts and generated problems. Each column shows a different optimized prompt $p_{\text{opt}}$ (top, truncated) applied to the same seed problem shown at the top. The resulting generated problem is shown below it. The blue text highlights the modifications requested by each prompt. These prompts are learned through closed-loop optimization (see \ref{sec:closed-loop}) that takes student model performance as feedback, enabling them to target specific weaknesses. See \ref{sec:app-full-opt-prompts} for complete optimized prompts across different settings. The optimized prompts for both natural language and symbolic representations suggest similar mathematical modifications (fractions, multi-step reasoning, additional variables), indicating some overlap in how problem difficulty should be increased for the student model. However, the generated problems exhibit distinct characteristics: the NL modifications produce problems with complex narratives that require careful interpretation, while the symbolic modifications yield problems with explicit mathematical operations stated directly in the text. Thus, the space in which modifications are applied affects the final problem structure and complexity.
  • Figure 3: Additional analysis. a. Data generation methods across subset sizes: The 1.5B model trained with Opt-Sym (green) outperforms the baseline approaches when trained on different data subsets. b. Data filtering impact: Filtering the generated training data improves performance only at larger dataset sizes. A possible reason is that for small datasets, filtering has a larger impact on data diversity, which can harm performance \cite{setlur2024rl}. c. Prompt scaling: Performance increases monotonically with the number of prompts used during data generation. This indicates that more diverse data helps improve performance (see \ref{fig:results-data-analysis} for further analysis of data diversity). As we optimize 4 prompts in total, there are $\binom{4}{n}$ ways to select $n$ of them; the plot shows the accuracy (%) averaged over these combinations, together with its standard deviation. d. Representation sensitivity: The choice of mathematical problem representation significantly affects model performance.
  • Figure 4: Analysis of generated data. Left: Generation via symbolic representations produces more diverse data. For each seed problem, we generate 10 variants using baseline and optimized prompts ($\mathcal{P}^{\text{NL}}$ and $\mathcal{P}^{\text{Sym}}$), compute their embeddings, and measure the average pairwise cosine distance. Optimized prompts with symbolic representations (Sym) achieve substantially higher diversity than natural language (NL) variants. Right: Student models (1.5B and 3B) achieve lower accuracy on generated data than on the seed data (dash-dot line), with optimized prompts producing the most challenging problems. Accuracy is similar across word-based and symbolic generation methods. The difficulty gaps relative to the seed data suggest that the problem variations are successful.
  • Figure 5: Performance vs different subsets of data for different RL methods and student models.
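
The diversity measure described in Figure 4 (average pairwise cosine distance over the embeddings of the variants generated from one seed problem) can be sketched as follows. The embedding model is not specified in this section, so the vectors below are made-up toy values, and the function names are hypothetical.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_u * norm_v)


def average_pairwise_cosine_distance(embeddings):
    """Average cosine distance over all unordered pairs of embeddings."""
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        raise ValueError("need at least two embeddings")
    return sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)


# Toy 2-D "embeddings" standing in for the variants of one seed problem.
# Real embeddings would come from a sentence-embedding model (an assumption).
near_duplicates = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
mixed_variants = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]

low_diversity = average_pairwise_cosine_distance(near_duplicates)
high_diversity = average_pairwise_cosine_distance(mixed_variants)

print("near-duplicate variants:", low_diversity)
print("mixed variants:", high_diversity)
```

A score near 0 means the variants are nearly identical in embedding space, and larger values indicate more varied generations. In the setup the caption describes, this score would be computed per seed problem over its 10 variants.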