MathViz-E: A Case-study in Domain-Specialized Tool-Using Agents

Arya Bulusu, Brandon Man, Ashish Jagmohan, Aditya Vempaty, Jennifer Mari-Wyka, Deepak Akkil

TL;DR

The paper addresses the challenge of applying LLM-driven agents to domain-specific settings, focusing on math pedagogy. It introduces MathViz-E, a voice-driven system that orchestrates an LLM, a solver, and a Desmos-based graphing calculator to produce accurate visualizations from natural language. To support robust evaluation and benchmarking, the authors create three Common Core–aligned datasets and an auto-evaluation pipeline using SymPy, with open-sourced code and data. Across experiments, the LLM+Solver configuration significantly outperforms an LLM-only baseline, especially on multi-step and geometry-related tasks, underscoring the value of domain-specific tool integration for automated pedagogy and highlighting avenues for broader generalization.

Abstract

There has been significant recent interest in harnessing LLMs to control software systems through multi-step reasoning, planning, and tool use. While some promising results have been obtained, applying these methods to specific domains raises several general issues: the control of specialized domain tools, the lack of existing datasets for training and evaluation, and the difficulty of automating system evaluation and improvement. In this paper, we present a case study that examines these issues in the context of a specific domain. Specifically, we present an automated math visualizer and solver system for mathematical pedagogy. The system orchestrates mathematical solvers and math graphing tools to produce accurate visualizations from simple natural language commands. We describe the creation of specialized datasets, and we develop an auto-evaluator that evaluates the outputs of our system by comparing them to ground-truth expressions. We have open-sourced the datasets and code for the proposed system.
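To make the auto-evaluation idea concrete, here is a minimal sketch of how a predicted expression might be compared against a ground-truth expression with SymPy. It is an illustration under stated assumptions, not the authors' evaluator: the helper names `to_zero_form` and `is_equivalent` are hypothetical, inputs are assumed to be plain strings with explicit multiplication (`2*x`, not `2x`), and each equation is assumed to contain at most one `=` sign.

```python
import sympy as sp


def to_zero_form(expression: str) -> sp.Expr:
    """Rewrite 'lhs = rhs' as lhs - rhs; bare expressions pass through.

    Hypothetical helper for this sketch, not part of the paper's code.
    """
    if "=" in expression:
        lhs, rhs = expression.split("=", 1)
        return sp.sympify(lhs.strip()) - sp.sympify(rhs.strip())
    return sp.sympify(expression.strip())


def is_equivalent(predicted: str, ground_truth: str) -> bool:
    """Return True if the two strings describe the same expression or graph."""
    p = to_zero_form(predicted)
    g = to_zero_form(ground_truth)

    # Case 1: the two forms are symbolically identical after simplification.
    if sp.simplify(p - g) == 0:
        return True

    # Case 2 (assumption of this sketch): two equations describe the same
    # curve when one zero-form is a nonzero constant multiple of the other.
    if "=" in predicted and "=" in ground_truth:
        ratio = sp.cancel(p / g)
        return bool(ratio.is_number and ratio != 0)

    return False


if __name__ == "__main__":
    print(is_equivalent("y = (x + 1)^2", "y = x^2 + 2*x + 1"))
    print(is_equivalent("2*y = 4*x", "y = 2*x"))
    print(is_equivalent("y = x^2", "y = x^3"))
```

Symbolic comparison of this kind accepts algebraically equal answers that differ only in form, which plain string matching would reject. A real evaluator would also need to handle inequalities, points, and calculator-specific syntax, none of which this sketch covers.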

Paper Structure

This paper contains 9 sections, 2 figures, 5 tables.

Figures (2)

  • Figure 1: Overview of the MathViz-E automated graphing system
  • Figure 2: UI of MathViz-E demonstrated through a multi-turn inverse problem