MRT at SemEval-2025 Task 8: Maximizing Recovery from Tables with Multiple Steps
Maximiliano Hormazábal Lagos, Álvaro Bueno Saez, Héctor Cerezo-Costas, Pedro Alonso Doval, Jorge Alcalde Vesteiro
TL;DR
This work tackles question answering (QA) over tabular data by introducing MRT, a multi-step pipeline in which LLMs generate natural-language instructions that are then translated into Python/Pandas code to interact with the table and extract answers. The system decomposes the problem into modular steps (Column Descriptor, Explainer, Coder and Runner, Interpreter, and Formatter) with iterative error handling to improve robustness. On the Databench dataset, MRT achieves $70.50\%$ accuracy, and an analysis of its error patterns identifies core bottlenecks in instruction-to-code translation, value filtering, and output formatting. The results suggest that a structured, interpretable pipeline combining code generation with reasoning can effectively address complex, multi-column questions in tabular QA, and the error analysis outlines concrete avenues for future improvements and scalability.
Abstract
In this paper we present our approach to the \textit{SemEval 2025 Task 8: Question-Answering over Tabular Data} challenge. Our strategy uses LLMs to generate Python code that interacts with the table and retrieves the answer to each question. The process consists of multiple steps: understanding the content of the table, generating natural-language instructions in the form of steps to follow to reach the answer, translating these instructions into code, running the code, and handling potential errors or exceptions. Each step uses open-source LLMs and a fine-grained prompt optimized for that step. With this approach, we achieved a score of $70.50\%$ on subtask 1.
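To make the sequence of steps in the abstract concrete, the following is a minimal sketch of a describe, explain, code, run, and retry loop. It is not the authors' implementation: the function names, the prompt wording, the `llm` callable interface, and the convention that generated code stores its answer in a variable named `result` are all assumptions made for illustration. To stay dependency-free, the table is a plain list of row dictionaries; in the setting the paper describes, it would be a Pandas DataFrame.

```python
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]
LLM = Callable[[str], str]  # hypothetical interface: prompt in, generated text out


def describe_columns(rows: List[Row]) -> str:
    """Summarize each column as 'name (type), e.g. sample' using the first row."""
    if not rows:
        return "(empty table)"
    return "\n".join(
        f"{name} ({type(value).__name__}), e.g. {value!r}"
        for name, value in rows[0].items()
    )


def answer_question(
    rows: List[Row], question: str, llm: LLM, max_retries: int = 2
) -> Optional[Any]:
    """Ask for instructions, ask for code, run it, and retry on errors."""
    schema = describe_columns(rows)
    # Step 1: natural-language instructions (assumed prompt wording).
    instructions = llm(f"Columns:\n{schema}\nQuestion: {question}\nList the steps.")
    # Step 2: translate instructions into code that sets `result`.
    prompt = (
        f"Columns:\n{schema}\nSteps:\n{instructions}\n"
        "Write Python code over `table` that stores the answer in `result`."
    )
    for _ in range(max_retries + 1):
        code = llm(prompt)
        scope: Dict[str, Any] = {"table": rows}
        try:
            # Step 3: run the generated code. Untrusted code should be
            # sandboxed in practice; plain exec is for illustration only.
            exec(code, scope)
            return scope["result"]
        except Exception as err:
            # Step 4: feed the failure back so the next attempt can fix it.
            prompt += f"\nPrevious code:\n{code}\nfailed with: {err!r}\nFix it."
    return None


def make_stub_llm() -> LLM:
    """Canned replies standing in for a real model (illustration only)."""
    replies = iter([
        "1. Take the 'age' column. 2. Compute its maximum.",
        "result = max(row['agee'] for row in table)",  # wrong key: KeyError
        "result = max(row['age'] for row in table)",   # corrected attempt
    ])
    return lambda prompt: next(replies)


table = [{"name": "Ana", "age": 31}, {"name": "Luis", "age": 45}]
answer = answer_question(table, "What is the maximum age?", make_stub_llm())
print(answer)
```

The stub model deliberately returns code with a misspelled column name on its first attempt, so the example exercises the error-handling path: the exception text is appended to the prompt and the second attempt succeeds.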
