Analyzing the Role of Semantic Representations in the Era of Large Language Models
Zhijing Jin, Yuen Chen, Fernando Gonzalez, Jiarui Liu, Jiayi Zhang, Julian Michael, Bernhard Schölkopf, Mona Diab
TL;DR
This paper asks whether traditional semantic representations, exemplified by Abstract Meaning Representation (AMR), retain value in the era of fixed-weight large language models (LLMs). It introduces AMRCoT, an AMR-driven prompt that prepends an AMR to the input text for zero-shot tasks, and evaluates it across five diverse NLP tasks with multiple GPT-family models. The results show that AMR yields only modest, task-dependent changes and often hurts performance, though it helps on a subset of samples, especially in semantically complex cases. Through case studies, large-scale feature analyses, and ablations (including gold versus parser-generated AMR and step-by-step reasoning checks), the work reveals systematic weaknesses of AMR on multi-word expressions (MWEs) and named entities, while confirming that raw text remains a more influential intermediate representation for current LLMs. The study highlights the need to improve how LLMs map symbolic representations like AMR to outputs, and suggests future directions including training LLMs specifically for AMR use and refining prompts to better exploit semantic structures.
Abstract
Traditionally, natural language processing (NLP) models often use a rich set of features created by linguistic expertise, such as semantic representations. However, in the era of large language models (LLMs), more and more tasks are turned into generic, end-to-end sequence generation problems. In this paper, we investigate the question: what is the role of semantic representations in the era of LLMs? Specifically, we investigate the effect of Abstract Meaning Representation (AMR) across five diverse NLP tasks. We propose an AMR-driven chain-of-thought prompting method, which we call AMRCoT, and find that it generally hurts performance more than it helps. To investigate what AMR may have to offer on these tasks, we conduct a series of analysis experiments. We find that it is difficult to predict which input examples AMR may help or hurt on, but errors tend to arise with multi-word expressions, named entities, and in the final inference step where the LLM must connect its reasoning over the AMR to its prediction. We recommend focusing on these areas for future work in semantic representations for LLMs. Our code: https://github.com/causalNLP/amr_llm.
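To make the AMR-driven prompting idea concrete, the following is a minimal sketch of how a zero-shot prompt with an AMR placed before the input text might be assembled, alongside a text-only baseline. The function names, the section labels ("AMR:", "Text:", "Answer:"), the instruction wording, and the example sentence are all assumptions made for illustration; the exact template used in the paper is not given in this summary, and the repository linked above should be treated as the authoritative source.

```python
def build_amr_prompt(instruction: str, amr: str, text: str) -> str:
    """Assemble a zero-shot prompt with the AMR graph placed before the input text.

    The section labels are hypothetical, not the paper's template.
    """
    sections = [
        instruction.strip(),
        "AMR:\n" + amr.strip(),
        "Text: " + text.strip(),
        "Answer:",
    ]
    return "\n\n".join(sections)


def build_text_only_prompt(instruction: str, text: str) -> str:
    """Baseline prompt that omits the AMR, for comparison with the AMR variant."""
    sections = [instruction.strip(), "Text: " + text.strip(), "Answer:"]
    return "\n\n".join(sections)


if __name__ == "__main__":
    # Illustrative sentence with a textbook-style AMR in PENMAN notation.
    # In practice the graph would come from gold annotations or an AMR parser.
    sentence = "The boy wants to go."
    amr_graph = """
    (w / want-01
       :ARG0 (b / boy)
       :ARG1 (g / go-02
                :ARG0 b))
    """
    # Hypothetical task instruction; the real tasks and wording are in the paper.
    task = "Using the AMR and the text, state who wants to go."

    print(build_amr_prompt(task, amr_graph, sentence))
    print("-" * 40)
    print(build_text_only_prompt(task, sentence))
```

Keeping the two builders separate makes it straightforward to send the same example to a model with and without the AMR and compare the outputs, which mirrors the kind of comparison the abstract describes.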
