On the Effects of Fine-tuning Language Models for Text-Based Reinforcement Learning

Mauricio Gruppi, Soham Dan, Keerthiram Murugesan, Subhajit Chaudhury

TL;DR

This work examines how fine-tuning language models (LMs) affects learning in text-based reinforcement learning (TBRL). By comparing fixed pre-trained LMs against fine-tuned variants on TextWorld Commonsense and Jericho, it shows that preserving semantic information accelerates training and improves robustness, while fine-tuning induces semantic degeneration that impairs learning and transfer to paraphrased or vocabulary-altered tasks. The study documents word-embedding drift during fine-tuning and demonstrates that fixed LMs maintain performance under language perturbations, whereas fine-tuned models falter. These findings highlight the value of maintaining pre-trained semantic structures in TBRL and point toward methods that balance task-specific adaptation with semantic retention.

Abstract

Text-based reinforcement learning involves an agent interacting with a fictional environment using observed text and admissible actions in natural language to complete a task. Previous works have shown that agents can succeed in text-based interactive environments even in the complete absence of semantic understanding or other linguistic capabilities. The success of these agents in playing such games suggests that semantic understanding may not be important for the task. This raises an important question about the benefits of LMs in guiding the agents through the game states. In this work, we show that rich semantic understanding leads to efficient training of text-based RL agents. Moreover, we describe the occurrence of semantic degeneration as a consequence of inappropriate fine-tuning of language models in text-based reinforcement learning (TBRL). Specifically, we describe the shift in the semantic representation of words in the LM, as well as how it affects the performance of the agent in tasks that are semantically similar to the training games. We believe these results may help develop better strategies to fine-tune agents in text-based RL scenarios.
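
One simple way to quantify the shift in a word's semantic representation described above is to compare its vector before and after fine-tuning. The following is a minimal sketch, not the authors' measurement procedure: the drift definition (one minus cosine similarity), the function names, and the three-dimensional toy vectors are all assumptions made for illustration. In practice the vectors would come from the pre-trained and fine-tuned LMs.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

def semantic_drift(before, after):
    """Per-term drift, defined here (an assumption) as 1 - cosine(before, after)."""
    return {
        term: 1.0 - cosine_similarity(before[term], after[term])
        for term in before
        if term in after
    }

# Toy vectors invented for illustration; they are not taken from any model.
pretrained = {"kitchen": [1.0, 0.0, 0.0], "bloody axe": [0.0, 1.0, 0.0]}
fine_tuned = {"kitchen": [1.0, 0.0, 0.0], "bloody axe": [1.0, 1.0, 0.0]}

drift = semantic_drift(pretrained, fine_tuned)

# How close the two terms are to each other before and after fine-tuning.
sim_before = cosine_similarity(pretrained["bloody axe"], pretrained["kitchen"])
sim_after = cosine_similarity(fine_tuned["bloody axe"], fine_tuned["kitchen"])
```

In this toy setup the vector for "kitchen" is unchanged, so its drift is zero, while "bloody axe" moves toward "kitchen", so its drift is positive and the similarity between the two terms increases.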

Paper Structure (18 sections, 9 figures, 4 tables)

Figures (9)

  • Figure 1: Semantic degeneration of the terms kitchen and bloody axe in Zork 1.
  • Figure 2: Training performance comparing LM-based encoding models and hash/word embedding-based models. (left) shows the normalized scores for TWC games and (right) shows the game score achieved in training across 100k steps in Zork 1. Shaded area corresponds to one standard deviation.
  • Figure 3: Training curves of fixed/fine-tuned LMs on (left) TWC medium difficulty games and (right) Zork 1. Due to semantic degeneration, the fine-tuned models do not exhibit an increasing score converging to a maximum value. Shaded areas denote one standard deviation.
  • Figure 4: Shift caused by the semantic degeneration to the contextual word vectors in the RoBERTa model fine-tuned to Zork 1: (a) shows the word embeddings from the pre-trained model, (b) shows the word embeddings after fine-tuning to Zork 1. The bold words denote the case where the term "bloody axe" shifts towards the word "kitchen" as a result of them co-occurring in a positively rewarded state.
  • Figure 5: Evaluation of a RoBERTa agent on original (none), paraphrased, and lexical substitution observations on (left) TWC medium games and (right) Zork 1. In both scenarios, fixed LMs exhibit strong robustness to the perturbations, scoring as much as in the games without perturbations.
  • ...and 4 more figures
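
The lexical substitution perturbation mentioned in the Figure 5 caption can be pictured as swapping words in an observation for alternatives before the agent sees it. The sketch below is an assumption about what such a perturbation might look like, not the paper's procedure: the synonym table, the function name, and the sample observation are hypothetical.

```python
import re

def lexical_substitution(observation, synonyms):
    """Replace whole words found in `synonyms` (lowercase keys), ignoring case.

    Replacements are inserted exactly as given in `synonyms`, so the
    original capitalization of a replaced word is not preserved.
    """
    if not synonyms:
        return observation
    # Longest keys first so a longer key wins over a shorter prefix of it.
    keys = sorted(synonyms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b",
        flags=re.IGNORECASE,
    )
    return pattern.sub(lambda m: synonyms[m.group(0).lower()], observation)

# Hypothetical synonym table and observation text, for illustration only.
SYNONYMS = {"kitchen": "cookhouse", "axe": "hatchet"}
observation = "You are in the kitchen. A bloody axe lies here."

perturbed = lexical_substitution(observation, SYNONYMS)
```

An agent would then be evaluated on `perturbed` instead of `observation`. Matching on word boundaries keeps the substitution from rewriting fragments of longer words.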