Table of Contents
Fetching ...

Building Decision Making Models Through Language Model Regime

Yu Zhang, Haoxiang Liu, Feijun Jiang, Weihua Luo, Kaifu Zhang

TL;DR

The paper addresses limited generalization in decision making by proposing Learning then Using (LTU), a two-stage LTU pipeline that first performs continued pre-training to build a foundation decision making model and then applies supervised fine-tuning for downstream tasks. By structuring data as (state, action, reward) and optimizing a causal language modeling objective, LTU demonstrates improved decision making and cross-task generalization in PPC advertising and SEO tasks, outperforming standard supervised fine-tuning. The study highlights the importance of a broad learning phase, the potential negative impact of injecting generic common knowledge, and outlines a path toward adaptable, multi-task decision making with LLMs. The findings suggest LTU can extend the reach of foundation models beyond games and robotics to real-world decision making in diverse domains.

Abstract

We propose a novel approach for decision making problems leveraging the generalization capabilities of large language models (LLMs). Traditional methods such as expert systems, planning algorithms, and reinforcement learning often exhibit limited generalization, typically requiring the training of new models for each unique task. In contrast, LLMs demonstrate remarkable success in generalizing across varied language tasks, inspiring a new strategy for training decision making models. Our approach, referred to as "Learning then Using" (LTU), entails a two-stage process. Initially, the \textit{learning} phase develops a robust foundational decision making model by integrating diverse knowledge from various domains and decision making contexts. The subsequent \textit{using} phase refines this foundation model for specific decision making scenarios. Distinct from other studies that employ LLMs for decision making through supervised learning, our LTU method embraces a versatile training methodology that combines broad pre-training with targeted fine-tuning. Experiments in e-commerce domains such as advertising and search optimization have shown that LTU approach outperforms traditional supervised learning regimes in decision making capabilities and generalization. The LTU approach is the first practical training architecture for both single-step and multi-step decision making tasks combined with LLMs, which can be applied beyond game and robot domains. It provides a robust and adaptable framework for decision making, enhances the effectiveness and flexibility of various systems in tackling various challenges.

Building Decision Making Models Through Language Model Regime

TL;DR

The paper addresses limited generalization in decision making by proposing Learning then Using (LTU), a two-stage LTU pipeline that first performs continued pre-training to build a foundation decision making model and then applies supervised fine-tuning for downstream tasks. By structuring data as (state, action, reward) and optimizing a causal language modeling objective, LTU demonstrates improved decision making and cross-task generalization in PPC advertising and SEO tasks, outperforming standard supervised fine-tuning. The study highlights the importance of a broad learning phase, the potential negative impact of injecting generic common knowledge, and outlines a path toward adaptable, multi-task decision making with LLMs. The findings suggest LTU can extend the reach of foundation models beyond games and robotics to real-world decision making in diverse domains.

Abstract

We propose a novel approach for decision making problems leveraging the generalization capabilities of large language models (LLMs). Traditional methods such as expert systems, planning algorithms, and reinforcement learning often exhibit limited generalization, typically requiring the training of new models for each unique task. In contrast, LLMs demonstrate remarkable success in generalizing across varied language tasks, inspiring a new strategy for training decision making models. Our approach, referred to as "Learning then Using" (LTU), entails a two-stage process. Initially, the \textit{learning} phase develops a robust foundational decision making model by integrating diverse knowledge from various domains and decision making contexts. The subsequent \textit{using} phase refines this foundation model for specific decision making scenarios. Distinct from other studies that employ LLMs for decision making through supervised learning, our LTU method embraces a versatile training methodology that combines broad pre-training with targeted fine-tuning. Experiments in e-commerce domains such as advertising and search optimization have shown that LTU approach outperforms traditional supervised learning regimes in decision making capabilities and generalization. The LTU approach is the first practical training architecture for both single-step and multi-step decision making tasks combined with LLMs, which can be applied beyond game and robot domains. It provides a robust and adaptable framework for decision making, enhances the effectiveness and flexibility of various systems in tackling various challenges.
Paper Structure (9 sections, 1 equation, 2 figures, 6 tables)

This paper contains 9 sections, 1 equation, 2 figures, 6 tables.

Figures (2)

  • Figure 1: Different context part in transformer architecture
  • Figure 2: Example of PPC training data