Delphos: A reinforcement learning framework for assisting discrete choice model specification

Gabriel Nova, Stephane Hess, Sander van Cranenburgh

Abstract

We introduce Delphos, a deep reinforcement learning framework for assisting the discrete choice model specification process. Delphos aims to support the modeller by providing automated, data-driven suggestions for utility specifications, thereby reducing the effort required to develop and refine utility functions. Delphos conceptualises model specification as a sequential decision-making problem, inspired by the way human choice modellers iteratively construct models through a series of reasoned specification decisions. In this setting, an agent learns to specify high-performing candidate models by choosing a sequence of modelling actions, such as selecting variables, accommodating both generic and alternative-specific taste parameters, applying non-linear transformations, and including interactions with covariates, while interacting with a modelling environment that estimates each candidate and returns a reward signal. Specifically, Delphos uses a Deep Q-Network that receives delayed rewards based on modelling outcomes (e.g., log-likelihood) and behavioural expectations (e.g., parameter signs), and distributes this signal across the sequence of actions to learn which modelling decisions lead to well-performing candidates. We evaluate Delphos on both simulated and empirical datasets using multiple reward settings. In simulated cases, learning curves, Q-value patterns, and performance metrics show that the agent learns to adaptively explore strategies to propose well-performing models across search spaces, while covering only a small fraction of the feasible modelling space. We further apply the framework to two empirical datasets to demonstrate its practical use. These experiments illustrate the ability of Delphos to generate competitive, behaviourally plausible models and highlight the potential of this adaptive, learning-based framework to assist the model specification process.
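The delayed-reward idea described above can be illustrated with a minimal sketch of one-step Q-learning targets. This is not the authors' implementation: the function name `td_targets`, the discount factor of 0.9, and the toy Q-values are all assumptions made for illustration. The sketch assumes that intermediate modelling actions receive zero reward and that only the terminating action receives a reward from model estimation.

```python
def td_targets(rewards, next_q_target, done, gamma=0.9):
    """One-step Q-learning targets: y = r + gamma * max_a' Q(s', a'; theta^-).

    rewards       : reward observed for each transition
    next_q_target : target-network Q-values for every action in the next state
    done          : True where the transition ended the episode
    gamma         : discount factor (0.9 is an assumed value)
    """
    targets = []
    for r, q_next, terminal in zip(rewards, next_q_target, done):
        # Terminal transitions do not bootstrap: the target is the reward alone.
        bootstrap = 0.0 if terminal else gamma * max(q_next)
        targets.append(r + bootstrap)
    return targets


# Hypothetical three-step episode: the reward arrives only at termination.
rewards = [0.0, 0.0, 1.0]
next_q = [[0.2, 0.5], [0.1, 0.8], [0.0, 0.0]]  # made-up target-network outputs
done = [False, False, True]

targets = td_targets(rewards, next_q, done)
print(targets)
```

Although the first two actions earn no immediate reward, their targets are non-zero because they bootstrap from the value of the next state. Under these assumptions, this is how a terminal reward propagates back to earlier specification decisions.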

Paper Structure

This paper contains 29 sections, 15 equations, 12 figures, 7 tables, 1 algorithm.

Figures (12)

  • Figure 1: Classical RL framework. Adapted from sutton2018reinforcement.
  • Figure 2: The Deep Q-learning algorithm from nair2015massively. It includes three components: (1) the Q-network $Q(s, a; \theta)$, which estimates action values based on the current policy; (2) the target Q-network $Q(s, a; \theta^-)$, a periodically updated copy used to compute stable target values; and (3) the replay memory, which stores past transitions and enables training on random mini-batches to reduce temporal correlations and stabilise learning.
  • Figure 3: Framework for assisting discrete choice model specification. Delphos encodes the current specification, selects a modelling action, and submits the model for estimation; the environment then returns modelling outcomes as a reward signal. Transitions are stored in a replay buffer and used to update the policy and target networks.
  • Figure 4: Agent’s sequential decision-making process. Nodes represent internal states (current specifications). Dashed arrows indicate feasible actions, while solid arrows show the actions selected by the agent until termination.
  • Figure 5: Learning process on simulated dataset $S_1$. The upper panel displays the learning curve, showing the rolling mean and min–max range of the reward signal. The lower panel reports rolling novelty, measured as the share of previously unseen specifications.
  • ...and 7 more figures