RUMBoost: Gradient Boosted Random Utility Models

Nicolas Salvadé, Tim Hillel

TL;DR

This paper introduces RUMBoost, a novel discrete choice modelling approach that combines the interpretability and behavioural robustness of Random Utility Models (RUMs) with the generalisation and predictive ability of deep learning methods.

Abstract

This paper introduces the RUMBoost model, a novel discrete choice modelling approach that combines the interpretability and behavioural robustness of Random Utility Models (RUMs) with the generalisation and predictive ability of deep learning methods. We obtain the full functional form of non-linear utility specifications by replacing each linear parameter in the utility functions of a RUM with an ensemble of gradient boosted regression trees. This enables piece-wise constant utility values to be imputed for all alternatives directly from the data for any possible combination of input variables. We introduce additional constraints on the ensembles to ensure three crucial features of the utility specifications: (i) dependency of the utilities of each alternative on only the attributes of that alternative, (ii) monotonicity of marginal utilities, and (iii) an intrinsically interpretable functional form, where the exact response of the model is known throughout the entire input space. Furthermore, we introduce an optimisation-based smoothing technique that replaces the piece-wise constant utility values of alternative attributes with monotonic piece-wise cubic splines to identify non-linear parameters with defined gradient. We demonstrate the potential of the RUMBoost model compared to various ML and Random Utility benchmark models for revealed preference mode choice data from London. The results highlight the great predictive performance and the direct interpretability of our proposed approach. Furthermore, the smoothed attribute utility functions allow for the calculation of various behavioural indicators and marginal utilities. Finally, we demonstrate the flexibility of our methodology by showing how the RUMBoost model can be extended to complex model specifications, including attribute interactions, correlation within alternative error terms and heterogeneity within the population.
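
The core idea in the abstract (one boosted ensemble per alternative attribute, trained through the choice probabilities and kept monotonic) can be illustrated with a small sketch. This is not the authors' implementation: it assumes a plain multinomial logit link, depth-1 trees (stumps), a second-order boosting step, and a synthetic two-alternative dataset. All names (`fit_stump`, `train`, `times`, `choices`) and all numbers are hypothetical.

```python
import math
import random

def softmax(utilities):
    """Multinomial logit choice probabilities from a list of utilities."""
    m = max(utilities)
    exps = [math.exp(u - m) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

def utility(ensemble, x):
    """Piece-wise constant utility: the sum of all stump outputs at x."""
    return sum(left if x <= t else right for t, left, right in ensemble)

def fit_stump(xs, grads, hess, lr=0.3, lam=1.0, n_thresholds=10):
    """Fit one depth-1 tree on a single attribute using gradients and hessians.

    Splits whose left leaf is below the right leaf are rejected, so every
    stump (and any sum of stumps) is non-increasing in the attribute.
    Returns (threshold, left_value, right_value), or None if no split is valid.
    """
    ordered = sorted(xs)
    step = max(1, len(ordered) // (n_thresholds + 1))
    g_all, h_all = sum(grads), sum(hess)
    best, best_score = None, 0.0
    for t in ordered[step::step][:n_thresholds]:
        g_left = sum(g for x, g in zip(xs, grads) if x <= t)
        h_left = sum(h for x, h in zip(xs, hess) if x <= t)
        g_right, h_right = g_all - g_left, h_all - h_left
        left, right = -g_left / (h_left + lam), -g_right / (h_right + lam)
        if left < right:  # would violate the negative monotonic constraint
            continue
        # Split-dependent part of the usual second-order gain.
        score = g_left ** 2 / (h_left + lam) + g_right ** 2 / (h_right + lam)
        if score > best_score:
            best, best_score = (t, lr * left, lr * right), score
    return best

def log_loss(probs, choices):
    return -sum(math.log(p[c]) for p, c in zip(probs, choices)) / len(choices)

def predict(ensembles, times):
    """Each alternative's utility depends only on that alternative's attribute."""
    return [softmax([utility(e, row[j]) for j, e in enumerate(ensembles)])
            for row in times]

def train(times, choices, n_rounds=30):
    """times[n][j] is the travel time of alternative j for observation n."""
    n_alts = len(times[0])
    ensembles = [[] for _ in range(n_alts)]
    history = []
    for _ in range(n_rounds):
        probs = predict(ensembles, times)
        history.append(log_loss(probs, choices))
        for j in range(n_alts):
            xs = [row[j] for row in times]
            # Cross-entropy derivatives with respect to the utility of j.
            grads = [p[j] - (1.0 if c == j else 0.0) for p, c in zip(probs, choices)]
            hess = [p[j] * (1.0 - p[j]) for p in probs]
            stump = fit_stump(xs, grads, hess)
            if stump is not None:
                ensembles[j].append(stump)
    history.append(log_loss(predict(ensembles, times), choices))
    return ensembles, history

# Synthetic data (assumption): two alternatives, true utility -0.1 * minutes.
random.seed(0)
times = [[random.uniform(5, 60), random.uniform(5, 60)] for _ in range(400)]
choices = [0 if random.random() < softmax([-0.1 * t for t in row])[0] else 1
           for row in times]

ensembles, history = train(times, choices)
print(f"log-loss: {history[0]:.3f} -> {history[-1]:.3f}")
```

Because each ensemble only ever sees the attribute of its own alternative, the learned utility of an alternative can be plotted directly as a step function of that attribute, which is the kind of curve the travel time and cost figures describe.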

Paper Structure (23 sections, 24 equations, 8 figures, 9 tables, 2 algorithms)

Figures (8)

  • Figure 1: Utility contributions of a) travel time and b) cost on the LPMC dataset, both under a negative monotonic constraint. Each step represents a split point of a regression tree in the corresponding ensemble.
  • Figure 2: Utility contributions of a) age and b) departure time on the LPMC dataset. Both variables are non-monotonic. Each step represents a split point of a regression tree in the corresponding ensemble.
  • Figure 3: Utility contributions of the travel time on the LPMC dataset with bootstrapping for the a) walking, b) cycling, c) PT and d) driving alternatives. Each line with transparency corresponds to a bootstrap sampling iteration. The mean is highlighted, and the distribution of data is shown on top of each figure. The figures are cropped at 2 hours of travel time.
  • Figure 4: Piece-wise monotonic cubic spline interpolation of a) travel time and b) cost on the LPMC dataset. The knots are drawn in black and the first and last knots are omitted for clarity. The GBUV used for interpolation are plotted as a scatter plot.
  • Figure 5: Value of Time (VoT) for a) rail, b) driving. The VoT is capped at £100/h, and displayed only where the derivatives of the utility functions are non-zero.
  • ...and 3 more figures
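
The captions for Figures 4 and 5 describe two steps that a short sketch may make more concrete: interpolating utility values with a monotonic piece-wise cubic spline, then taking the ratio of the time and cost derivatives to obtain a Value of Time. The sketch below is an illustration under stated assumptions, not the paper's optimisation-based smoothing: it uses the standard Fritsch-Butland slope rule for monotone cubic Hermite interpolation, fixed knots, and made-up utility values. The function names and all numbers are hypothetical; time is assumed to be in hours and cost in £, so the ratio is in £/h.

```python
from bisect import bisect_right

def pchip_tangents(xs, ys):
    """Knot slopes of a monotone cubic Hermite spline (Fritsch-Butland rule)."""
    h = [b - a for a, b in zip(xs, xs[1:])]
    d = [(yb - ya) / hi for ya, yb, hi in zip(ys, ys[1:], h)]  # secant slopes
    m = [d[0]]  # end slopes set to the adjacent secant (a simple choice)
    for i in range(1, len(d)):
        if d[i - 1] * d[i] <= 0:
            m.append(0.0)  # flat segment or turning point: zero slope
        else:
            # Weighted harmonic mean of the two neighbouring secants.
            w1, w2 = 2 * h[i] + h[i - 1], h[i] + 2 * h[i - 1]
            m.append((w1 + w2) / (w1 / d[i - 1] + w2 / d[i]))
    m.append(d[-1])
    return h, d, m

def spline_derivative(xs, ys, x):
    """First derivative of the monotone spline through (xs, ys) at x."""
    h, d, m = pchip_tangents(xs, ys)
    i = min(max(bisect_right(xs, x) - 1, 0), len(h) - 1)  # segment index
    t = (x - xs[i]) / h[i]
    return (6 * t * (1 - t) * d[i]
            + (3 * t * t - 4 * t + 1) * m[i]
            + (3 * t * t - 2 * t) * m[i + 1])

def value_of_time(time_knots, time_utils, cost_knots, cost_utils, t, c, cap=100.0):
    """Ratio of marginal utilities of time and cost, capped at `cap`.

    Returns None where either derivative is zero, mirroring the caption's
    choice to display the VoT only where the derivatives are non-zero.
    """
    du_dt = spline_derivative(time_knots, time_utils, t)
    du_dc = spline_derivative(cost_knots, cost_utils, c)
    if du_dt == 0 or du_dc == 0:
        return None
    return min(du_dt / du_dc, cap)

# Made-up, monotonically decreasing utility values at hypothetical knots.
time_knots, time_utils = [0.0, 0.5, 1.0, 2.0], [0.0, -0.8, -1.2, -1.5]
cost_knots, cost_utils = [0.0, 2.0, 5.0, 10.0], [0.0, -0.5, -0.9, -1.0]

print(value_of_time(time_knots, time_utils, cost_knots, cost_utils, t=0.75, c=3.0))
```

With both utilities decreasing, both derivatives are negative and the ratio is positive. Where the cost curve flattens, the ratio grows without bound, which is one reason a cap such as the £100/h in Figure 5 is useful when plotting.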