Towards Locally Deployable Fine-Tuned Causal Large Language Models for Mode Choice Behaviour

Tareq Alsaleh, Bilal Farooq

TL;DR

This paper addresses the need for privacy-preserving, locally deployable, and interpretable tools for modelling travel mode choice. It benchmarks eleven open-access causal LLMs and introduces LiTransMC, a 12B-parameter model fine-tuned with parameter-efficient learning (QLoRA) and loss masking, which achieves strong per-trip accuracy and near-perfect distributional calibration. A targeted similarity-based few-shot prompting strategy and a structured reasoning analysis (the Explanation Strength Index, ESI, and BERTopic) reveal how model explanations align with behavioural theory, and show that revealed preference (RP) data provide a cleaner learning signal than stated preference (SP) data. The study demonstrates the feasibility and advantages of specialized, locally hosted LLMs for transport policy analysis and agent-based simulations, offering practical deployment guidance and a path toward scalable, auditable AI in transportation research.

Abstract

This study investigates the adoption of open-access, locally deployable causal large language models (LLMs) for travel mode choice prediction and introduces LiTransMC, the first fine-tuned causal LLM developed for this task. We systematically benchmark eleven open-access LLMs (1B to 12B parameters) across three stated and revealed preference datasets, testing 396 configurations and generating over 79,000 mode choice decisions. Beyond predictive accuracy, we evaluate model-generated reasoning using BERTopic for topic modelling and a novel Explanation Strength Index, providing the first structured analysis of how LLMs articulate decision factors in alignment with behavioural theory. LiTransMC, fine-tuned using a parameter-efficient, loss-masking strategy, achieved a weighted F1 score of 0.6845 and a Jensen-Shannon Divergence of 0.000245, surpassing both untuned local models and larger proprietary systems, including GPT-4o with advanced persona inference and embedding-based loading. It also outperformed classical mode choice methods, such as discrete choice models and machine learning classifiers, on the same dataset. This dual improvement, i.e., high instant-level accuracy and near-perfect distributional calibration, demonstrates the feasibility of creating specialist, locally deployable LLMs that integrate prediction and interpretability. By combining structured behavioural prediction with natural language reasoning, this work unlocks the potential for conversational, multi-task transport models capable of supporting agent-based simulations, policy testing, and behavioural insight generation. These findings establish a pathway for transforming general-purpose LLMs into specialized and explainable tools for transportation research and policy formulation, while maintaining privacy, reducing cost, and broadening access through local deployment.
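
The two headline metrics measure different things: weighted F1 scores each trip-level prediction, while Jensen-Shannon Divergence compares predicted and observed mode shares in aggregate. The following is a minimal sketch of both, not the authors' evaluation code. The function names, the toy mode labels, and the base-2 logarithm are assumptions made for illustration; the abstract does not state which log base was used.

```python
import math
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores."""
    total = len(y_true)
    score = 0.0
    for label in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        # Weight each class by its share of the observed trips.
        score += f1 * (tp + fn) / total
    return score


def mode_shares(choices, modes):
    """Aggregate share of each mode, in the order given by `modes`."""
    counts = Counter(choices)
    return [counts[m] / len(choices) for m in modes]


def js_divergence(p, q):
    """Jensen-Shannon divergence; base-2 logs are an assumption here."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with zero probability in x contribute nothing.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical toy data: observed and predicted modes for four trips.
modes = ["car", "transit", "walk"]
observed = ["car", "car", "transit", "walk"]
predicted = ["car", "transit", "transit", "walk"]

f1 = weighted_f1(observed, predicted)
jsd = js_divergence(mode_shares(observed, modes), mode_shares(predicted, modes))
print(f"weighted F1 = {f1:.4f}, JSD = {jsd:.6f}")
```

The two metrics can disagree. A model that predicts the right mode shares but assigns them to the wrong trips scores a JSD of zero with a poor F1, which is why the abstract reports both.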

Paper Structure

This paper contains 30 sections, 30 equations, 15 figures, 3 tables.

Figures (15)

  • Figure 1: Overall Experiment Design
  • Figure 2: Localized LLM and Server Setup
  • Figure 3: Instant Level LLMs Mode Choice Predictive Performance Evaluation
  • Figure 4: Decomposition of Model Performance
  • Figure 5: Model's Predictive Performance Ranking
  • ...and 10 more figures