One Size Fits None: Modeling NYC Taxi Trips

Tomas Eglinskas

TL;DR

Building one universal model is a mistake: because of Simpson's paradox, a combined model looks accurate on average but fails to predict tips within individual taxi categories, each of which requires a specialized model.

Abstract

The rise of app-based ride-sharing has fundamentally changed tipping culture in New York City. We analyzed 280 million trips from 2024 to see whether tips can be predicted for traditional taxis versus high-volume for-hire services. Testing methods ranging from linear regression to deep neural networks, we found two very different outcomes. Tips in traditional taxis are highly predictable ($R^2 \approx 0.72$), owing to the in-car payment screen. In contrast, app-based tipping appears largely random and is hard to model ($R^2 \approx 0.17$). We conclude that building one universal model is a mistake: because of Simpson's paradox, a combined model looks accurate on average but fails to predict tips within individual taxi categories, each of which requires a specialized model.
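The gap between pooled and per-category accuracy can be illustrated with a minimal sketch on synthetic data. Everything here is an assumption made for illustration: the fare ranges, tip rates, noise levels, and the plain least-squares model are hypothetical and are not the paper's data, features, or models. The sketch fits one combined linear model with a category indicator, then scores it on all trips and on each category separately, and compares that against a model fitted per category.

```python
import numpy as np


def r2(y, pred):
    """Coefficient of determination of predictions against targets."""
    ss_res = np.sum((y - pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def fit_ols(X, y):
    """Ordinary least squares via numpy's least-squares solver."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Synthetic trips (hypothetical parameters, not taken from the paper).
rng = np.random.default_rng(0)
n = 5000
fare_taxi = rng.uniform(5, 60, n)
fare_app = rng.uniform(5, 60, n)
# Taxi tips track the fare closely; app tips barely depend on it.
tip_taxi = 0.20 * fare_taxi + rng.normal(0, 1.5, n)
tip_app = 0.02 * fare_app + rng.normal(0, 2.0, n)

fare = np.concatenate([fare_taxi, fare_app])
tip = np.concatenate([tip_taxi, tip_app])
is_taxi = np.concatenate([np.ones(n), np.zeros(n)])

# One combined model: intercept, shared fare slope, category offset.
X_pooled = np.column_stack([np.ones(2 * n), fare, is_taxi])
pred = X_pooled @ fit_ols(X_pooled, tip)

taxi = is_taxi == 1
r2_pooled = r2(tip, pred)                     # scored on all trips
r2_pooled_taxi = r2(tip[taxi], pred[taxi])    # same model, taxi trips only
r2_pooled_app = r2(tip[~taxi], pred[~taxi])   # same model, app trips only


def own_r2(f, t):
    """R^2 of a model fitted and scored on a single category."""
    X = np.column_stack([np.ones(len(f)), f])
    return r2(t, X @ fit_ols(X, t))


r2_own_taxi = own_r2(fare_taxi, tip_taxi)
r2_own_app = own_r2(fare_app, tip_app)

print(f"combined model, all trips: {r2_pooled:.2f}")
print(f"combined model, taxi only: {r2_pooled_taxi:.2f}")
print(f"combined model, app only:  {r2_pooled_app:.2f}")
print(f"per-category model, taxi:  {r2_own_taxi:.2f}")
print(f"per-category model, app:   {r2_own_app:.2f}")
```

In this toy setup much of the combined model's overall score comes from separating the two categories' average tip levels, so a respectable pooled $R^2$ can coexist with a within-category score that is worse than predicting the category mean.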

Paper Structure (20 sections, 1 equation, 10 figures, 6 tables)

Figures (10)

  • Figure 1: Total trips and trips with tips by category in 2024
  • Figure 2: Tip and Distance Distribution across taxi types
  • Figure 3: Heatmap of Tip Counts (Top) and Median Tip Amounts (Bottom)
  • Figure 4: Correlation Matrix without Synthetic Features
  • Figure 5: Correlation Matrix with Synthetic Features
  • ...and 5 more figures