One Size Fits None: Modeling NYC Taxi Trips
Tomas Eglinskas
TL;DR
A single universal tip model is a mistake: because of Simpson's paradox, a combined model looks accurate on average but fails to predict tips within individual taxi categories, which require specialized models.
Abstract
The rise of app-based ride-sharing has fundamentally changed tipping culture in New York City. We analyzed 280 million trips from 2024 to test whether tips can be predicted for traditional taxis and for high-volume for-hire services. Testing methods that range from linear regression to deep neural networks, we found two very different outcomes. Tips in traditional taxis are highly predictable ($R^2 \approx 0.72$) due to the in-car payment screen. In contrast, app-based tips are largely unpredictable and hard to model ($R^2 \approx 0.17$). We conclude that building one universal model is a mistake: due to Simpson's paradox, a combined model looks accurate in aggregate yet fails to predict tips within each service category, so each category requires its own specialized model.
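The aggregation effect described above can be illustrated with a minimal sketch on synthetic data. Everything in it is an assumption made for illustration and is not taken from the study: the fare range, the tip rules (a fare-proportional tip for taxis, a fare-independent tip for app rides), the noise levels, and the helper names `r2` and `fit_predict`. It fits one pooled least-squares model with a service indicator, scores it overall and within each group, and compares it with one model per group.

```python
import numpy as np


def r2(y, y_hat):
    """Coefficient of determination of predictions y_hat against targets y."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def fit_predict(X, y):
    """Ordinary least squares fit; returns in-sample predictions."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ coef


# Synthetic data (assumed, NOT the 2024 trip records).
rng = np.random.default_rng(0)
n = 5000  # trips per service type
fare = rng.uniform(5, 50, size=2 * n)
is_taxi = np.repeat([1.0, 0.0], n)
noise = rng.normal(size=2 * n)

# Assumed tip rules: taxi tips track the fare, app tips do not.
tip = np.where(is_taxi == 1, 0.2 * fare + noise, 1.0 + 1.5 * noise)

# One pooled model: intercept, shared fare slope, service indicator.
X_pooled = np.column_stack([np.ones(2 * n), fare, is_taxi])
pred_pooled = fit_predict(X_pooled, tip)

taxi = is_taxi == 1
app = ~taxi
scores = {
    "pooled_overall": r2(tip, pred_pooled),
    "pooled_on_taxi": r2(tip[taxi], pred_pooled[taxi]),
    "pooled_on_app": r2(tip[app], pred_pooled[app]),
}

# One specialized model per service type: intercept and its own fare slope.
for name, mask in (("taxi", taxi), ("app", app)):
    X_group = np.column_stack([np.ones(mask.sum()), fare[mask]])
    scores[f"separate_{name}"] = r2(tip[mask], fit_predict(X_group, tip[mask]))

for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

In this toy setup the pooled score is driven largely by the difference in average tip between the two services, so it can look respectable even though the shared fare slope fits neither group well. Scoring each category separately exposes the gap.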
