Time-Varying Propensity Score to Bridge the Gap between the Past and Present
Rasool Fakoor, Jonas Mueller, Zachary C. Lipton, Pratik Chaudhari, Alexander J. Smola
TL;DR
The paper addresses continual distribution drift in real-world data by introducing a time-varying propensity score that reweights past data according to how well it aligns with the present distribution. The method models drift as exponential-family deviations of $p_t(x)$ and learns a score $\omega(x,T,t)=\exp(g_\theta(x,T)-g_\theta(x,t))$, where $T$ is the present time and $t \le T$ is the time at which sample $x$ was observed, so that training is biased toward past samples that evolved similarly to current data. In experiments on continuous supervised learning and reinforcement learning, the approach consistently outperforms the baselines (Everything, Recent, Finetune) and standard propensity methods, adapting better to gradual shifts without assuming a specific generative form for the data. The framework can be implemented in different ways, applies across data types, and detects drift automatically, which makes it practical for deploying models in non-stationary environments.
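To make the score formula concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation: the function `g_theta` is a hypothetical toy stand-in (a fixed linear form) for the learned function $g_\theta$, the parameter values are arbitrary, and the helpers `propensity_weights` and `sample_past` are illustrative names. Sampling in proportion to the weights is one plausible way to use the score and is assumed here.

```python
import numpy as np

def g_theta(x, t, theta):
    """Toy stand-in for the learned function g_theta(x, t).

    Assumption: a linear form in the features [x, t, x*t]. The paper's
    actual parameterization and training procedure are not shown here.
    """
    x = np.asarray(x, dtype=float)
    t = np.asarray(t, dtype=float)
    return theta[0] * x + theta[1] * t + theta[2] * x * t

def propensity_weights(x, t, T, theta):
    """Compute omega(x, T, t) = exp(g_theta(x, T) - g_theta(x, t))."""
    return np.exp(g_theta(x, T, theta) - g_theta(x, t, theta))

def sample_past(x, t, T, theta, n, rng):
    """Draw n indices of past samples with probability proportional to omega."""
    w = propensity_weights(x, t, T, theta)
    p = w / w.sum()  # normalize the weights into a sampling distribution
    return rng.choice(len(x), size=n, replace=True, p=p)

# Synthetic example: 8 scalar samples observed at times 0..7, present time T = 8.
rng = np.random.default_rng(0)
x = rng.normal(size=8)
t = np.arange(8)
T = 8
theta = np.array([0.1, 0.0, 0.05])  # arbitrary values, not learned

w = propensity_weights(x, t, T, theta)
idx = sample_past(x, t, T, theta, n=4, rng=rng)
```

By construction, a sample observed at the present time ($t = T$) receives weight $\exp(0) = 1$, and the weights are always positive, so they can be normalized into a sampling distribution over past data.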
Abstract
Real-world deployment of machine learning models is challenging because data evolves over time. While no model can work when data evolves in an arbitrary fashion, if there is some pattern to these changes, we might be able to design methods to address it. This paper addresses situations in which data evolves gradually. We introduce a time-varying propensity score that can detect gradual shifts in the distribution of data. This allows us to selectively sample past data to update the model: not just past data that is similar to present data, as a standard propensity score would select, but also data that evolved in a similar fashion in the past. The time-varying propensity score is quite general: we demonstrate different ways of implementing it and evaluate it on a variety of problems ranging from supervised learning (e.g., image classification problems), where data undergoes a sequence of gradual shifts, to reinforcement learning tasks (e.g., robotic manipulation and continuous control), where data shifts as the policy or the task changes.
