Time-Varying Propensity Score to Bridge the Gap between the Past and Present

Rasool Fakoor, Jonas Mueller, Zachary C. Lipton, Pratik Chaudhari, Alexander J. Smola

TL;DR

The paper tackles the challenge of continual distribution drift in real-world data by introducing a time-varying propensity score that reweights past data according to their alignment with the present distribution. The method models the drift via exponential-family deviations of $p_t(x)$ and learns a score $\omega(x,T,t)=\exp(g_\theta(x,T)-g_\theta(x,t))$ to bias training toward past samples that evolved similarly to current data. Through experiments in continuous supervised learning and reinforcement learning, the approach consistently outperforms baselines (Everything, Recent, Finetune) and standard propensity methods, demonstrating improved adaptability to gradual shifts without assuming specific generative forms. The proposed framework is versatile, data-agnostic, and capable of automatic drift detection, offering practical benefits for deploying robust models in non-stationary environments.
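The score formula above can be illustrated with a minimal sketch. It is not the authors' implementation: the function `g`, its linear form, the fixed parameters `theta`, and the synthetic samples are all assumptions made for illustration. In the paper $g_\theta$ is learned, whereas here it is a hand-written placeholder. The sketch only shows how $\omega(x,T,t)=\exp(g_\theta(x,T)-g_\theta(x,t))$ turns into sampling probabilities over past data.

```python
import math
import random


def g(theta, x, t):
    """Hypothetical stand-in for g_theta(x, t): a tiny linear model with an x*t interaction."""
    a, b, c = theta
    return a * x + b * t + c * x * t


def time_varying_weight(theta, x, T, t):
    """omega(x, T, t) = exp(g_theta(x, T) - g_theta(x, t))."""
    return math.exp(g(theta, x, T) - g(theta, x, t))


random.seed(0)
theta = (0.5, -0.1, 0.2)  # placeholder parameters, not learned
T = 1.0                   # the present time (normalized)

# Synthetic past data: (sample x, time stamp t) pairs with t < T.
past = [(random.gauss(0.0, 1.0), k / 8.0) for k in range(8)]

# Weight every past sample by its score relative to the present time T.
weights = [time_varying_weight(theta, x, T, t) for x, t in past]
total = sum(weights)
probs = [w / total for w in weights]

# Draw a training batch from the past in proportion to the weights.
batch = random.choices(past, weights=probs, k=4)
```

By construction, a sample whose time stamp equals $T$ receives weight 1, so the score measures each past sample relative to the present.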

Abstract

Real-world deployment of machine learning models is challenging because data evolves over time. While no model can work when data evolves in an arbitrary fashion, if there is some pattern to these changes, we might be able to design methods to address it. This paper addresses situations when data evolves gradually. We introduce a time-varying propensity score that can detect gradual shifts in the distribution of data which allows us to selectively sample past data to update the model -- not just similar data from the past like that of a standard propensity score but also data that evolved in a similar fashion in the past. The time-varying propensity score is quite general: we demonstrate different ways of implementing it and evaluate it on a variety of problems ranging from supervised learning (e.g., image classification problems) where data undergoes a sequence of gradual shifts, to reinforcement learning tasks (e.g., robotic manipulation and continuous control) where data shifts as the policy or the task changes.

Paper Structure (30 sections, 27 equations, 14 figures, 3 tables, 2 algorithms)

Figures (14)

  • Figure 1: Relative test accuracy achieved by our method and others vs Everything in continuous supervised learning benchmarks. The x-axes indicate results on test data at each time step $t$ as described in \ref{sec:imgcontious}, and points above the line indicate better performance than Everything. We see that our method stands out as the only approach that consistently maintains its effectiveness regardless of the specific setting or the diversity of the past data. It is important to note that all models in these experiments are trained entirely from scratch at each time step, and each point on these plots represents a different model, which results in a large number of experiments (e.g., panel (d) shows 500 experiments per method). For clarity, we display only the mean accuracy achieved by Recent, using a dashed line, as it performs the worst in the image benchmarks.
  • Figure 2: Comparison of the average undiscounted return (higher is better) of our method (red) against baseline algorithms on ROBEL D’Claw and Half-Cheetah WindVel environments. Our method consistently outperforms other methods in terms of sample complexity across all environments.
  • Figure 3: Comparing our method against the standard propensity score on the continuous CIFAR-10 benchmark. The x-axis indicates results on test data at each time step. Panel (a) shows classification accuracy and panel (b) shows propensity scores. These results demonstrate that our time-varying propensity score appropriately weighs the data as it evolves, providing a plausible explanation for the better performance of our method compared to the standard propensity score. The standard propensity score is unable to identify the evolution of the data, which hinders its effectiveness in adapting to changing conditions.
  • Figure 4: Relative test accuracy achieved by our method and others vs Everything on continuous supervised learning benchmarks. The x-axes indicate accuracy on test data at each time step $t$ as described in \ref{sec:imgcontious}, and points above the line indicate better accuracy than Everything. The setup of this experiment is exactly the same as that of Figure 1; Meta-Weight-Net is added as another baseline. We see that our method is effective regardless of the specific setting or the diversity of the past data; in particular, it performs better than Meta-Weight-Net.
  • Figure 5: Comparing our method against the standard propensity score on the Gaussian benchmark. Panel (a) shows a periodic shift in which samples are generated from a Gaussian distribution with means ($\mu_t$) that vary over time; the standard deviation is 1 for all times. Panel (b) shows the scores assigned by the standard propensity score and by our method. We calculate a propensity score for each data point based on how similar it is to the samples at time 99. Data points that have a similar shift to the data point at time 99 should get a high score. For example, the data points at times 84, 79, 74, and 69 should have the highest scores, as they are the most similar to the data point at time 99. This experiment demonstrates that our time-varying propensity score appropriately weighs the data as it evolves, providing a plausible explanation for the superior performance of our method. The standard propensity score, in contrast, is unable to identify the distribution shift, which hinders its effectiveness in changing conditions. See \ref{sec:compare_prop-drift} for more details.
  • ...and 9 more figures

Theorems & Definitions (5)

  • Remark 2: Evaluating the model learned from data from $(p_t)_{t\leq T}$ on test data from $p_{T+\mathrm{d}t}$
  • Remark 3: Is the time-varying propensity score equivalent to standard propensity score on $(x,t)$?
  • Remark 4: Modeling the propensity score using an exponential family does not mean that the data is from an exponential family
  • Remark 5: Properties of the different baselines
  • Remark 6: How does the objective change when we have supervised learning or reinforcement learning?