Investigating Low Data, Confidence Aware Image Prediction on Smooth Repetitive Videos using Gaussian Processes

Nikhil U. Shinde; Xiao Liang; Florian Richter; Michael C. Yip

TL;DR

The paper tackles predicting future video frames from limited visual data while providing interpretable confidence bounds. It introduces a patch-based Gaussian Process regression framework that propagates uncertainty through time via moment matching, enabling online updates with new frames. Experiments on 2D Navier–Stokes flows and real-world footage (pedestrians and weather satellite imagery) demonstrate that the approach yields accurate mean predictions while assigning higher uncertainty to regions with complex or unseen dynamics. This low-data, uncertainty-aware method offers a complementary alternative to large neural models, with potential applications in robotics and safety-critical decision-making where data are scarce or costly to obtain.

Abstract

The ability to predict future states is crucial to informed decision-making while interacting with dynamic environments. With cameras providing a prevalent and information-rich sensing modality, the problem of predicting future states from image sequences has garnered a lot of attention. Current state-of-the-art methods typically train large parametric models for their predictions. Though often accurate, these models frequently fail to provide interpretable confidence metrics around their predictions. Additionally, these methods rely on the availability of large training datasets to converge to useful solutions. In this paper, we focus on the problem of predicting future images of an image sequence with interpretable confidence bounds from very little training data. To approach this problem, we use non-parametric models to take a probabilistic approach to image prediction. We generate probability distributions over sequentially predicted images, and propagate uncertainty through time to generate a confidence metric for our predictions. Gaussian Processes are used for their data efficiency and ability to readily incorporate new training data online. Our method's predictions are evaluated on a smooth fluid simulation environment. We showcase the capabilities of our approach on real-world data by predicting pedestrian flows and weather patterns from satellite imagery.
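To make the patch-based idea concrete, the following is a minimal sketch, not the authors' implementation. It fits a zero-mean GP with an RBF kernel that maps 3x3 patches from three consecutive frames to the centre pixel of the next frame, then rolls predictions forward recursively. The patch size, kernel, hyperparameters, the toy translating-sine video, and all names (`rbf_kernel`, `make_inputs`, `build_dataset`, `PatchGP`, `rollout`) are assumptions for illustration. The rollout feeds back only the predicted mean, so it does not perform the moment-matching uncertainty propagation described in the paper.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    # Squared Euclidean distance between every row of a and every row of b.
    sq = (a**2).sum(1)[:, None] + (b**2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

def make_inputs(frames, k=3):
    # One input row per interior pixel: its k x k patch in each given frame.
    r = k // 2
    h, w = frames[0].shape
    rows = [np.concatenate([f[i - r:i + r + 1, j - r:j + r + 1].ravel()
                            for f in frames])
            for i in range(r, h - r) for j in range(r, w - r)]
    return np.asarray(rows)

def build_dataset(video, history=3, k=3):
    # Inputs: patches from `history` past frames. Targets: next-frame pixels.
    r = k // 2
    xs, ys = [], []
    for t in range(history, len(video)):
        xs.append(make_inputs(video[t - history:t], k))
        ys.append(video[t][r:-r, r:-r].ravel())
    return np.vstack(xs), np.concatenate(ys)

class PatchGP:
    # Zero-mean GP regression with fixed (assumed) hyperparameters.
    def __init__(self, x, y, noise_var=1e-4, length_scale=1.0):
        self.x, self.ls, self.noise_var = x, length_scale, noise_var
        k = rbf_kernel(x, x, length_scale) + noise_var * np.eye(len(x))
        self.chol = np.linalg.cholesky(k)
        self.alpha = np.linalg.solve(self.chol.T, np.linalg.solve(self.chol, y))

    def predict(self, xs):
        ks = rbf_kernel(xs, self.x, self.ls)
        mean = ks @ self.alpha
        v = np.linalg.solve(self.chol, ks.T)
        var = 1.0 - (v**2).sum(0) + self.noise_var
        return mean, np.maximum(var, 0.0)

def rollout(gp, last_frames, steps, k=3):
    # Recursive prediction that feeds back the mean only (no moment matching).
    r = k // 2
    hist = len(last_frames)
    frames = [f.copy() for f in last_frames]
    means, variances = [], []
    for _ in range(steps):
        mu, var = gp.predict(make_inputs(frames[-hist:], k))
        nxt = frames[-1].copy()          # border pixels are carried over
        inner = nxt[r:-r, r:-r].shape
        nxt[r:-r, r:-r] = mu.reshape(inner)
        v_img = np.zeros_like(nxt)
        v_img[r:-r, r:-r] = var.reshape(inner)
        frames.append(nxt)
        means.append(nxt)
        variances.append(v_img)
    return means, variances

# Toy "smooth repetitive video": a sine pattern translating one pixel per frame.
cols = np.arange(8)
video = [np.tile(np.sin(0.5 * (cols - t)), (8, 1)) for t in range(12)]
x_train, y_train = build_dataset(video[:10])    # train on 10 frames
gp = PatchGP(x_train, y_train)
means, variances = rollout(gp, video[7:10], steps=2)
```

Because the GP stores its training set explicitly, new frames could be incorporated by appending rows to `x_train` and `y_train` and refitting. A faithful version would also carry the predictive variance of each frame into the next step's inputs.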

Paper Structure (24 sections, 34 equations, 10 figures)

This paper contains 24 sections, 34 equations, 10 figures.

Introduction
Related Works
Background: Gaussian Processes
Methods
Problem Statement and Prediction Framework
Training
Prediction
Experiments and Results
Conclusion and Future Discussion
Appendix
Predictions: The first $3$ Predictions in a sequence of predictions
Prediction with all random inputs: Derivations
Mean Prediction Derivation:
Variance Prediction Derivation:
Prediction with Hybrid and Fully Known inputs: Derivations and Additional Details
...and 9 more sections

Figures (10)

Figure 1: Two real-world pedestrian flow prediction results over time generated by our method trained with only 10 images. Our method predicts future pedestrian motion. Larger variance appears at regions of moving pedestrians, roughly agreeing with regions of error.
Figure 2: Overview of the proposed method. We begin by pre-processing the initial video frames $[z_{0}, \dots, z_{t_0 - 1}]$ to form the dataset to train a GP regression model. During test time, three sequential input images are processed into test inputs. Our trained model predicts output distributions from these test inputs. These distributions are then combined to form a predictive distribution of the image at the next time step. The prediction is then incorporated into the next set of input images to recursively rollout a sequence of image probabilities.
Figure 3: Forward Prediction Experiment: Our model, trained using frames $[z_{0}, \dots, z_{9}]$, is used to predict frames $[z_{10}, \dots, z_{24}]$ of a 2D Navier-Stokes simulation. (a) Ground truth, predicted mean, l1-error, and variance images. (b-c) Graphs of the prediction's relative error (RE) and mean standard deviations off (StdE). This shows our model's ability to accurately predict dynamic scenes. Error and variance increase over time. Before $t=19$, the variance effectively indicates the erroneous region. At later timesteps the variation in the variance becomes less informative, and the base variance increases as the model becomes more uncertain overall. The accuracy of the model's confidence in its own predictions naturally oscillates, as seen in the StdE plot in Figure 3c. This metric decreases when the model's predicted variance grows faster than the true error, and increases in the opposite case. The growth in predicted variance eventually dominates the true error, causing StdE to decrease, which is preferable because it yields a conservative estimate whose predicted bounds capture the ground truth.
Figure 4: Predictive Comparison Experiment: This figure compares the predictions on 2D Navier Stokes simulations between our method, a non-parametric KNN-based method, and the neural network-based methods, FNO-2D-time and FNO-3D trained on similarly low data. (a) Snapshots from predictions on a single test sequence. (b) Relative error vs the predictive time step averaged across 100 prediction tests.
Figure 5: Real-world hurricane satellite image results generated by our method trained with only 10 images. Training and prediction are performed on gray-scale satellite image patches. A snapshot of the satellite video and selected patches are shown on the left, whereas prediction results are shown in the middle and right. Our method can predict translation dynamics and relatively complex dynamics such as a region expanding. Note that it cannot predict trends that are not present in training, for example, emergence of new regions.
...and 5 more figures
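
The two metrics named in the Figure 3 caption, relative error (RE) and mean standard deviations off (StdE), can be sketched as follows. The exact definitions are not given in this summary, so both formulas are assumptions: RE is taken as the L2 norm of the error divided by the L2 norm of the ground truth, and StdE as the per-pixel absolute error divided by the predicted standard deviation, averaged over the image. The function names and the `eps` guard are likewise hypothetical.

```python
import numpy as np

def relative_error(pred_mean, truth):
    # Assumed definition: ||pred - truth||_2 / ||truth||_2 over all pixels.
    return np.linalg.norm(pred_mean - truth) / np.linalg.norm(truth)

def mean_stds_off(pred_mean, pred_var, truth, eps=1e-12):
    # Assumed definition: average number of predicted standard deviations
    # separating the predicted mean from the ground truth.
    return np.mean(np.abs(pred_mean - truth) / np.sqrt(pred_var + eps))

# Small worked example: every pixel is off by 0.5 with predicted std 0.5.
truth = np.array([[1.0, 2.0], [3.0, 4.0]])
pred_mean = truth + 0.5
pred_var = np.full_like(truth, 0.25)
re = relative_error(pred_mean, truth)            # 1 / sqrt(30), about 0.183
stde = mean_stds_off(pred_mean, pred_var, truth) # about 1.0
```

Under these definitions, StdE falls when the predicted variance grows faster than the true error, which matches the behaviour the caption describes.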
