See the past: Time-Reversed Scene Reconstruction from Thermal Traces Using Visual Language Models

Kebin Contreras; Luis Toscano-Palomino; Mauro Dalla Mura; Jorge Bacca

TL;DR

This paper tackles the challenge of reconstructing recent past scenes from current observations by leveraging fading thermal traces as passive temporal codes alongside RGB context. It introduces a framework that couples Visual Language Models with a constrained diffusion model, where a frozen VLM generates semantic scene descriptions conditioned on RGB and thermal inputs, guiding a pretrained diffusion backbone to synthesize plausible past frames without retraining. The approach is validated in controlled scenarios, showing that semantic priors and thermal cues improve both low-level fidelity and high-level semantics, with reconstructions credible up to about 1–2 minutes in the past. This work suggests a promising direction for time-reversed imaging with potential applications in forensics, scene analysis, and security, while outlining future work to handle real-world variability and multi-subject dynamics.

Abstract

Recovering the past from present observations is an intriguing challenge with potential applications in forensics and scene analysis. Thermal imaging, operating in the infrared range, provides access to otherwise invisible information. Since humans are typically warmer (about 37 °C, or 98.6 °F) than their surroundings, interactions such as sitting, touching, or leaning leave residual heat traces. These fading imprints serve as passive temporal codes, allowing the inference of recent events that RGB cameras alone cannot capture. This work proposes a time-reversed reconstruction framework that uses paired RGB and thermal images to recover scene states from a few seconds earlier. The proposed approach couples Visual-Language Models (VLMs) with a constrained diffusion process, where one VLM generates scene descriptions and another guides image reconstruction, ensuring semantic and structural consistency. The method is evaluated in three controlled scenarios, demonstrating the feasibility of reconstructing plausible past frames up to 120 seconds earlier, providing a first step toward time-reversed imaging from thermal traces.
