LapGSR: Laplacian Reconstructive Network for Guided Thermal Super-Resolution

Aditya Kasliwal, Ishaan Gakhar, Aryan Kamani, Pratinav Seth, Ujjwal Verma

TL;DR

LapGSR is a multimodal, lightweight, generative model that incorporates Laplacian image pyramids for guided thermal super-resolution. It has significantly fewer parameters than other SOTA models while demonstrating excellent results on two cross-domain datasets, viz. ULB17-VT and VGTSR.

Abstract

In the last few years, the fusion of multi-modal data has been widely studied for applications such as robotics, gesture recognition, and autonomous navigation. High-quality visual sensors are expensive, and consumer-grade sensors produce low-resolution images. To overcome this limitation, researchers have developed methods that combine RGB color images with non-visual data, such as thermal images, to improve resolution. Fusing multiple modalities to produce visually appealing, high-resolution images often requires dense models with millions of parameters and a heavy computational load, which is commonly attributed to the intricate architecture of the model. We propose LapGSR, a multimodal, lightweight, generative model incorporating Laplacian image pyramids for guided thermal super-resolution. This approach applies a Laplacian pyramid to RGB color images to extract vital edge information, which is then used to bypass heavy feature map computation in the higher layers of the model, in tandem with a combined pixel and adversarial loss. LapGSR preserves the spatial and structural details of the image while remaining efficient and compact. The result is a model with significantly fewer parameters than other SOTA models that demonstrates excellent results on two cross-domain datasets, viz. the ULB17-VT and VGTSR datasets.
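The abstract mentions a combined pixel and adversarial loss without giving its exact form. The sketch below shows one common way such a generator objective is built, assuming an L1 pixel term, a non-saturating adversarial term computed from discriminator logits, and a hypothetical weight `adv_weight`. None of these choices (or the function names) are confirmed by the summary; they only illustrate the idea.

```python
import numpy as np


def pixel_loss(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean absolute error (L1) between predicted and ground-truth thermal images.

    Assumption: the summary only says "pixel loss"; L1 is one common choice.
    """
    return float(np.mean(np.abs(pred - target)))


def adversarial_loss(fake_logits: np.ndarray) -> float:
    """Non-saturating generator loss -log(sigmoid(D(pred))).

    Computed stably as softplus(-logits) = log(1 + exp(-logits)).
    Assumption: this GAN formulation is illustrative, not taken from the paper.
    """
    return float(np.mean(np.logaddexp(0.0, -fake_logits)))


def combined_loss(pred: np.ndarray, target: np.ndarray,
                  fake_logits: np.ndarray, adv_weight: float = 1e-3) -> float:
    """Pixel term plus a weighted adversarial term (adv_weight is hypothetical)."""
    return pixel_loss(pred, target) + adv_weight * adversarial_loss(fake_logits)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.random((64, 64))
    pred = target + 0.1            # prediction uniformly offset by 0.1
    logits = np.zeros(4)           # undecided discriminator: sigmoid(0) = 0.5
    print(combined_loss(pred, target, logits))
```

A small `adv_weight` keeps the pixel term dominant so the output stays faithful to the ground truth, while the adversarial term nudges it toward sharper, more realistic texture.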

Paper Structure

This paper contains 19 sections, 3 equations, 7 figures, and 7 tables.

Figures (7)

  • Figure 1: Laplacian pyramid visualization. The first row shows the two-level Laplacian pyramid of a grayscale RGB image, with the residual at the end. The second row shows the two-level Laplacian pyramid of a thermal image, with the residual at the end. These images are taken from the ULB17-VT dataset (almasri2019multimodal).
  • Figure 2: Proposed LapGSR model architecture. $I_{RGB}$ refers to the grayscale high-resolution RGB image from the ULB17-VT dataset. The green highlight represents the LTB, the blue represents the MTB, and the pink represents the HTB. The figure shows a pyramid of depth 2; L1, L2, and L3 represent the layers of the modified Laplacian pyramid, in which the lower-resolution thermal image replaces the residual. The affine transformation appears only in the schematic for better visualization; it is not applied to the input or output of the model.
  • Figure 3: Visualization of our model's output on two instances of the VGTSR dataset. In each row, the first image is the high-resolution RGB image, the second is the ground truth, the third is a patch of the ground truth, the fourth is a patch of LapGSR's output, and the last is a patch of the low-resolution thermal image.
  • Figure 4: An instance from the VGTSR dataset. The highlighted region of interest (ROI) in the RGB image has sharp edges for the car and bus. These edges are also present in the ROI of the corresponding Laplacian pyramid levels, which helps our edge-guided model learn accurate representations of the thermal image, as evident in the predicted ROI. L1 and L2 denote the first and second layers of the Laplacian pyramid.
  • Figure 5: A patch from an instance of the VGTSR dataset. The highlighted region of interest (ROI) in the RGB image has blurry driveway edges. This lack of textural information carries over to the ROI of the corresponding Laplacian pyramid levels, which leads to poorer output from our model, as evident in the predicted ROI. L1 and L2 denote the first and second layers of the Laplacian pyramid.
  • ...and 2 more figures
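Figures 1 and 2 describe a two-level Laplacian pyramid with a residual, and a modified pyramid in which the low-resolution thermal image takes the residual's place. The sketch below illustrates both, assuming 2x2 average pooling for downsampling and nearest-neighbour upsampling (simple stand-ins for the usual Gaussian blur and decimation) and image sides divisible by 2**depth. The function names are hypothetical, and the sketch only builds the pyramid input; in the paper, the network learns how to combine these levels.

```python
import numpy as np


def downsample(img: np.ndarray) -> np.ndarray:
    """Halve resolution by 2x2 average pooling (assumed stand-in for blur + decimation)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def upsample(img: np.ndarray) -> np.ndarray:
    """Double resolution by nearest-neighbour repetition."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)


def laplacian_pyramid(img: np.ndarray, depth: int = 2) -> list:
    """Return [L1, ..., L_depth, residual], as in Figure 1 (two levels plus residual)."""
    levels = []
    current = img
    for _ in range(depth):
        smaller = downsample(current)
        levels.append(current - upsample(smaller))  # band-pass detail (edges)
        current = smaller
    levels.append(current)  # low-frequency residual
    return levels


def reconstruct(levels: list) -> np.ndarray:
    """Invert the pyramid: upsample the coarsest level and add back each detail level."""
    current = levels[-1]
    for detail in reversed(levels[:-1]):
        current = upsample(current) + detail
    return current


def guided_pyramid(rgb_gray: np.ndarray, thermal_lr: np.ndarray, depth: int = 2) -> list:
    """Figure 2 style input: keep the RGB detail levels and replace the residual
    with the low-resolution thermal image, whose shape must match the residual."""
    levels = laplacian_pyramid(rgb_gray, depth)
    if levels[-1].shape != thermal_lr.shape:
        raise ValueError("thermal image must match the residual's resolution")
    return levels[:-1] + [thermal_lr]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rgb_gray = rng.random((64, 64))
    thermal_lr = rng.random((16, 16))  # 4x lower resolution for depth 2
    pyr = guided_pyramid(rgb_gray, thermal_lr)
    print([p.shape for p in pyr])  # [(64, 64), (32, 32), (16, 16)]
```

With depth 2, the three returned arrays correspond to L1, L2, and L3 in Figure 2. Because each detail level stores exactly what upsampling loses, `reconstruct(laplacian_pyramid(x))` recovers `x` up to floating-point error.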