Keypoint Detection and Description for Raw Bayer Images

Jiakai Lin, Jinchang Zhang, Guoyu Lu

TL;DR

This paper addresses the bottleneck of ISP-dependent processing for keypoint detection and local feature description by introducing a raw Bayer image–driven approach. It proposes two specialized Bayer convolution kernels and a two-branch encoder to produce a 256-d pixel-wise descriptor and per-pixel scores without demosaicing, enabling robust matching directly on raw data. Across HPatches-based experiments, the method achieves superior repeatability and higher homography-estimation accuracy for raw Bayer inputs, particularly under rotations and scale changes, outperforming RGB-based state-of-the-art methods on raw data. The work highlights the practical impact of ISP-free, resource-efficient feature extraction for robotics, and lays the groundwork for real-time raw-image pipelines in constrained environments.

Abstract

Keypoint detection and local feature description are fundamental tasks in robotic perception, critical for applications such as SLAM, robot localization, feature matching, pose estimation, and 3D mapping. While existing methods predominantly operate on RGB images, we propose a novel network that directly processes raw images, bypassing the need for the Image Signal Processor (ISP). This approach significantly reduces hardware requirements and memory consumption, which is crucial for robotic vision systems. Our method introduces two custom-designed convolutional kernels capable of performing convolutions directly on raw images, preserving inter-channel information without converting to RGB. Experimental results show that our network outperforms existing algorithms on raw images, achieving higher accuracy and stability under large rotations and scale variations. This work represents the first attempt to develop a keypoint detection and feature description network specifically for raw images, offering a more efficient solution for resource-constrained environments.
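To make concrete what "raw" means here, the following minimal sketch builds a Bayer mosaic from a small RGB grid: each pixel keeps only one colour sample, so neighbouring pixels carry different channels. The RGGB layout and the helper names `bayer_channel` and `mosaic` are assumptions for illustration; the abstract does not state the sensor pattern, and this is not the paper's convolution kernel design.

```python
# Assumed RGGB layout (not stated in the source):
# even rows alternate R, G; odd rows alternate G, B.

def bayer_channel(row, col):
    """Return the colour an RGGB sensor samples at (row, col)."""
    if row % 2 == 0:
        return "R" if col % 2 == 0 else "G"
    return "G" if col % 2 == 0 else "B"


def mosaic(rgb):
    """Keep one colour sample per pixel from an H x W grid of (r, g, b) tuples."""
    index = {"R": 0, "G": 1, "B": 2}
    return [
        [pixel[index[bayer_channel(r, c)]] for c, pixel in enumerate(row)]
        for r, row in enumerate(rgb)
    ]


# Synthetic 4 x 4 RGB image; values are chosen so each channel is easy to spot.
rgb = [
    [(10 * r + c, 100 + 10 * r + c, 200 + 10 * r + c) for c in range(4)]
    for r in range(4)
]
raw = mosaic(rgb)  # single-channel grid, same height and width as rgb

# Half of all samples are green in this layout.
green_count = sum(
    bayer_channel(r, c) == "G" for r in range(4) for c in range(4)
)
```

A standard convolution sliding over `raw` mixes samples from different colours at every position, which is the issue that convolution kernels designed for Bayer data are meant to handle.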

Paper Structure

This paper contains 15 sections, 11 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Network architecture. A raw image is processed through an Encoder and a Feature Pyramid Aggregation to generate a 256-dimensional feature map. This feature map is then passed through the Detection Block and Descriptor Block to obtain the Score map and Descriptor map.
  • Figure 2: Siamese neural network. The input is a raw image pair with a known transformation. The two branches share weights, which are updated after each backpropagation step.
  • Figure 3: The two custom-designed Bayer convolution kernels.
  • Figure 4: Keypoint repeatability results. From left to right: the input Bayer image pairs and the outputs of our model, DISK [tyszkiewicz2020disk], ALIKED [zhao2023aliked], SuperPoint [detone2018superpoint], and SIFT [lowe2004distinctive]. Blue keypoints meet the repeatability criterion ($\epsilon = 3$); red keypoints do not.
  • Figure 5: Qualitative Results of Deformable Invariance Evaluation. Green lines represent correct correspondences under RANSAC ($\epsilon = 5$), while red points indicate incorrect matches after filtering. Row 1 corresponds to exposure changes, Row 2 shows H-perspective transformations, Row 3 demonstrates large-angle rotations, and Row 4 illustrates scale changes. From left to right, the columns show the Bayer input and the results from our model, DISK [tyszkiewicz2020disk], ALIKED [zhao2023aliked], SuperPoint [detone2018superpoint], and SIFT [lowe2004distinctive].
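
The repeatability criterion in the Figure 4 caption ($\epsilon = 3$) can be sketched as follows. This is a simplified, one-directional reading, assumed here for illustration: a keypoint in the first image counts as repeatable if, after warping by the known homography, it lands within $\epsilon$ pixels of some keypoint detected in the second image. The function names, the inclusive threshold, and the sample coordinates are hypothetical; the paper's exact protocol may differ (for example, symmetric matching or visibility filtering).

```python
import math


def warp_point(H, x, y):
    """Apply a 3x3 homography H (nested lists) to the point (x, y)."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / w, v / w


def repeatable_mask(kpts_a, kpts_b, H, eps=3.0):
    """Flag each keypoint of image A that lands within eps pixels of a keypoint in B."""
    mask = []
    for x, y in kpts_a:
        u, v = warp_point(H, x, y)
        nearest = min(
            (math.hypot(u - bx, v - by) for bx, by in kpts_b),
            default=math.inf,
        )
        mask.append(nearest <= eps)  # inclusive threshold is an assumption
    return mask


def repeatability(kpts_a, kpts_b, H, eps=3.0):
    """Fraction of image-A keypoints that are repeatable in image B."""
    mask = repeatable_mask(kpts_a, kpts_b, H, eps)
    return sum(mask) / len(mask) if mask else 0.0


# Hypothetical data: a pure translation by (+5, -2) pixels.
H = [[1.0, 0.0, 5.0], [0.0, 1.0, -2.0], [0.0, 0.0, 1.0]]
kpts_a = [(10.0, 10.0), (20.0, 30.0), (40.0, 40.0)]
kpts_b = [(15.0, 8.0), (26.0, 29.0), (100.0, 100.0)]

mask = repeatable_mask(kpts_a, kpts_b, H)   # [True, True, False]
score = repeatability(kpts_a, kpts_b, H)    # 2 of 3 keypoints repeat
```

In the figure's colour coding, keypoints with a `True` flag would be drawn in blue and the rest in red.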