HoughToRadon Transform: New Neural Network Layer for Features Improvement in Projection Space
Alexandra Zhabitskaya, Alexander Sheshkus, Vladimir L. Arlazarov
TL;DR
The paper addresses inefficiencies in segmentation networks built on the Hough Transform (HT). It introduces the HoughToRadon Transform (HRT), a fixed layer that converts the Hough space ($s,t$) to a Radon-like space ($\rho,\varphi$), and the RadonToHough Transform (RHT), which converts back. Two parameters, $n$ (the number of angles) and $scaleX$ (which sets the width of the transformed feature map), allow a significant reduction in intermediate feature-map size while preserving or improving segmentation accuracy. On the MIDV-500 dataset, MIoU reaches up to $97.7\%$ with substantial time savings over prior HT-based methods. The approach is implemented inside the HoughEncoder architecture and shows that inner convolutions can operate on smaller representations that are linear in angle and shift, which accelerates training and inference. Overall, the work provides a practical, tunable method to accelerate HT-enabled neural networks for document segmentation and highlights the value of coordinate-space linearization in deep feature processing.
Abstract
In this paper, we introduce the HoughToRadon Transform layer, a novel layer designed to improve the speed of neural networks that incorporate the Hough Transform to solve semantic image segmentation problems. When it is placed after a Hough Transform layer, the "inner" convolutions receive modified feature maps with new beneficial properties, such as a smaller area of processed images and linearity of the parameter space in angle and shift. These properties are not present with the Hough Transform alone. Furthermore, the HoughToRadon Transform layer lets us adjust the size of intermediate feature maps using two new parameters, which makes it possible to balance the speed and quality of the resulting neural network. Our experiments on the open MIDV-500 dataset show that this new approach leads to time savings in document segmentation tasks and achieves state-of-the-art 97.7% accuracy, outperforming HoughEncoder, which has larger computational complexity.
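The remark about linearity in angle and shift can be illustrated with a small coordinate sketch. It is not the layer described in the paper. It assumes a hypothetical parameterization in which a mostly horizontal line is written as y = s + t * x / (width - 1), and it maps that line to the normal form x cos(phi) + y sin(phi) = rho. The function names, the `width` argument, and the origin at the top-left pixel are all assumptions made for illustration.

```python
import math


def hough_to_radon_coords(s, t, width):
    """Map an (s, t) line to normal-form (rho, phi).

    Assumed (hypothetical) parameterization: a mostly horizontal line
    y = s + t * x / (width - 1), where s is the intercept at x = 0 and
    t is the vertical shift accumulated across the image width.
    """
    slope = t / (width - 1)
    theta = math.atan(slope)      # direction angle of the line
    rho = s * math.cos(theta)     # signed distance from the origin to the line
    phi = theta + math.pi / 2     # angle of the line's normal
    return rho, phi


def angle_steps(width, shifts):
    """Angle increments produced by consecutive shift values."""
    angles = [math.atan(t / (width - 1)) for t in shifts]
    return [b - a for a, b in zip(angles, angles[1:])]


# Equal steps in t do not give equal steps in angle under this assumption.
steps = angle_steps(width=65, shifts=[0, 16, 32, 48, 64])
```

Under this assumed parameterization, the entries of `steps` shrink as t grows, so a grid that is uniform in ($s,t$) is not uniform in angle. Resampling onto a uniform ($\rho,\varphi$) grid is one way to obtain the kind of linearity the abstract refers to.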
