Demystifying KAN for Vision Tasks: The RepKAN Approach

Minjong Cheon

TL;DR

Experimental results demonstrate that RepKAN provides explicit, physically interpretable reasoning while outperforming state-of-the-art models, and indicate that RepKAN has significant potential as a backbone for future interpretable visual foundation models.

Abstract

Remote sensing image classification is essential for Earth observation, yet standard CNNs and Transformers often function as uninterpretable black boxes. We propose RepKAN, a novel architecture that integrates the structural efficiency of CNNs with the non-linear representational power of KANs. By utilizing a dual-path design (Spatial Linear and Spectral Non-linear), RepKAN enables the autonomous discovery of class-specific spectral fingerprints and physical interaction manifolds. Experimental results on the EuroSAT and NWPU-RESISC45 datasets demonstrate that RepKAN provides explicit, physically interpretable reasoning while outperforming state-of-the-art models. These findings indicate that RepKAN has significant potential as a backbone for future interpretable visual foundation models.

Paper Structure

This paper contains 17 sections, 7 equations, 5 figures, and 3 tables.

Figures (5)

  • Figure 1: Overview of the RepKAN architecture and the proposed RepKAN block. (A) Hierarchical structure of RepKAN for 13-channel multispectral image classification. The network progressively abstracts features across three stages, with the neural energy bar illustrating the transition from spatial to spectral focus. (B) Detailed design of the RepKAN block. During training, it utilizes a dual-path mechanism: a Spatial Linear Path for local spatial context and a Spectral Non-linear Path ($\mathcal{F}_{spline}$) featuring learnable activation functions $\phi(x)$ to model channel-wise band interactions. For efficient deployment, the spatial branches are mathematically fused into a single $3\times3$ convolution ($\mathcal{W}_{deploy}$) via structural reparameterization, computing the final output as $Y_{deploy} = (\mathcal{W}_{deploy} * X') + B + \mathcal{F}_{spline}(X')$.
  • Figure 2: Comprehensive analysis of spectral indices in RepKAN. (a) Class-wise dependency analysis at Stage 1, showing the contribution ratio between the Base Path (Spatial/CNN) and the Spline Path (Spectral/KAN). The model exhibits a high dependency ($>77.1\%$) on non-linear spectral interactions for all categories, especially for SeaLake ($91.0\%$). (b) 1D spline activation curve for the NIR band with overlaid pixel distributions of Forest, River, and Highway. The learned gray spline acts as a non-linear projector that differentiates land-cover types based on their physical NIR reflectance. (c) KAN-learned 2D spectral interaction landscape between Red and NIR bands. The contour map represents a self-generated spectral-spatial index, demonstrating the model's ability to discover optimal band combinations for land-cover classification.
  • Figure 3: Interpretability analysis via learned spline activation functions for Red and NIR spectral bands. Each subplot displays the non-linear mapping learned by the model's spline layers (e.g., in a KAN-based architecture) for specific filters corresponding to various land-cover classes. The dashed lines represent the continuous learned activation functions, while the dots indicate the actual activation values across the input intensity range. These visualizations reveal how the model autonomously adapts its activation shapes (ranging from parabolic to sigmoidal) to capture unique spectral signatures, providing transparency into the decision-making process for remote sensing classification.
  • Figure 4: Comparative case study and spectral reasoning visualization. A row-wise comparison between a baseline CNN and RepKAN on selected EuroSAT samples where the CNN fails but RepKAN succeeds. Column 1: Ground-truth (GT) RGB images. Column 2: Class probability distributions from the baseline CNN, highlighting misclassification errors (red bars). Column 3: Correct classification results from RepKAN (green bars). Column 4: Spectral reasoning maps from RepKAN's first stage, illustrating the internal evidence used to distinguish ambiguous land-cover types. The reasoning maps (RdYlGn scale) confirm that RepKAN's success is rooted in its ability to identify class-specific non-linear spectral interactions.
  • Figure 5: Performance validation on high-resolution aerial imagery (RESISC45). Qualitative comparison between a baseline CNN and RepKAN on semantically complex scenes. Column 1: Ground-truth (GT) aerial images. Column 2: CNN top-5 probability distributions showing failure cases due to structural ambiguity (red bars). Column 3: RepKAN top-5 distributions showing robust and correct classification (green bars). Column 4: RepKAN activation maps illustrating the internal feature extraction process. The maps demonstrate how RepKAN isolates discriminative structural and spectral features to resolve errors typical of standard spatial-only networks.
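
The deployment formula in the Figure 1 caption, $Y_{deploy} = (\mathcal{W}_{deploy} * X') + B + \mathcal{F}_{spline}(X')$, can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes the training-time spatial path consists of a 3x3 branch, a 1x1 branch, and an identity branch (a common RepVGG-style layout; the caption does not list the branches), it omits any normalization layers, and it uses a per-channel piecewise-linear function as a stand-in for the learnable spline $\phi(x)$. All function and variable names are hypothetical.

```python
import numpy as np

def conv2d_same(x, w, b=None):
    """Naive zero-padded 'same' convolution (cross-correlation form).
    x: (C_in, H, W), w: (C_out, C_in, k, k), b: (C_out,) or None."""
    k = w.shape[2]
    p = k // 2
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    y = np.zeros((w.shape[0], h, wd))
    for i in range(k):
        for j in range(k):
            # Contribution of kernel tap (i, j) at every output position.
            y += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + h, j:j + wd])
    if b is not None:
        y += b[:, None, None]
    return y

def fuse_spatial_branches(w3, b3, w1, b1):
    """Fold the assumed 3x3, 1x1 and identity branches into one 3x3 kernel."""
    c_out, c_in = w3.shape[:2]
    assert c_out == c_in, "identity branch needs matching channel counts"
    w_deploy = w3.copy()
    w_deploy[:, :, 1, 1] += w1[:, :, 0, 0]  # a 1x1 kernel sits at the 3x3 centre
    w_deploy[:, :, 1, 1] += np.eye(c_out)   # identity = centred delta per channel
    return w_deploy, b3 + b1

def spline_path(x, knots, values):
    """Stand-in for F_spline: one piecewise-linear phi per channel.
    knots: (K,) increasing grid, values: (C, K) learnable heights (assumption)."""
    out = [np.interp(x[c].ravel(), knots, values[c]).reshape(x[c].shape)
           for c in range(x.shape[0])]
    return np.stack(out)

rng = np.random.default_rng(0)
C, H, W = 4, 6, 6
x = rng.normal(size=(C, H, W))
w3, b3 = rng.normal(size=(C, C, 3, 3)), rng.normal(size=C)
w1, b1 = rng.normal(size=(C, C, 1, 1)), rng.normal(size=C)
knots = np.linspace(-3.0, 3.0, 8)
values = rng.normal(size=(C, knots.size))

# Training-time form: separate spatial branches plus the spectral path.
y_train = (conv2d_same(x, w3, b3) + conv2d_same(x, w1, b1) + x
           + spline_path(x, knots, values))

# Deployment form: one fused 3x3 convolution plus the untouched spectral path.
w_deploy, bias = fuse_spatial_branches(w3, b3, w1, b1)
y_deploy = conv2d_same(x, w_deploy, bias) + spline_path(x, knots, values)

max_gap = float(np.abs(y_train - y_deploy).max())
```

Under these assumptions, `max_gap` should be at floating-point noise level. The fusion works because all spatial branches are linear in the input, so their kernels and biases simply add. The spline path is non-linear and therefore stays as a separate term, which matches the form of the caption's formula.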