Training-Free Stimulus Encoding for Retinal Implants via Sparse Projected Gradient Descent

Henning Konermann, Yuli Wu, Emil Mededovic, Volkmar Schulz, Peter Walter, Johannes Stegmaier

TL;DR

This work presents a training-free framework for encoding stimuli in retinal implants by formulating the task as a bound-constrained sparse least-squares problem under a linearized, patient-specific perceptual forward model. The key insight is that the perception matrix $\mathbf P_{\phi}$ can be highly sparse, enabling an efficient sparse solver based on projected residual norm steepest descent with exact line search and bound projections. By warm-starting across frames, the method achieves an anytime speed–quality trade-off suitable for real-time encoding and avoids the memory burden of dense pseudo-inverses. In silico experiments across four simulated patients and multiple implant sizes show consistent fidelity gains over Lanczos downsampling, with the strongest improvements on Fashion-MNIST, and highlight important interactions between patient parameters and dataset type. The approach remains dependent on the perceptual model, underscoring the value of human-in-the-loop calibration and opening avenues for online linearization and broader implant modalities.

Abstract

Retinal implants aim to restore functional vision despite photoreceptor degeneration, yet are fundamentally constrained by low-resolution electrode arrays and patient-specific perceptual distortions. Most deployed encoders rely on task-agnostic downsampling and linear brightness-to-amplitude mappings, which are suboptimal under realistic perceptual models. Neural-network formulations of the global inverse problem can be fast at inference and can achieve high reconstruction fidelity, but they require training and have limited generalizability to arbitrary inputs. We cast stimulus encoding as a constrained sparse least-squares problem under a linearized perceptual forward model. Our key observation is that the resulting perception matrix can be highly sparse, depending on patient and implant configuration. Building on this, we apply an efficient projected residual norm steepest descent solver that exploits sparsity and supports stimulus bounds via projection. In silico experiments across four simulated patients and implant resolutions from $15\times15$ to $100\times100$ electrodes demonstrate improved reconstruction fidelity compared to Lanczos downsampling, with gains on Fashion-MNIST of up to $+0.265$ SSIM and $+12.4\,\mathrm{dB}$ PSNR, and an MAE reduction of up to $81.4\%$.
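
The solver described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `projected_rnsd`, its parameters, and the toy tridiagonal matrix standing in for the perception matrix are all assumptions made for illustration. Each iteration takes the steepest-descent direction of the least-squares residual, uses the exact line-search step for the unconstrained problem, and then clips the stimulus to its bounds.

```python
import numpy as np

def projected_rnsd(P, y, lo, hi, x0=None, iters=200):
    """Approximately minimize ||y - P x||_2 subject to lo <= x <= hi.

    Hypothetical helper: projected residual norm steepest descent
    with an exact (unconstrained) line search and box projection.
    """
    x = np.zeros(P.shape[1]) if x0 is None else np.clip(x0, lo, hi)
    for _ in range(iters):
        r = y - P @ x              # residual in percept space
        g = P.T @ r                # steepest-descent direction
        Pg = P @ g
        denom = Pg @ Pg
        if denom <= 1e-30:         # direction vanished: stationary point
            break
        alpha = (g @ g) / denom    # minimizes ||y - P(x + alpha g)|| along g
        x = np.clip(x + alpha * g, lo, hi)  # projection onto the bounds
    return x

# Toy stand-in for a perception matrix (assumption): each electrode
# mostly drives its own pixel and leaks slightly into its neighbours.
n = 16
P = np.eye(n) + 0.2 * (np.eye(n, k=1) + np.eye(n, k=-1))
x_true = np.linspace(0.0, 1.0, n)   # stimulus amplitudes within [0, 1]
y = P @ x_true                      # target percept

x_hat = projected_rnsd(P, y, 0.0, 1.0)

# Warm start: reuse the previous solution as the initial guess, so a
# single iteration already suffices when the target barely changes.
x_warm = projected_rnsd(P, y, 0.0, 1.0, x0=x_hat, iters=1)
```

The loop only needs the products `P @ x` and `P.T @ r`, so a `scipy.sparse` matrix could be passed in place of the dense array used here, which is where a sparse perception matrix would save memory and time. Capping `iters` gives the anytime behavior: stopping early returns a feasible, if less accurate, stimulus.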

Paper Structure (14 sections, 7 equations, 10 figures, 5 tables)

Figures (10)

  • Figure 1: Overview of the proposed encoding pipeline. The encoder updates the stimulus via projected residual norm steepest descent under the linearized perception model $\mathbf P_\phi$, and produces the percept through the nonlinear forward model $\mathcal{P}_\phi$. For potential real-time use, warm-starting across frames via a one-frame delay ($K=1$) could be employed.
  • Figure 2: Sparsity of the perception matrix $\mathbf P_{\phi}$ across implant resolutions and patient configurations. Percentages are computed after thresholding values below 5% of the maximum matrix brightness. Patients are as defined in the experiments section. Additional experiments analyzing the effect of sparsity truncation are provided in the supplementary material.
  • Figure 3: Sparse matrix size across electrode counts, shown for $N=M$ equal to the electrode count. Truncation is performed as in Figure 2, and patients are as defined in the experiments section.
  • Figure 4: Speed–quality trade-off, SSIM versus aggregated iteration time. Aggregated time uses GTX 1660 per-iteration means ($15\times15$: 0.442 ms, $28\times28$: 0.445 ms, $100\times100$: 1.764 ms). The horizontal line denotes Lanczos SSIM, and crossover points mark when our method matches it.
  • Figure 5: Qualitative comparison across implant resolutions and encoding methods for Patient 1, using a $100\times100$ simulated perception grid.
  • ...and 5 more figures
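
The truncation rule in the Figure 2 caption (zeroing entries below 5% of the maximum matrix value) can be sketched as follows. The function name `truncate_and_measure` and the toy matrix are assumptions for illustration. The caption does not say here whether the reported percentage is the share of zero or of nonzero entries, so this sketch reports the share of zeros.

```python
import numpy as np

def truncate_and_measure(P, rel_threshold=0.05):
    """Zero out entries below rel_threshold * max|P| and report sparsity.

    Hypothetical helper; sparsity is the fraction of zero entries.
    """
    cutoff = rel_threshold * np.abs(P).max()
    P_trunc = np.where(np.abs(P) < cutoff, 0.0, P)
    sparsity = 1.0 - np.count_nonzero(P_trunc) / P_trunc.size
    return P_trunc, sparsity

# Toy matrix (assumption): the maximum is 1.0, so the cutoff is 0.05.
P = np.array([[1.0, 0.04, 0.0],
              [0.5, 0.06, 0.01]])
P_trunc, sparsity = truncate_and_measure(P)
print(sparsity)  # 0.5: three of the six entries are zero after truncation
```

After truncation, the matrix could be converted to a compressed sparse format so that storage scales with the number of retained entries rather than with the full matrix size.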