Subspace-Configurable Networks

Dong Wang; Olga Saukh; Xiaoxi He; Lothar Thiele

TL;DR

Subspace-Configurable Networks (SCNs) tackle robustness to sensor drift and deployment-domain shifts by learning a low-dimensional configuration subspace that covers optimal models for a family of input transformations. A configuration network maps transformation parameters $\alpha$ to a coefficient vector $\beta$ that combines base models $\theta_i$ as $\theta = \sum_{i=1}^{D} \beta_i \theta_i$, enabling efficient post-deployment adaptation on edge devices. The authors provide theoretical continuity results linking continuous transformation curves to continuous weight curves, show that small $D$ suffices across diverse transformations, and demonstrate strong empirical performance across 10 transformations, multiple datasets, and hardware-constrained settings, including IoT deployments. They also introduce an ...

Abstract

While the deployment of deep learning models on edge devices is increasing, these models often lack robustness when faced with dynamic changes in sensed data. This can be attributed to sensor drift, or variations in the data compared to what was used during offline training due to factors such as specific sensor placement or naturally changing sensing conditions. Hence, achieving the desired robustness necessitates the utilization of either an invariant architecture or specialized training approaches, like data augmentation techniques. Alternatively, input transformations can be treated as a domain shift problem, and solved by post-deployment model adaptation. In this paper, we train a parameterized subspace of configurable networks, where an optimal network for a particular parameter setting is part of this subspace. The obtained subspace is low-dimensional and has a surprisingly simple structure even for complex, non-invertible transformations of the input, leading to an exceptionally high efficiency of subspace-configurable networks (SCNs) when limited storage and computing resources are at stake.
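The weight composition $\theta = \sum_{i=1}^{D} \beta_i \theta_i$ from the TL;DR can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the function names `configure` and `compose`, the parameters `W` and `b`, the use of a single linear layer followed by a softmax as the configuration network, and the toy values.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of floats."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def configure(alpha, W, b):
    """Hypothetical configuration network: maps transformation
    parameters alpha to a D-dimensional coefficient vector beta.
    One linear layer plus softmax is an assumption, not the paper's
    confirmed architecture."""
    logits = [
        sum(w * a for w, a in zip(row, alpha)) + bias
        for row, bias in zip(W, b)
    ]
    return softmax(logits)


def compose(beta, bases):
    """Build inference weights theta = sum_i beta_i * theta_i,
    where each base model theta_i is a flat list of weights."""
    n = len(bases[0])
    return [sum(bi * base[k] for bi, base in zip(beta, bases)) for k in range(n)]


# Toy setup: D = 3 base models with 4 weights each, and a 2D alpha.
bases = [
    [1.0, 0.0, 0.0, 2.0],
    [0.0, 1.0, 0.0, 2.0],
    [0.0, 0.0, 1.0, 2.0],
]
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
b = [0.0, 0.0, 0.0]

phi = math.pi / 6  # rotation angle, fed in as (cos(phi), sin(phi))
beta = configure([math.cos(phi), math.sin(phi)], W, b)
theta = compose(beta, bases)
print(beta, theta)
```

Only the base weights and the small configuration network need to be stored on the device. Adapting to a new $\alpha$ costs one pass through the configuration network plus a weighted sum over $D$ weight vectors, which is consistent with the efficiency argument made in the abstract.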

Paper Structure (61 sections, 5 theorems, 16 equations, 20 figures, 3 tables)

Introduction
Subspace-Configurable Networks
Transformations and their parameterization
Learning configurable networks
Continuity of the learned subspaces
Search in the $\alpha$-space and practical value of SCNs
Experimental Results
SCN test set accuracy
Structure of the configuration subspace
SCN dimensionality and capacity constraints
3D rotation-and-projection transformation
Search in the $\alpha$-space and I-SCNs
SCNs on Low-resource Devices
Conclusion, Limitations, and Future Work
Post-deployment adaptation.
...and 46 more sections

Key Result

Theorem 2.1

Suppose that the loss function $E(\theta, \alpha)$ satisfies the Lipschitz condition for $\alpha^{(1)}, \alpha^{(2)} \in \mathbb{A}$, and $E(\theta, \alpha)$ is differentiable w.r.t. $\theta$ and $\alpha$. Then, for any continuous curve $\alpha(s) \in \mathbb{A}$ with $0 \leq s \leq \hat{s}$ in the parameter space of data transformations, there exists a corresponding curve $\theta \dots$

Figures (20)

Figure 1: Training subspace-configurable networks (SCNs), where an optimal network for a fixed transformation parameter vector is part of the subspace retained by a few configuration parameters. Left: Given input transformation parameters $\alpha$, e.g., a rotation angle for a 2D rotation, we train a configuration network which yields a $D$-dimensional configuration subspace ($\beta$-space) to construct an efficient inference network with weights $\theta = \sum \beta_i \cdot \theta_i$, where $\theta_i$ are the weights of the base models, and $\beta$ is a configuration vector. Middle: Optimal model parameters in the configuration subspace as a function of the rotation angle $\alpha$ given by $(\cos(\phi), \sin(\phi))$ for 2D rotation transformations applied to FMNIST (Xiao et al., 2017). Here, the SCN has three base models with parameters $\theta_i$ and three configuration vectors $\beta_i$ to compose the weights of the 1-layer MLP inference model. Right: Test accuracy of SCNs with $D=1..64$ dimensions compared to a single network trained with data augmentation (One4All), classifiers trained on canonicalized data achieved by applying the inverse rotation transformation with the corresponding parameters (Inverse), and networks trained and tested on datasets where all images are rotated by a fixed degree (One4One). Each violin shows the performance of a model on all degrees with a discretization step of $1^\circ$, except for One4One, where the models are independently trained and evaluated on $0$, $\pi/6$, $\pi/4$, $\pi/3$, $\pi/2$ rotated input.
Figure 2: SCN test accuracy for 2D rotation and scaling transformations. Left and middle: 2D rotation parameterized by a rotation degree $\phi=0..2\pi$ input to the configuration network as $\alpha=(\cos(\phi), \sin(\phi))$. For each $\alpha$, SCN determines a configuration vector $\beta$ used to build a dedicated model for every angle shown on the right. The middle polar plot shows the performance of a single model ($\phi=0^\circ$) on all angles. The model works best for the input transformed with $T(\phi=0^\circ)$. Inference network architecture is a 1-layer MLP with 32 hidden units trained on FMNIST. The models constructed by SCN outperform One4All approaching Inverse and One4One accuracy already for small $D$. Right top: Scaling transformation parameterized by the scaling factor $\alpha=0.2..2.0$. Right bottom: SCN performance of a single model ($\alpha=1.0$) on all inputs. The dedicated model gets increasingly specialized for the target input parameters with higher $D$. Inference network is a 5-layer MLP with 32 hidden units in each layer trained on FMNIST. Also see Appendix \ref{sec:cn:accuracy} and videos showing SCN inference models for each parameter setting.
Figure 3: SCNs achieve high test accuracy already for low $D$, outperforming One4All and approaching (and in some cases outperforming) both Inverse and One4One baselines. 2 plots on the left: 2D rotation on ShallowCNN--SVHN and ResNet18--CIFAR10. 2 plots on the right: Scaling on MLP--FMNIST and ShallowCNN--SVHN. The plots are complementary to Figure \ref{fig:mlpb:accuracy}, evaluating the performance of SCN on different transformations and dataset-architecture pairs. For translation, the violin for One4One comprises prediction accuracy of independently trained models for (0,0) and ($\pm$8,$\pm$8) shift parameters. A detailed evaluation of SCNs for translation is provided in Appendix \ref{sec:translation}.
Figure 4: A typical view of the $\beta$-space for 2D rotation, scaling and translation, $D=1..8$. The $\beta$-space is nicely shaped, with each $\beta$ being responsible for a specific range of inputs with smooth transitions. Top: SCNs for 2D rotation on ResNet18--CIFAR10. Transformation parameters are a vector $\alpha = (\alpha_1, \alpha_2) = (\cos(\phi), \sin(\phi))$, with $\phi$ being a rotation angle. Middle: SCNs for scaling on ShallowCNN--SVHN, with a scaling factor $\alpha$ between 0.2 and 2.0. Bottom: SCNs for translation on MLP--FMNIST. A shift is specified by two parameters $(\alpha_x, \alpha_y)$ varying in the range (-8,8) along $x$ and $y$ axes. A visualization for other dataset-architecture pairs is presented in Appendix \ref{sec:betaspace}.
Figure 5: Effect of network capacity on SCN test accuracy for 2D rotation. We vary inference network width and depth to obtain models of different capacities. 2 plots on the left: Effect of width and depth for MLPs on FMNIST. In the width experiments, all networks are 1-layer MLPs. In the depth experiments, network width is fixed to 32 hidden units. 2 plots on the right: Effect of width and depth for ShallowCNNs on SVHN. In the width experiments, the depth is fixed to two layers scaled together. In the depth experiments, the width of the hidden layers is fixed to 32 channels.
...and 15 more figures
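The captions for Figures 1, 2 and 4 state that a 2D rotation by angle $\phi$ enters the configuration network as $\alpha = (\cos(\phi), \sin(\phi))$ rather than as the raw angle. The sketch below illustrates one property of that encoding: it is continuous across the wrap-around at $2\pi$, whereas the raw angle jumps. The function names `encode_rotation` and `distance` are hypothetical, and reading this continuity as the motivation for the encoding is an assumption, not a claim made in the captions.

```python
import math


def encode_rotation(phi):
    """Encode a rotation angle (radians) as alpha = (cos(phi), sin(phi))."""
    return (math.cos(phi), math.sin(phi))


def distance(a, b):
    """Euclidean distance between two parameter vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Angles just before and just after a full turn describe nearly the same rotation.
phi_a = 2 * math.pi - 0.01
phi_b = 0.01

raw_gap = abs(phi_a - phi_b)  # large gap in raw-angle space
alpha_gap = distance(encode_rotation(phi_a), encode_rotation(phi_b))  # small gap

print(raw_gap, alpha_gap)

# Evaluation grid with a 1-degree step, matching the discretization
# mentioned in the Figure 1 caption.
alphas = [encode_rotation(math.radians(d)) for d in range(360)]
```

With this encoding, nearby rotations always map to nearby $\alpha$ vectors, which fits the setting of continuous curves $\alpha(s)$ used in Theorem 2.1.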

Theorems & Definitions (8)

Theorem 2.1: Continuity
Corollary 2.2
Theorem A.1
proof
Corollary A.2
proof
Theorem A.3
proof
