
Classification and Reconstruction Processes in Deep Predictive Coding Networks: Antagonists or Allies?

Jan Rathjens, Laurenz Wiskott

TL;DR

The paper investigates whether classification and reconstruction can share representations synergistically in deep predictive coding networks (DPCNs). It introduces Classification-Reconstruction Encoders (CREs), in which an encoder, a decoder, and a classifier share a latent representation $z$, and trains them on the combined loss $L = \lambda L_{\text{MSE}} + (1-\lambda) L_{\text{CE}}$, where $\lambda$ weights reconstruction against classification. The experiments cover fully connected (FC), convolutional (CNN), and Vision Transformer (ViT) variants on MNIST, FashionMNIST, and CIFAR-10. Empirically, a robust trade-off emerges: increasing $\lambda$ improves reconstruction at the cost of classification. Larger latent spaces or more complex networks mitigate the trade-off but do not produce the hoped-for synergistic gains. These findings challenge the assumption of inherent synergy in DPCNs and suggest that design strategies should account for competition over the shared representation rather than enforce joint optimization.
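To make the weighting in the combined loss concrete, here is a minimal sketch in plain Python for a single example. The function names (`mse_loss`, `cross_entropy_loss`, `cre_loss`), the per-pixel mean in the MSE term, and the restriction of `lam` to the interval [0, 1] are assumptions made for illustration; the summary above does not specify how the authors reduce or normalize either term.

```python
import math


def mse_loss(x, x_hat):
    """Mean squared error between an input and its reconstruction.

    Assumption: averaged over all elements of the (flattened) input.
    """
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def cross_entropy_loss(logits, label):
    """Cross-entropy of softmax(logits) against an integer class label.

    Uses the log-sum-exp trick for numerical stability.
    """
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[label]


def cre_loss(x, x_hat, logits, label, lam):
    """Combined loss L = lam * L_MSE + (1 - lam) * L_CE.

    lam = 1 trains purely for reconstruction, lam = 0 purely for
    classification. The range check is an assumption of this sketch.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam is assumed to lie in [0, 1]")
    return lam * mse_loss(x, x_hat) + (1.0 - lam) * cross_entropy_loss(logits, label)


if __name__ == "__main__":
    x = [0.0, 1.0]        # toy "image" with two pixels
    x_hat = [0.0, 0.0]    # toy reconstruction from the decoder
    logits = [0.0, 0.0]   # toy classifier output for two classes
    for lam in (0.0, 0.5, 1.0):
        print(lam, cre_loss(x, x_hat, logits, label=0, lam=lam))
```

At the two extremes the loss reduces to a single objective, so the encoder receives gradient from only one of the two heads; intermediate values of `lam` make both heads shape the shared latent, which is the regime where the reported trade-off is observed.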

Abstract

Predictive coding-inspired deep networks for visual computing integrate classification and reconstruction processes in shared intermediate layers. Although synergy between these processes is commonly assumed, it has yet to be convincingly demonstrated. In this study, we take a critical look at how classifying and reconstructing interact in deep learning architectures. Our approach utilizes a purposefully designed family of model architectures reminiscent of autoencoders, each equipped with an encoder, a decoder, and a classification head featuring varying modules and complexities. We meticulously analyze the extent to which classification- and reconstruction-driven information can seamlessly coexist within the shared latent layer of the model architectures. Our findings underscore a significant challenge: Classification-driven information diminishes reconstruction-driven information in intermediate layers' shared representations and vice versa. While expanding the shared representation's dimensions or increasing the network's complexity can alleviate this trade-off effect, our results challenge prevailing assumptions in predictive coding and offer guidance for future iterations of predictive coding concepts in deep networks.

Paper Structure

This paper contains 23 sections, 1 equation, 15 figures, and 3 tables.

Figures (15)

  • Figure 1: Inference in a PCN. Sensory stimulus predictions are hierarchically generated at the top layer and transmitted to the bottom via feedback connections. Prediction errors travel from the bottom to the top layer through feedforward connections. Each layer iteratively adjusts its values to minimize the sum of all prediction errors.
  • Figure 2: Classification-Reconstruction Encoder (CRE). The CRE comprises a decoder and a classifier connected to the latent representation $z$. The encoder can be optimized to encode an input image in either a classification- or reconstruction-driven manner.
  • Figure 3: Performances w.r.t. $\lambda$. The box plots of the classification (blue) and reconstruction (orange) performance for different variants of trained CREs on several datasets are displayed.
  • Figure 4: Visualization of the latent space across different $\lambda$-values. Each plot displays 3D coordinates of input images in the latent space. An exemplary FC-based CRE for each $\lambda$-value with a 3-dimensional latent space was used to generate each plot. Each color represents one class. We depict fifty instances from each class. The top row showcases the MNIST dataset, and the bottom row showcases the FashionMNIST dataset.
  • Figure 5: Visualization of reconstructions across different $\lambda$-values. The left column shows the input images. The other columns display exemplary reconstructions w.r.t. their respective $\lambda$-value. An exemplary FC-based CRE for each $\lambda$-value with a 3-dimensional latent space was used to generate each plot. The top row showcases the MNIST dataset, and the bottom row showcases the FashionMNIST dataset.
  • ...and 10 more figures