CG-CNN: Self-Supervised Feature Extraction Through Contextual Guidance and Transfer Learning

Olcay Kursun, Ahmad Patooghy, Peyman Poursani, Oleg V. Favorov

TL;DR

This work showcases the adaptability of CG-CNNs through applications to diverse data, including Caltech images, Brodatz textures, the VibTac-12 tactile dataset, and hyperspectral images, and to challenges such as the XOR problem and text analysis.

Abstract

Contextually Guided Convolutional Neural Networks (CG-CNNs) employ self-supervision and contextual information to develop transferable features across diverse domains, including visual, tactile, temporal, and textual data. This work showcases the adaptability of CG-CNNs through applications to diverse data, including Caltech images, Brodatz textures, the VibTac-12 tactile dataset, and hyperspectral images, and to challenges such as the XOR problem and text analysis. In text analysis, CG-CNN employs an innovative embedding strategy that uses the context of neighboring words for classification, while in visual and signal data it enhances feature extraction by exploiting spatial information. CG-CNN mimics the context-guided unsupervised learning mechanisms of biological neural networks, and it can learn its features from limited-size datasets. Our experimental results on natural images show that CG-CNN features outperform comparable first-layer features of well-known deep networks such as AlexNet, ResNet, and GoogLeNet in terms of transferability and classification accuracy. In text analysis, CG-CNN learns word embeddings that outperform traditional models such as Word2Vec on tasks such as 20 Newsgroups text classification. Ongoing work trains CG-CNN on the outputs of another CG-CNN to explore multi-layer architectures that build more complex and descriptive features. This scalability and adaptability to various data types suggest that CG-CNN can handle a wide range of applications, making it a promising architecture for diverse data representation challenges.

Paper Structure

This paper contains 15 sections, 7 equations, 12 figures, 7 tables, 1 algorithm.

Figures (12)

  • Figure 1: (a) Class-defining contextual groups of data patches. Each patch is shown as a small rectangular box superimposed on one of the images from the database. Neighboring patches constitute a contextual group and are treated as belonging to the same class during network training; the locations of contextual groups are picked at random. Two tasks are shown on this photo, each with four groups of three patches ($C=4$ and $N=3$). (b) CG-CNN architecture.
  • Figure 2: The Transfer Utility of CG-CNN features is based on the area under the curve of the test accuracy $A_{CG}(C)$ as a function of the number of test classes $C$. Accuracies obtained with random and task-specific CNN features, $A_{random}(C)$ and $A_{specific}(C)$, are also shown because they are used in Eq. (7) to quantify the Transfer Utility $U$. For each value of $C$, the expected test accuracy is computed over a number of tasks generated for that value.
  • Figure 3: (a) Illustration of the CG-CNN architecture applied to the input space of the XOR dataset; (b) the XOR dataset represented using Gaussian blobs with a standard deviation of 0.1.
  • Figure 4: The transferable classification loss (left) and accuracy (right). As the network is exposed to more tasks, a powerful representation emerges: loss on the self-supervised tasks decreases and accuracy increases, indicating growing Transfer Utility. The smooth curve denotes the running average.
  • Figure 5: The task-specific features for the XOR dataset are tailored exclusively to solving the XOR problem and are not general-purpose. Unlike features learned through self-supervision, they are not meant to serve as versatile attributes applicable across a broader range of tasks.
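The contextual-group construction described in the Figure 1 caption can be sketched as follows. This is a minimal sketch, not the authors' code: it assumes a single grayscale image stored as a NumPy array, square patches, and uniform random jitter of up to `max_offset` pixels around each group centre. The function name `sample_contextual_task` and all parameter defaults are hypothetical; `num_groups` plays the role of $C$ and `patches_per_group` the role of $N$.

```python
import numpy as np


def sample_contextual_task(image, num_groups=4, patches_per_group=3,
                           patch_size=8, max_offset=4, rng=None):
    """Build one self-supervised classification task from a single image.

    ASSUMPTION (illustrative, not the paper's exact procedure): each of the
    `num_groups` (C) groups is centred at a random location, and its
    `patches_per_group` (N) patches are cut at small random offsets around
    that centre. All patches of a group share the group's class label.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    if min(h, w) < patch_size + 2 * max_offset:
        raise ValueError("image too small for the requested patch size and offset")
    patches, labels = [], []
    for label in range(num_groups):
        # Random group centre, kept far enough from the border that every
        # jittered patch still lies fully inside the image.
        cy = rng.integers(max_offset, h - patch_size - max_offset + 1)
        cx = rng.integers(max_offset, w - patch_size - max_offset + 1)
        for _ in range(patches_per_group):
            # Neighbouring patch: same context, slightly shifted position.
            dy, dx = rng.integers(-max_offset, max_offset + 1, size=2)
            y, x = cy + dy, cx + dx
            patches.append(image[y:y + patch_size, x:x + patch_size])
            labels.append(label)
    return np.stack(patches), np.array(labels)


if __name__ == "__main__":
    img = np.random.default_rng(0).random((64, 64))
    X, y = sample_contextual_task(img, rng=np.random.default_rng(1))
    print(X.shape, y)  # (12, 8, 8) [0 0 0 1 1 1 2 2 2 3 3 3]
```

Under this sketch, a network trained to assign each patch to its group must rely on features shared by neighbouring patches; sampling a fresh task at every step keeps the learned features from specialising to any one set of locations.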
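The Figure 2 caption indicates that Transfer Utility compares the area under $A_{CG}(C)$ with the corresponding areas for random and task-specific features. The exact form of the paper's equation is not reproduced here, so the sketch below assumes one natural normalisation, $U = (\mathrm{AUC}_{CG} - \mathrm{AUC}_{random}) / (\mathrm{AUC}_{specific} - \mathrm{AUC}_{random})$. This gives $U = 0$ at random-feature performance and $U = 1$ at task-specific performance, with areas computed by the trapezoidal rule. Treat both choices, and the function names, as hypothetical; the paper's definition may differ.

```python
import numpy as np


def area_under_curve(x, y):
    """Trapezoidal-rule area under y(x); x must be sorted in increasing order."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sum((x[1:] - x[:-1]) * (y[1:] + y[:-1]) / 2.0))


def transfer_utility(num_classes, acc_cg, acc_random, acc_specific):
    """ASSUMED normalised-AUC form of Transfer Utility (not necessarily Eq. 7).

    Returns 0 when CG-CNN features do no better than random features and 1
    when they match task-specific features, over the range of C values given.
    """
    auc_cg = area_under_curve(num_classes, acc_cg)
    auc_rand = area_under_curve(num_classes, acc_random)
    auc_spec = area_under_curve(num_classes, acc_specific)
    if auc_spec == auc_rand:
        raise ValueError("random and task-specific curves have equal area")
    return (auc_cg - auc_rand) / (auc_spec - auc_rand)


if __name__ == "__main__":
    # Toy curves for illustration only (NOT results from the paper):
    # the CG curve lies exactly halfway between the other two.
    C = [2, 4, 8]
    acc_random = [0.5, 0.25, 0.125]
    acc_specific = [1.0, 1.0, 1.0]
    acc_cg = [0.75, 0.625, 0.5625]
    print(transfer_utility(C, acc_cg, acc_random, acc_specific))  # 0.5
```

Normalising by the gap between the two reference curves makes the score comparable across datasets whose absolute accuracies differ; integrating over $C$ keeps a single task size from dominating the score.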
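A dataset like the one in Figure 3(b) can be generated by placing Gaussian blobs with standard deviation 0.1 at the four corners of the unit square and labelling each blob by the XOR of its corner coordinates. The caption specifies only the blob standard deviation; the corner positions, the number of points per blob, and the function name `make_xor_blobs` are assumptions made for this sketch.

```python
import numpy as np


def make_xor_blobs(points_per_blob=100, std=0.1, rng=None):
    """Four isotropic Gaussian blobs labelled by XOR.

    ASSUMPTION: blob centres at the corners of the unit square; only the
    standard deviation (0.1) comes from the Figure 3 caption.
    """
    rng = np.random.default_rng() if rng is None else rng
    centers = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    labels = np.array([0, 1, 1, 0])  # XOR of the two centre coordinates
    # Sample each blob around its centre and stack blobs in centre order.
    X = np.concatenate(
        [c + std * rng.standard_normal((points_per_blob, 2)) for c in centers]
    )
    y = np.repeat(labels, points_per_blob)
    return X, y


if __name__ == "__main__":
    X, y = make_xor_blobs(rng=np.random.default_rng(0))
    print(X.shape, np.bincount(y))  # (400, 2) [200 200]
```

Because the two classes occupy diagonally opposite corners, no single straight line separates them, which is why XOR is a useful check that a feature extractor produces non-linear features.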