Frontiers of Deep Learning: From Novel Application to Real-World Deployment

Rui Xie

TL;DR

The paper surveys two complementary threads in deep learning: applying transformer architectures to synthetic aperture radar despeckling to mitigate multiplicative speckle and preserve details, and designing in-storage computing (RM-SSD) to accelerate large-scale recommender systems by performing embedding lookups and MLP inference inside SSDs. The SAR despeckling approach uses an encoder with overlap patch embeddings, transformer blocks with self-attention and depth-wise convolutions, and a convolutional projection decoder, trained with a combined $L_{l2}$ and $L_{tv}$ loss to balance noise suppression and edge preservation; it demonstrates strong PSNR and SSIM gains on synthetic and real SAR data. RM-SSD integrates conventional SSD components with an embedding-lookup engine and an MLP acceleration engine to reduce DRAM pressure, achieving substantial throughput and latency improvements across multiple recommendation models, while employing a kernel-search-based optimization to reduce hardware resources. Overall, the work highlights how advanced neural architectures can improve imaging quality and how near-data computing within storage can enable practical deployment of DL-based recommender systems, signaling a path toward more capable and cost-efficient real-world AI systems.
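The combined despeckling objective mentioned above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names, the anisotropic form of total variation, and the `tv_weight` value are assumptions made here for illustration, since the summary only states that an $L_{l2}$ term and an $L_{tv}$ term are combined.

```python
import numpy as np


def l2_loss(pred, target):
    """Mean squared error between the despeckled estimate and the reference."""
    return float(np.mean((pred - target) ** 2))


def tv_loss(pred):
    """Anisotropic total variation: summed absolute differences between
    vertically and horizontally adjacent pixels (an assumed TV variant)."""
    dh = np.abs(pred[1:, :] - pred[:-1, :]).sum()
    dw = np.abs(pred[:, 1:] - pred[:, :-1]).sum()
    return float(dh + dw)


def combined_loss(pred, target, tv_weight=1e-4):
    """L2 term plus a weighted TV term; tv_weight is a hypothetical value."""
    return l2_loss(pred, target) + tv_weight * tv_loss(pred)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.random((8, 8))
    # Multiplicative speckle with unit mean (4-look gamma noise, assumed here).
    noisy = clean * rng.gamma(shape=4.0, scale=0.25, size=clean.shape)
    print(combined_loss(noisy, clean))
```

The L2 term pulls the estimate toward the reference, while the TV term penalizes pixel-to-pixel variation; the weight sets the trade-off between noise suppression and edge preservation.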

Abstract

Deep learning continues to reshape numerous fields, from natural language processing and imaging to data analytics and recommendation systems. This report studies two research papers that represent recent progress in deep learning from two largely different perspectives. The first paper applies transformer networks, which are typically used in language models, to improve the quality of synthetic aperture radar images by effectively reducing speckle noise. The second paper presents an in-storage computing design that enables cost-efficient, high-performance implementations of deep learning recommendation systems. In addition to summarizing each paper in terms of motivation, key ideas and techniques, and evaluation results, this report discusses possible future research directions. Through an in-depth study of these two representative papers and related references, this doctoral candidate has developed a better understanding of the far-reaching impact and efficient implementation of deep learning models.

Paper Structure (29 sections, 8 equations, 3 figures)

Introduction
Transformer-Based Architecture for SAR Image Despeckling
Background
Problem Statement
Proposed Design Solution
Network Architecture
Overlap Patch Embedding Block
Transformer Block
Convolutional Projection Block
Loss Function
Evaluation
Further Thoughts
Efficient Implementation via In-Storage Computing
Background
Problem Statement
...and 14 more sections

Figures (3)

Figure 1: Overview of the transformer-based despeckling architecture [perera2022transformer].
Figure 2: Illustration of the proposed RM-SSD [sun2022rm], which contains: (1) conventional components (yellow blocks) for conventional SSD control; (2) an embedding lookup engine (orange blocks) for fast retrieval of embeddings; (3) an MLP acceleration engine (green blocks) for handling the MLP part of the recommendation model.
Figure 3: Illustration of (a) conventional design practice where a layer $L_{i+1}$ can only receive the first input vector after all columns of $L_i$ have completed accumulation, and (b) proposed inter-layer composition that implements row scanning in $L_{i+1}$ while maintaining column scanning in $L_i$.
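
The inter-layer composition described in the Figure 3 caption can be illustrated with a small NumPy sketch. This is one plausible reading of the caption, not the RM-SSD hardware design: it assumes the convention $y = xW$ (so each column of $W_i$ produces one output element), a ReLU between layers, and hypothetical function names. Under that assumption, each activation leaving a column of $L_i$ can be consumed at once by the matching row of $L_{i+1}$, instead of waiting for the full output vector.

```python
import numpy as np


def conventional_two_layers(x, w1, w2):
    """Baseline: layer i scans all of its columns before layer i+1 starts."""
    h = np.maximum(x @ w1, 0.0)  # full hidden vector must exist first
    return h @ w2


def composed_two_layers(x, w1, w2):
    """Column scanning in layer i feeding row scanning in layer i+1."""
    out = np.zeros(w2.shape[1])
    for j in range(w1.shape[1]):
        # Column j of w1 yields a single hidden activation.
        h_j = max(float(x @ w1[:, j]), 0.0)
        # Row j of w2 consumes it immediately and updates every partial sum.
        out += h_j * w2[j, :]
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(6)
    w1 = rng.standard_normal((6, 5))
    w2 = rng.standard_normal((5, 3))
    print(np.allclose(conventional_two_layers(x, w1, w2),
                      composed_two_layers(x, w1, w2)))
```

Both functions compute the same result; the difference is only the order of work. In the composed version the second layer starts accumulating after the first column of the first layer finishes, which is the overlap the caption attributes to the proposed design.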
