FINED: Feed Instance-Wise Information Need with Essential and Disentangled Parametric Knowledge from the Past

Kounianhua Du; Jizheng Chen; Jianghao Lin; Menghui Zhu; Bo Chen; Shuai Li; Yong Yu; Weinan Zhang

TL;DR

This paper trains a knowledge extractor that extracts knowledge patterns of arbitrary order from past data and a knowledge encoder that memorizes these arbitrary-order patterns. The two serve as the retrieval key generator and the memory network, respectively, in the subsequent knowledge reusing phase.

Abstract

Recommender models play a vital role in various industrial scenarios, but they often suffer from catastrophic forgetting caused by fast-shifting data distributions. To alleviate this problem, a common approach is to reuse knowledge from historical data. However, preserving the vast, fast-accumulating data is difficult and incurs heavy storage overhead. Memorizing old data through a parametric knowledge base, which compresses the vast amount of raw data into model parameters, has therefore been proposed. Despite its flexibility, improving the memorization and generalization capabilities of the parametric knowledge base and meeting the flexible information need of each instance remain challenging. In this paper, we propose FINED to Feed INstance-wise information need with Essential and Disentangled parametric knowledge from past data for recommendation enhancement. Concretely, we train a knowledge extractor that extracts knowledge patterns of arbitrary order from past data and a knowledge encoder that memorizes these arbitrary-order patterns; the two serve as the retrieval key generator and the memory network, respectively, in the subsequent knowledge reusing phase. The whole process is regularized by two proposed constraints, which improve the capabilities of the parametric knowledge base without increasing its size. The essential principle helps compress the input into representative vectors that capture the task-relevant information and filter out noisy information. The disentanglement principle reduces the redundancy of stored information and pushes the knowledge base to focus on capturing disentangled invariant patterns. Together, these two principles promote rational compression of information for robust and generalizable knowledge representations. Extensive experiments on two datasets demonstrate the effectiveness of the proposed method.
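To make the roles of the two components concrete, the following is a minimal NumPy sketch, not the paper's implementation. It assumes the extractor emits soft masks over feature fields, the encoder mean-pools the masked field embeddings so that patterns of any order map to a fixed-size vector, and redundancy between knowledge vectors is penalized through squared cosine similarity. All function names, shapes, and the penalty itself are hypothetical, and the essential principle is not modeled here.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def extract_patterns(fields, w_f):
    """Hypothetical extractor: (F, D) field embeddings -> (K, F) soft masks.

    Each row says how strongly each field participates in one pattern, so a
    pattern can cover any number of fields (arbitrary order).
    """
    return sigmoid(fields @ w_f).T


def encode_patterns(fields, masks, w_g):
    """Hypothetical encoder: masked mean pooling, then a linear map and tanh.

    Pooling makes the output size independent of how many fields are selected.
    """
    weights = masks / (masks.sum(axis=1, keepdims=True) + 1e-8)  # (K, F)
    pooled = weights @ fields                                     # (K, D)
    return np.tanh(pooled @ w_g)                                  # (K, H)


def redundancy_penalty(knowledge):
    """Assumed stand-in for a disentanglement term: mean squared cosine
    similarity between distinct knowledge vectors (0 when orthogonal)."""
    norms = np.linalg.norm(knowledge, axis=1, keepdims=True) + 1e-8
    normed = knowledge / norms
    gram = normed @ normed.T
    off_diag = gram - np.diag(np.diag(gram))
    k = knowledge.shape[0]
    return float((off_diag ** 2).sum() / (k * (k - 1)))


# Toy instance: 6 feature fields, 8-dim embeddings, 3 patterns, 4-dim knowledge.
rng = np.random.default_rng(0)
F, D, K, H = 6, 8, 3, 4
fields = rng.normal(size=(F, D))
w_f = rng.normal(size=(D, K))  # stands in for trained extractor parameters
w_g = rng.normal(size=(D, H))  # stands in for trained encoder parameters

masks = extract_patterns(fields, w_f)
knowledge = encode_patterns(fields, masks, w_g)
penalty = redundancy_penalty(knowledge)
```

In this sketch the `masks` would play the part of retrieval keys and `knowledge` the part of the retrieved vectors that a downstream backbone could consume; how FINED actually adapts and injects them is described in the Knowledge Utilization section of the paper.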

Paper Structure (25 sections, 25 equations, 6 figures, 4 tables)

Introduction
Related Work
Conventional Recommendation Backbones
Distribution Robust Recommenders
Preliminaries
Problem Formulation
Model Invariance & Generalization
Methodology
Knowledge Compression
Essential & Disentangled Principles
Knowledge Extractor $f(\cdot)$
Knowledge Encoder $g(\cdot)$
Objective
Knowledge Utilization
Experiments
...and 10 more sections

Figures (6)

Figure 1: (a) Overall process. (b) The essential principle for compression, which encourages a compressed representation that captures the fundamental knowledge of the data and filters out noise. (c) The disentangled principle for compression, which reduces the redundancy of stored patterns and decomposes the invariance for better generalization.
Figure 2: The framework of FINED. In the knowledge compression stage, we compress the essential and disentangled knowledge within old data into the parametric knowledge base. Concretely, we extract patterns within data instances with the knowledge extractor and memorize them through the knowledge encoder, which can handle inputs of arbitrary scale. The overall knowledge compression process is regularized by two principles, essential and disentangled, for better generalization and robustness. During prediction, the target instance accesses the frozen knowledge base for instance-wise knowledge, which is then adapted and injected into an arbitrary recommendation backbone for enhanced prediction.
Figure 3: Performance of FINED w.r.t. different numbers of knowledge patterns per sample.
Figure 4: Learning curves of different losses on AD.
Figure 5: Pattern scales. (Entry $>0.5$ is regarded as 1.)
...and 1 more figure
