Beyond Scaleup: Knowledge-aware Parsimony Learning from Deep Networks

Quanming Yao; Yongqi Zhang; Yaqing Wang; Nan Yin; James Kwok; Qiang Yang

TL;DR

This work advocates a knowledge-driven, parsimony-centric approach to machine learning as a sustainable alternative to brute-force scaleup. By treating domain knowledge (symbols, logic, and laws) as modular building blocks, the framework targets parsimony across model design, training, and interpretation, integrating knowledge and data in a complementary dual space. Across modules such as AutoBLM, ColdNAS, PAR, PACIA, RED-GNN, and EmerGNN, the approach achieves competitive or superior performance with simpler architectures, fewer trainable parameters, and improved interpretability. The authors demonstrate its potential in AI for science, notably drug-drug interaction prediction, and outline a roadmap for theory, methods, and applications that could steer future foundation-model research toward more efficient, trustworthy, and versatile systems.

Abstract

The brute-force scaleup of training datasets, learnable parameters, and computation power has become a prevalent strategy for developing more robust learning models. However, due to bottlenecks in data, computation, and trust, the sustainability of this strategy is a serious concern. In this paper, we attempt to address this issue in a parsimonious manner (i.e., achieving greater potential with simpler models). The key is to drive models using domain-specific knowledge, such as symbols, logic, and formulas, instead of relying purely on scaleup. This approach allows us to build a framework that uses this knowledge as "building blocks" to achieve parsimony in model design, training, and interpretation. Empirical results show that our methods surpass those that typically follow the scaling law. We also demonstrate our framework in AI for science, specifically on the problem of drug-drug interaction prediction. We hope our research can foster more diverse technical roadmaps in the era of foundation models.

Paper Structure (15 sections, 12 figures, 6 tables)

This paper contains 15 sections, 12 figures, 6 tables.

Introduction
Research Landscape
Parsimony on Model
Automated Bi-linear Scoring Function Design
Symbolized Architecture Search for Recommendation
Parsimony on Training
Property-Aware Relation Networks
Parameter-Efficient GNN Adapter
Parsimony on Interpretation
Interpreting with Subgraph Learning
Symbolic Regression on Graphs
Potential in Drug Development
Future Works
Conclusion
Author Biographies

Figures (12)

Figure 1: The data, computational and trust bottlenecks of LLMs (left), and the development of parsimony learning (right).
Figure 1: Illustration of knowledge space, function space, and the relationship in each method.
Figure 2: The three primary colors (left) and the knowledge-aware parsimony learning framework (right).
Figure 3: AutoBLM first sets up a search space by analyzing existing scoring functions and then uses bi-level optimization to extract semantics and relationships simultaneously.
Figure 4: ColdNAS uses a hypernetwork to map each user’s historical interactions to user-specific parameters, which are then used to modulate the predictor, and formulates how and where to modulate as a NAS problem.
...and 7 more figures
