Advancing Deep Learning through Probability Engineering: A Pragmatic Paradigm for Modern AI
Jianyi Zhang
TL;DR
This work reframes probabilistic modeling as an engineering practice for modern AI, arguing that probability distributions should be actively designed and manipulated to meet real-world constraints. It introduces Probability Engineering across three domains: Bayesian deep learning (via SPOS and related samplers), edge AI (Fed-CBS for federated learning and ReAugKD for knowledge distillation), and generative AI (SLED for LLM factuality and ARTIST for text-rich diffusion). The key contributions are SPOS with non-asymptotic guarantees, Fed-CBS with privacy-preserving QCID-based client selection, ReAugKD with retrieval-augmented distillation, SLED for decoding-time factuality, and ARTIST for disentangled text-visual generation. Together, these methods illustrate how probabilistic reasoning can be integrated into learning, inference, and generation pipelines to handle non-IID data, evolving knowledge, and multimodal challenges while keeping AI systems robust, efficient, and trustworthy.
Abstract
Recent years have witnessed the rapid progression of deep learning, pushing us closer to the realization of AGI (Artificial General Intelligence). Probabilistic modeling, which provides a foundational framework for capturing data distributions, is critical to many of these advancements. However, as the scale and complexity of AI applications grow, traditional probabilistic modeling faces escalating challenges: high-dimensional parameter spaces, heterogeneous data sources, and evolving real-world requirements often render classical approaches insufficiently flexible. This paper proposes a novel concept, Probability Engineering, which treats the already-learned probability distributions within deep learning as engineering artifacts. Rather than merely fitting or inferring distributions, we actively modify and reinforce them to better address the diverse and evolving demands of modern AI. Specifically, Probability Engineering introduces novel techniques and constraints to refine existing probability distributions, improving their robustness, efficiency, adaptability, or trustworthiness. We showcase this paradigm through a series of applications spanning Bayesian deep learning, Edge AI (including federated learning and knowledge distillation), and Generative AI (such as text-to-image generation with diffusion models and high-quality text generation with large language models). These case studies demonstrate how probability distributions, once treated as static objects, can be engineered to meet the requirements of large-scale, data-intensive, and trustworthy AI systems. By systematically expanding and strengthening the role of probabilistic modeling, Probability Engineering paves the way for more robust, adaptive, efficient, and trustworthy deep learning solutions in today's fast-growing AI era.
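To make the idea of "modifying an already-learned distribution" concrete, the following is a minimal, generic sketch and not any of the methods named in this work. It assumes a learned categorical distribution given as logits and a hypothetical auxiliary signal (also expressed as logits), and it shifts the learned distribution toward that signal before renormalizing. The function names, the `strength` parameter, and the toy numbers are all illustrative assumptions.

```python
import math


def softmax(logits):
    """Convert logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def engineer_distribution(learned_logits, signal_logits, strength=0.5):
    """Interpolate learned logits toward an auxiliary signal, then renormalize.

    strength=0.0 leaves the learned distribution unchanged;
    strength=1.0 replaces it with the signal's distribution.
    This is a hypothetical illustration, not a method from the paper.
    """
    adjusted = [
        (1.0 - strength) * l + strength * s
        for l, s in zip(learned_logits, signal_logits)
    ]
    return softmax(adjusted)


# Toy example: the learned model prefers class 0, the signal prefers class 2.
learned = [2.0, 1.0, 0.1]
signal = [0.1, 1.0, 2.0]

p_before = softmax(learned)
p_after = engineer_distribution(learned, signal, strength=0.5)

print("before:", [round(p, 3) for p in p_before])
print("after: ", [round(p, 3) for p in p_after])
```

The point of the sketch is only that the learned distribution is treated as an object to be adjusted after training, rather than as a fixed output of fitting; the actual refinement rules used in each application differ.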
