Making Logic a First-Class Citizen in Network Data Generation with ML

Hongyu Hè; Minhao Jin; Maria Apostolaki

TL;DR

This work tackles the lack of correctness guarantees and controllability in generative networking models by introducing NetNomos, a neurosymbolic framework that automatically learns first-order-logic constraints from network data, semantically filters them with guidance from large language models, and enforces them during language-model inference using an SMT solver. The approach combines a guarded grammar Γ for expressing network rules, a rule-learning pipeline based on minimal hitting sets, and end-to-end inference-time enforcement to provide provable guarantees while preserving statistical fidelity. Evaluations across diverse datasets show that NetNomos is more expressive and scalable than baselines, achieves high precision in semantic filtering (≈98%), and produces zero rule violations in most cases, with substantial gains in synthetic data fidelity, forecasting accuracy, and telemetry imputation. The results demonstrate practical benefits for trustworthy network data generation and analytics: rules can be adapted at inference time without retraining, and the work facilitates open-source benchmarking for future research.

Abstract

Generative ML models are increasingly popular in networking for tasks such as telemetry imputation, prediction, and synthetic trace generation. Despite their capabilities, they suffer from two shortcomings: (i) their output often visibly violates well-known networking rules, which undermines their trustworthiness; and (ii) they are difficult to control, frequently requiring retraining even for minor changes. To address these limitations and unlock the benefits of generative models for networking, we propose a new paradigm for integrating explicit network knowledge, in the form of first-order logic rules, into ML models used for networking tasks. Rules capture well-known relationships among the signals involved, e.g., that increased latency precedes packet loss. While the idea is conceptually straightforward, its realization is challenging: networking knowledge is rarely formalized into rules, and naively injecting rules into ML models often hampers the models' effectiveness. This paper introduces NetNomos, a multi-stage framework that (1) learns rules directly from data (e.g., measurements); (2) filters them to identify the semantically meaningful ones; and (3) enforces them through collaborative generation between an ML model and an SMT solver.
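To make stages (1) and (3) concrete, the following is a minimal toy sketch, not the NetNomos implementation. It assumes a single hypothetical rule template (`a <= b` between two fields), hypothetical field names (`pkts`, `bytes`, `retrans`), and a plain Python check standing in for the SMT solver. Stage (2), semantic filtering, is omitted.

```python
from itertools import permutations

# Toy telemetry records; the field names are hypothetical.
records = [
    {"pkts": 10, "bytes": 5200, "retrans": 0},
    {"pkts": 3, "bytes": 180, "retrans": 1},
    {"pkts": 7, "bytes": 7000, "retrans": 2},
]


def learn_rules(data):
    """Stage 1 (toy): keep each candidate rule 'a <= b' that no record violates."""
    fields = sorted(data[0])
    return [
        (a, b)
        for a, b in permutations(fields, 2)
        if all(r[a] <= r[b] for r in data)
    ]


def satisfies(record, rules):
    """Check a complete record against every learned rule."""
    return all(record[a] <= record[b] for a, b in rules)


def enforce(partial, field, ranked_candidates, rules):
    """Stage 3 (toy): pick the model's most preferred value that breaks no rule.

    `ranked_candidates` stands in for a generative model's output, ordered
    from most to least likely. A real system would query an SMT solver here.
    """
    for value in ranked_candidates:
        candidate = {**partial, field: value}
        if satisfies(candidate, rules):
            return candidate
    raise ValueError("no candidate satisfies the rules")


rules = learn_rules(records)
# The "model" prefers retrans=9, which would exceed the packet count.
fixed = enforce({"pkts": 4, "bytes": 600}, "retrans", [9, 5, 2], rules)
print(rules)
print(fixed)
```

In this sketch the rules live outside the model, so changing a rule only changes the check applied at generation time, which mirrors the abstract's point that control should not require retraining.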
