
Making Logic a First-Class Citizen in Network Data Generation with ML

Hongyu Hè, Minhao Jin, Maria Apostolaki

TL;DR

This work tackles the lack of correctness guarantees and controllability in generative networking models by introducing NetNomos, a neurosymbolic framework that automatically learns first-order-logic constraints from network data, semantically filters them with guidance from large language models, and enforces them during language-model inference using an SMT solver. The approach combines a guarded grammar Γ to express network rules, a minimal-hitting-set-based rule-learning pipeline, and end-to-end inference-time enforcement to provide provable guarantees while preserving statistical fidelity. Evaluations across diverse datasets show that NetNomos offers greater expressiveness and scalability than baselines, achieves high precision in semantic filtering (≈98%), and delivers zero rule violations in most cases, with substantial gains in synthetic data fidelity, forecasting accuracy, and telemetry imputation. The results demonstrate practical benefits for trustworthy network data generation and analytics without retraining, enabling flexible rule adaptation at inference time and facilitating open-source benchmarking for future work.

Abstract

Generative ML models are increasingly popular in networking for tasks such as telemetry imputation, prediction, and synthetic trace generation. Despite their capabilities, they suffer from two shortcomings: (i) their outputs often visibly violate well-known networking rules, which undermines their trustworthiness; and (ii) they are difficult to control, frequently requiring retraining even for minor changes. To address these limitations and unlock the benefits of generative models for networking, we propose a new paradigm for integrating explicit network knowledge, in the form of first-order logic rules, into ML models used for networking tasks. Rules capture well-known relationships among the signals in use, e.g., that increased latency precedes packet loss. While the idea is conceptually straightforward, its realization is challenging: networking knowledge is rarely formalized into rules, and naively injecting rules into ML models often hampers ML's effectiveness. This paper introduces NetNomos, a multi-stage framework that (1) learns rules directly from data (e.g., measurements); (2) filters them to identify semantically meaningful ones; and (3) enforces them through collaborative generation between an ML model and an SMT solver.
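To make the notion of a rule concrete, the sketch below encodes the abstract's example, that increased latency precedes packet loss, as a predicate over a measurement trace. The function name, the per-step loss indicator, and the look-back `window` are hypothetical choices for illustration; NetNomos expresses rules in its own guarded grammar Γ, which this plain Python check does not reproduce.

```python
from typing import List


def latency_rise_precedes_loss(latency_ms: List[float], loss: List[int], window: int = 3) -> bool:
    """Hypothetical encoding of 'increased latency precedes packet loss'.

    Every step t with loss must be preceded, within `window` steps, by a step s < t
    at which latency increased relative to step s - 1.
    """
    for t, lost in enumerate(loss):
        if lost == 0:
            continue
        start = max(1, t - window)  # s - 1 must be a valid index
        if not any(latency_ms[s] > latency_ms[s - 1] for s in range(start, t)):
            return False  # loss with no earlier latency rise: rule violated
    return True


# Latency rises at step 2, before the loss at step 3: the rule holds.
ok_trace = latency_rise_precedes_loss([10, 10, 12, 15, 15], [0, 0, 0, 1, 0])
# Latency stays flat before the loss at step 2: the rule is violated.
bad_trace = latency_rise_precedes_loss([10, 10, 10, 10], [0, 0, 1, 0])
```

A check like this only detects violations after the fact; the point of NetNomos is to learn such rules and prevent violations during generation.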

Paper Structure

This paper contains 37 sections, 2 theorems, 6 equations, 17 figures, and 7 tables.

Key Result

Theorem 1

Learning a valid constraint $C$ on examples $D$ is equivalent to finding a hitting set $H$ of clauses whose evidence sets hit $D$.
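One possible reading of Theorem 1, under the simplifying assumption that a candidate constraint is a disjunction of atomic predicates over header fields: an example's evidence set is the set of atoms it satisfies, and a disjunction holds on every example in $D$ exactly when it shares at least one atom with each evidence set, i.e., when it is a hitting set of those evidence sets. Inclusion-minimal hitting sets then correspond to the strongest such disjunctions. The brute-force enumeration, the atom names, and the three toy packets below are illustrative assumptions, not NetNomos's actual rule-learning pipeline.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List

Example = Dict[str, int]
Atom = Callable[[Example], bool]


def evidence_sets(atoms: Dict[str, Atom], data: List[Example]) -> List[FrozenSet[str]]:
    """For each example, collect the names of the atoms it satisfies."""
    return [frozenset(name for name, atom in atoms.items() if atom(row)) for row in data]


def minimal_hitting_sets(
    sets: List[FrozenSet[str]], universe: List[str], max_size: int = 3
) -> List[FrozenSet[str]]:
    """Enumerate inclusion-minimal hitting sets, smallest first (exponential; toy sizes only)."""
    found: List[FrozenSet[str]] = []
    for k in range(1, max_size + 1):
        for combo in combinations(sorted(universe), k):
            cand = frozenset(combo)
            if any(f <= cand for f in found):
                continue  # contains a smaller hitting set, so it is not minimal
            if all(cand & s for s in sets):  # intersects every evidence set
                found.append(cand)
    return found


# Hypothetical toy data: three packets with protocol number, TCP flags, and length.
data: List[Example] = [
    {"proto": 6, "flags": 2, "len": 60},
    {"proto": 17, "flags": 0, "len": 120},
    {"proto": 6, "flags": 16, "len": 1500},
]
# Hypothetical atomic predicates a grammar might generate.
atoms: Dict[str, Atom] = {
    "proto==17": lambda r: r["proto"] == 17,
    "flags>0": lambda r: r["flags"] > 0,
    "len>=60": lambda r: r["len"] >= 60,
    "len<60": lambda r: r["len"] < 60,
}
rules = minimal_hitting_sets(evidence_sets(atoms, data), list(atoms))
# Two minimal hitting sets: {len>=60} and {flags>0, proto==17},
# i.e. the rules "len >= 60" and "proto == 17 OR flags > 0".
```

The second rule reads as "non-UDP packets carry TCP flags", which shows why a later filtering stage is useful: rules learned from few examples can be valid on the data yet not meaningful in general.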

Figures (17)

  • Figure 1: (a): NetNomos finds rules that connect observable variables; these can stem from network principles, protocols, deployment decisions, or their combination. (b): NetNomos strikes a delicate balance between expressiveness and scalability in the trade-off space for learning network rules.
  • Figure 2: NetNomos consists of three stages: Rule Learning, where NetNomos reduces rule learning to the minimum hitting set problem and identifies the minimum set of constraints consistent with the data (i.e., constraints that are valid and strongest); Rule Filtering, where an LLM (or a human) filters out learned rules that are meaningless; and Rule Enforcement, where an SMT solver enforces rules during the token-by-token generation of a language model by invalidating tokens that, if selected by the LM, would result in an invalid output (e.g., a sequence of packets whose header fields violate protocol rules, or an imputed fine-grained vector of measurements that defies network principles).
  • Figure 3: NetNomos learns complex FOL constraints by systematically finding minimal hitting sets.
  • Figure 4: NetNomos invokes a solver during inference to filter out invalid tokens that will cause rule violations.
  • Figure 5: (Upper) NetNomos enforces learned rules at inference time, guaranteeing that model outputs remain within the feasible region defined by the constraints. (Lower) In contrast, outputs of Zoom2Net [gong2024zoom2net] frequently fall outside these boundaries.
  • ...and 12 more figures
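Figures 2 and 4 describe rule enforcement as masking, at each decoding step, any token that would make a rule-satisfying output impossible. The sketch below imitates that loop for a hypothetical three-field packet schema. Two assumptions matter: a brute-force search over small finite domains stands in for the SMT solver NetNomos actually calls, and fixed per-field scores stand in for a language model's next-token probabilities. The field names, domains, and rules are illustrative only.

```python
from itertools import product
from typing import Callable, Dict, List

Record = Dict[str, int]
Rule = Callable[[Record], bool]

# Hypothetical schema: each field is generated as one token from a small finite domain.
FIELDS = ["proto", "flags", "len"]
DOMAINS: Dict[str, List[int]] = {"proto": [6, 17], "flags": [0, 2, 16], "len": [40, 60, 1500]}

# Hypothetical rules of the kind the learning stage might produce.
RULES: List[Rule] = [
    lambda r: r["proto"] == 6 or r["flags"] == 0,  # only TCP (proto 6) may set flags
    lambda r: r["len"] >= 60,  # minimum packet length
]


def satisfiable(partial: Record) -> bool:
    """Stand-in for an SMT check: can `partial` be completed so that all rules hold?"""
    free = [f for f in FIELDS if f not in partial]
    for values in product(*(DOMAINS[f] for f in free)):
        full = {**partial, **dict(zip(free, values))}
        if all(rule(full) for rule in RULES):
            return True
    return False


def allowed_tokens(partial: Record, field: str) -> List[int]:
    """Mask out values for `field` that leave no rule-satisfying completion."""
    return [v for v in DOMAINS[field] if satisfiable({**partial, field: v})]


def constrained_greedy_decode(scores: Dict[str, Dict[int, float]]) -> Record:
    """Greedy field-by-field decoding; invalid tokens are removed before the argmax."""
    record: Record = {}
    for field in FIELDS:
        candidates = allowed_tokens(record, field)
        if not candidates:
            raise ValueError(f"no valid token for {field}")
        record[field] = max(candidates, key=lambda v: scores[field][v])
    return record


# Hypothetical model scores that, unconstrained, would favor proto=17, flags=16, len=40.
scores = {
    "proto": {6: 0.1, 17: 0.9},
    "flags": {0: 0.2, 2: 0.1, 16: 0.7},
    "len": {40: 0.8, 60: 0.05, 1500: 0.15},
}
result = constrained_greedy_decode(scores)  # {'proto': 17, 'flags': 0, 'len': 1500}
```

Unconstrained greedy decoding on these scores would emit proto 17 with flags 16 and len 40, breaking both rules; with masking, flags 0 is the only valid continuation once proto 17 is chosen, and len 40 is excluded. A real SMT solver replaces the exhaustive `product` loop, whose cost grows exponentially with the number of free fields.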

Theorems & Definitions (2)

  • Theorem 1
  • Lemma 2