A Multi-Dimensional Quality Scoring Framework for Decentralized LLM Inference with Proof of Quality

Arther Tian; Alex Ding; Frank Chen; Simon Wu; Aaron Chan

TL;DR

We propose a multi-dimensional quality scoring framework that decomposes output quality into modular dimensions, including model and cost priors, structure quality, semantic quality, query-output alignment, and agreement/uncertainty.

Abstract

Decentralized large language model (LLM) inference networks can pool heterogeneous compute to scale serving, but they require lightweight, incentive-compatible mechanisms to assess output quality. Prior work introduced cost-aware Proof of Quality (PoQ) and adaptive robust PoQ to allocate rewards under evaluator heterogeneity and adversarial behavior. In this paper, we focus on the quality signal itself and propose a multi-dimensional quality scoring framework that decomposes output quality into modular dimensions, including model and cost priors, structure quality, semantic quality, query-output alignment, and agreement/uncertainty. Using logged outputs from question answering (QA) and summarization tasks, we systematically audit the reliability of each dimension and show that seemingly reasonable dimensions can be task-dependent, and can even correlate negatively with reference quality when left uncalibrated. Although the default composite underperforms a strong single semantic evaluator, ablations reveal that removing unreliable dimensions and re-normalizing the remaining weights yields a calibrated composite that matches or exceeds the best single-evaluator and consensus baselines. Finally, we integrate the composite score into PoQ as a drop-in quality signal and show that it complements robust aggregation and adaptive trust weighting under adversarial evaluator attacks.
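The abstract does not spell out how the dimension scores are combined. As a minimal sketch, assuming each dimension module emits a score already normalized to [0, 1] and that the composite is a weighted linear sum, the default composite and the ablated, re-normalized composite could be computed as below. The dimension keys, the equal default weights, and the function name `composite_score` are hypothetical illustrations, not the authors' implementation.

```python
from typing import Dict, Iterable

# Hypothetical keys for the five dimensions named in the abstract.
DIMENSIONS = ("prior", "structure", "semantic", "alignment", "agreement")


def composite_score(
    scores: Dict[str, float],
    weights: Dict[str, float],
    drop: Iterable[str] = (),
) -> float:
    """Weighted linear composite of per-dimension scores (assumed in [0, 1]).

    Dimensions in `drop` are excluded and the surviving weights are
    re-normalized to sum to 1 (an assumed reading of the ablation).
    """
    dropped = set(drop)
    kept = {d: w for d, w in weights.items() if d not in dropped and w > 0}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("no dimension with positive weight remains")
    # Re-normalize so the kept weights sum to 1, then take the weighted sum.
    return sum((w / total) * scores[d] for d, w in kept.items())


if __name__ == "__main__":
    # Illustrative toy scores for one (query, output) pair, not data from the paper.
    scores = {"prior": 0.6, "structure": 0.8, "semantic": 0.9,
              "alignment": 0.3, "agreement": 0.4}
    weights = {d: 0.2 for d in DIMENSIONS}  # hypothetical equal default weights
    print(composite_score(scores, weights))  # default composite
    print(composite_score(scores, weights, drop=["alignment", "agreement"]))
```

Because the surviving weights sum to 1, the composite stays in [0, 1] whenever the dimension scores do, which is one way to keep it usable as a drop-in quality signal after dimensions are removed.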

Paper Structure (71 sections, 14 figures, 9 tables)

Introduction
Contributions.
Paper organization.
Background and Problem Setting
Proof of Quality for Decentralized LLM Inference
System setting.
PoQ intuition.
Consensus and robustness.
Quality Signal Design Challenges
Why quality signals dominate system behavior.
Task dependence and metric mismatch.
Evaluator heterogeneity and "directionality" risk.
Takeaway.
A Multi-Dimensional Quality Scoring Framework
Design Goals and Principles
...and 56 more sections

Figures (14)

Figure 1: Overview of the proposed multi-dimensional quality scoring framework and its integration into Proof of Quality (PoQ) for decentralized LLM inference. Candidate outputs are scored by multiple dimension modules and combined into a composite quality signal that can be used for consensus and rewards.
Figure 2: Modular architecture of multi-dimensional quality scoring. Each dimension module produces a normalized score; the composite score $\hat{s}(q,y)$ is then used as a PoQ-compatible quality signal for aggregation and incentives.
Figure 3: Unified correlation summary across individual evaluators, consensus methods, and the default composite score.
Figure 4: Correlation heatmap (GT: ground-truth reference quality) across evaluators, consensus baselines, the composite, and individual dimensions.
Figure 5: Per-dimension correlation with GT. Semantic quality is strongly aligned overall, while alignment and agreement dimensions can be negatively correlated without calibration.
...and 9 more figures
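Figure 5 reports per-dimension correlation with GT, and the abstract describes removing unreliable dimensions before re-normalizing weights. A minimal sketch of such an audit follows, assuming Pearson correlation (the captions do not state which coefficient is used) and a hypothetical cutoff of zero, so that any dimension that fails to correlate positively with GT is flagged. The names `pearson`, `audit_dimensions`, and `min_corr` are illustrative assumptions, and the toy values are not data from the paper.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant series carries no ranking signal
    return cov / math.sqrt(vx * vy)


def audit_dimensions(
    dim_scores: Dict[str, List[float]],
    gt: List[float],
    min_corr: float = 0.0,
) -> Dict[str, float]:
    """Return {dimension: r} for dimensions whose correlation with GT is <= min_corr.

    Flagged dimensions are candidates for removal before the composite
    weights are re-normalized; the cutoff is a hypothetical choice.
    """
    corrs = {name: pearson(vals, gt) for name, vals in dim_scores.items()}
    return {name: r for name, r in corrs.items() if r <= min_corr}


if __name__ == "__main__":
    # Toy values for four logged outputs, purely illustrative.
    gt = [0.2, 0.5, 0.9, 0.4]
    dims = {
        "semantic": [0.25, 0.55, 0.85, 0.45],  # tracks GT closely
        "alignment": [0.8, 0.5, 0.1, 0.6],     # moves against GT
    }
    print(audit_dimensions(dims, gt))
```

This sketch flags dimensions by sign alone. A stricter cutoff or a rank-based coefficient such as Spearman's could flag a different set, and since the abstract notes that dimension reliability is task-dependent, such an audit would plausibly be run separately for QA and summarization.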
