Information Theory

arXiv:cs.IT

Covers theoretical and experimental aspects of information theory and coding. Includes source coding, channel coding, data compression, and cryptographic protocols.

Trending in Information Theory

A Dual Approach for Hierarchical Information-Theoretic Tree Abstractions

In this paper, we establish a formal connection between two distinct tree-abstraction problems inspired by the information-bottleneck (IB) method. Specifically, we consider the hard- and soft-constrained formulations that have recently appeared in the literature and determine the conditions under which the two approaches are equivalent. Our analysis leverages concepts from Lagrangian relaxation and duality theory to relate the dual function of the hard-constrained problem to the Q-function employed in Q-tree search, and it shows the connection between tree phase transitions and solutions to the dual problem obtained by exploiting the problem structure. An algorithm is proposed that uses knowledge of the tree phase transitions to find a setting of the dual variable that solves the dual problem. Furthermore, we present an alternative approach to selecting the dual variable that leverages the integer programming formulation of the hard-constrained problem and the strong duality of linear programming. To obtain a linear program, we establish that a relaxation of the integer programming formulation of the hard-constrained tree-search problem has the integrality property by showing that the program's constraint matrix is totally unimodular. Empirical results that corroborate the theoretical developments are presented and discussed throughout.

2512.01985

Dec 2025 · Information Theory
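The tree-abstraction abstract above relies on total unimodularity: a matrix is totally unimodular (TU) when every square submatrix has determinant in {-1, 0, 1}, which makes the LP relaxation integral for integer right-hand sides. The following is a minimal brute-force checker of that definition, exponential in the matrix size and only meant for small examples; the paper's actual constraint matrix is not reproduced, and the test matrices are hypothetical.

```python
from itertools import combinations

def det(m):
    """Determinant of a small square integer matrix by Laplace expansion (exact arithmetic)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, a in enumerate(m[0]):
        if a:
            minor = [row[:j] + row[j + 1:] for row in m[1:]]
            total += (-1) ** j * a * det(minor)
    return total

def is_totally_unimodular(a):
    """Check every square submatrix of `a` for a determinant in {-1, 0, 1}."""
    rows, cols = len(a), len(a[0])
    for k in range(1, min(rows, cols) + 1):
        for r in combinations(range(rows), k):
            for c in combinations(range(cols), k):
                sub = [[a[i][j] for j in c] for i in r]
                if det(sub) not in (-1, 0, 1):
                    return False
    return True
```

An interval (consecutive-ones) matrix passes, while the incidence pattern of an odd cycle fails because its full determinant is 2.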

Forget BIT, It is All about TOKEN: Towards Semantic Information Theory for LLMs

Large language models (LLMs) have demonstrated remarkable capabilities in numerous real-world applications. While research on LLMs, the vast majority of which is conducted from an experimental perspective, is progressing rapidly, it demands substantial computational power, data, and other resources. Therefore, opening the black box of LLMs from a theoretical standpoint has become a critical challenge. This paper takes rate-distortion theory, directed information, and Granger causality as its starting point to investigate the information-theoretic principles behind LLMs, leading to the development of a semantic information theory for LLMs in which the fundamental unit is the token rather than the bit, which lacks any semantic meaning. By defining the probabilistic model of LLMs, we discuss structure-agnostic information-theoretic measures, such as the directed rate-distortion function in pre-training, the directed rate-reward function in post-training, and the semantic information flow in the inference phase. This paper also delves deeply into the theory of token-level semantic embedding and the information-theoretically optimal vectorization method. Thereafter, we propose a general definition of autoregressive LLMs, from which the Transformer architecture and its performance measures, such as the ELBO, generalization error bound, memory capacity, and semantic information measures, can be derived theoretically. Other architectures, such as Mamba/Mamba2 and LLaDA, are also discussed within our framework. Consequently, this paper provides a theoretical framework for understanding LLMs from the perspective of semantic information theory, which also offers the necessary theoretical tools for further in-depth research.

2511.012021

Nov 2025 · Information Theory

Related categories:

Information Theory, Networking and Internet Architecture, Cryptography and Security, Statistical Learning, Machine Learning

788 papers

A discrete Benamou-Brenier formulation of Optimal Transport on graphs

We propose a discrete transport equation on graphs which connects distributions on both vertices and edges. We then derive a discrete analogue of the Benamou-Brenier formulation for Wasserstein-$1$ distance on a graph and as a result classify all $W_1$ geodesics on graphs.

2601.04193 · Jan 2026

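The optimal-transport abstract above works with $W_1$ on graphs. As a point of comparison, here is a minimal sketch of the static (flow) view of $W_1$ on a tree, not the paper's dynamic Benamou-Brenier formulation: on a tree the optimal flow across each edge equals the net mass imbalance of the subtree below it, so $W_1 = \sum_e w_e \lvert \mu(T_e) - \nu(T_e) \rvert$. The `parent`/`weight` tree encoding is an assumption of this example.

```python
def tree_w1(parent, weight, mu, nu):
    """W1 between distributions mu and nu on a tree rooted at node 0.

    parent[v] is the parent of v (parent[0] is None) and weight[v] is the
    length of edge (v, parent[v]); weight[0] is unused.
    """
    n = len(parent)
    children = [[] for _ in range(n)]
    for v in range(1, n):
        children[parent[v]].append(v)
    # Pre-order traversal; walking it backwards visits children before parents.
    order, stack = [], [0]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(children[v])
    imbalance = [mu[v] - nu[v] for v in range(n)]
    cost = 0.0
    for v in reversed(order):
        if v != 0:
            # All surplus/deficit of v's subtree must cross edge (v, parent[v]).
            cost += weight[v] * abs(imbalance[v])
            imbalance[parent[v]] += imbalance[v]
    return cost
```

On the unit-length path 0-1-2-3, moving a unit mass from node 0 to node 3 costs 3, as expected.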

Expectation Propagation for Distributed Inference in Grant-Free Cell-Free Massive MIMO

Grant-free cell-free massive multiple-input multiple-output (GF-CF-MaMIMO) systems are anticipated to be a key enabling technology for next-generation Internet-of-Things (IoT) networks, as they support massive connectivity without explicit scheduling. However, the large number of connected devices prevents the use of orthogonal pilot sequences, resulting in severe pilot contamination (PC) that degrades channel estimation and data detection performance. Furthermore, scalable GF-CF-MaMIMO networks inherently rely on distributed signal processing. In this work, we consider the uplink of a GF-CF-MaMIMO system and propose two novel distributed algorithms for joint activity detection, channel estimation, and data detection (JACD) based on expectation propagation (EP). The first algorithm, denoted as JACD-EP, uses Gaussian approximations for the channel variables, whereas the second, referred to as JACD-EP-BG, models them as Bernoulli-Gaussian (BG) random variables. To integrate the BG distribution into the EP framework, we derive its exponential family representation and develop the two algorithms as efficient message passing over a factor graph constructed from the a posteriori probability (APP) distribution. The proposed framework is inherently scalable with respect to both the number of access points (APs) and user equipments (UEs). Simulation results show the efficient mitigation of PC by the proposed distributed algorithms and their superior detection accuracy compared to (genie-aided) centralized linear detectors.

2601.04166 · Jan 2026

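A core EP step for a Bernoulli-Gaussian variable in the expectation-propagation abstract above is moment matching: combine the BG prior with an incoming Gaussian message and return the posterior mean and variance. This sketch assumes a real-valued scalar with prior $(1-\rho)\delta(x) + \rho\,\mathcal{N}(x;0,\sigma^2)$ and message $\mathcal{N}(x;m,v)$; the paper's complex-valued, exponential-family formulation is not reproduced.

```python
import math

def gauss_pdf(x, var):
    """Zero-mean Gaussian density with variance var, evaluated at x."""
    return math.exp(-x * x / (2 * var)) / math.sqrt(2 * math.pi * var)

def bg_posterior_moments(m, v, rho, sigma2):
    """Posterior mean/variance of x for a BG prior times a Gaussian message N(x; m, v)."""
    active = rho * gauss_pdf(m, sigma2 + v)       # evidence that x is drawn from N(0, sigma2)
    inactive = (1 - rho) * gauss_pdf(m, v)        # evidence that x = 0
    pi = active / (active + inactive)             # posterior activity probability
    mean_on = m * sigma2 / (sigma2 + v)           # Gaussian-product mean when active
    var_on = sigma2 * v / (sigma2 + v)            # Gaussian-product variance when active
    mean = pi * mean_on
    var = pi * (var_on + mean_on ** 2) - mean ** 2
    return mean, var
```

With $\rho = 1$ this collapses to the familiar Gaussian product; with $\rho = 0$ the posterior is a point mass at zero.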

Serving Every Symbol: All-Symbol PIR and Batch Codes

A $t$-all-symbol PIR code and a $t$-all-symbol batch code of dimension $k$ consist of $n$ servers storing linear combinations of $k$ linearly independent information symbols with the following recovery property: any symbol stored by a server can be recovered from $t$ pairwise disjoint subsets of servers. In the batch setting, we further require that any multiset of $t$ stored symbols can be recovered from $t$ disjoint subsets of servers. This framework unifies and extends several well-known code families, including one-step majority-logic decodable codes, (functional) PIR codes, and (functional) batch codes. In this paper, we determine the minimum code length for some small values of $k$ and $t$, characterize structural properties of codes attaining this optimum, and derive bounds that show the trade-offs between length, dimension, minimum distance, and $t$. In addition, we study MDS codes and the simplex code, demonstrating how these classical families fit within our framework, and establish new cases of an open conjecture from \cite{YAAKOBI2020} concerning the minimal $t$ for which the simplex code is a $t$-functional batch code.

2601.04041 · Jan 2026

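The all-symbol PIR property defined in the abstract above can be checked by exhaustive search for tiny binary codes: for each stored symbol, find the largest number of pairwise disjoint server subsets whose stored symbols span it. This is a sketch over $\mathbb{F}_2$ only, with each server's stored combination encoded as a bitmask over the $k$ information symbols (an encoding chosen for this example); the batch property is not checked.

```python
from functools import lru_cache
from itertools import combinations

def in_span(vectors, target):
    """Is `target` in the F_2-span of `vectors` (all bitmasks)?"""
    basis = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)        # clear b's leading bit from v if present
        if v:
            basis.append(v)
    for b in basis:
        target = min(target, target ^ b)
    return target == 0

def max_disjoint_recovery_sets(columns, target):
    """Largest number of pairwise disjoint server subsets that each recover `target`."""
    n = len(columns)
    recovering = [
        sum(1 << i for i in s)
        for r in range(1, n + 1)
        for s in combinations(range(n), r)
        if in_span([columns[i] for i in s], target)
    ]
    # Only minimal recovering sets matter for a disjoint packing.
    minimal = [m for m in recovering if not any(o != m and o & m == o for o in recovering)]

    @lru_cache(maxsize=None)
    def pack(used):
        return max([1 + pack(used | m) for m in minimal if m & used == 0], default=0)

    return pack(0)

def all_symbol_pir_t(columns):
    """Largest t such that every stored symbol has t disjoint recovery sets."""
    return min(max_disjoint_recovery_sets(columns, c) for c in columns)
```

For the binary simplex code of dimension 3 (servers store all seven nonzero combinations) each stored symbol has itself plus three disjoint pairs, giving $t = 4$; the [3, 2] parity code gives $t = 2$.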

Flexible-Duplex Cell-Free Architecture for Secure Uplink Communications in Low-Altitude Wireless Networks

Low-altitude wireless networks (LAWNs) are expected to play a central role in future 6G infrastructures, yet uplink transmissions of uncrewed aerial vehicles (UAVs) remain vulnerable to eavesdropping due to their limited transmit power, constrained antenna resources, and highly exposed air-ground propagation conditions. To address this fundamental bottleneck, we propose a flexible-duplex cell-free (CF) architecture in which each distributed access point (AP) can dynamically operate either as a receive AP for UAV uplink collection or as a transmit AP that generates cooperative artificial noise (AN) for secrecy enhancement. Such AP-level duplex flexibility introduces an additional spatial degree of freedom that enables distributed and adaptive protection against wiretapping in LAWNs. Building upon this architecture, we formulate a max-min secrecy-rate problem that jointly optimizes AP mode selection, receive combining, and AN covariance design. This tightly coupled and nonconvex optimization is tackled by first deriving the optimal receive combiners in closed form, followed by developing a penalty dual decomposition (PDD) algorithm with guaranteed convergence to a stationary solution. To further reduce computational burden, we propose a low-complexity sequential scheme that determines AP modes via a heuristic metric and then updates the AN covariance matrices through closed-form iterations embedded in the PDD framework. Simulation results show that the proposed flexible-duplex architecture yields substantial secrecy-rate gains over CF systems with fixed AP roles. The joint optimization method attains the highest secrecy performance, while the low-complexity approach achieves over 90% of the optimal performance with an order-of-magnitude lower computational complexity, offering a practical solution for secure uplink communications in LAWNs.

2601.04011 · Jan 2026

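The max-min objective in the flexible-duplex abstract above is built from per-UAV secrecy rates. As a hedged illustration, the standard Gaussian wiretap expression $[\log_2(1+\mathrm{SINR}_B) - \log_2(1+\mathrm{SINR}_E)]^+$ is assumed here; the paper's exact SINR expressions, which depend on the AP modes, combiners, and AN covariance, are not reproduced. Artificial noise helps by lowering $\mathrm{SINR}_E$.

```python
import math

def secrecy_rate(sinr_legit, sinr_eve):
    """Gaussian wiretap secrecy rate in bits/s/Hz, clipped at zero."""
    return max(0.0, math.log2(1 + sinr_legit) - math.log2(1 + sinr_eve))

def min_secrecy_rate(links):
    """Max-min objective value: worst secrecy rate over (SINR_B, SINR_E) pairs, one per UAV."""
    return min(secrecy_rate(b, e) for b, e in links)
```

An optimizer would maximize `min_secrecy_rate` over the design variables; here it only evaluates a given configuration.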

Unique Decoding of Hyperderivative Reed-Solomon Codes

Error-correcting codes are combinatorial objects designed to cope with the problem of reliably transmitting information over a noisy channel. A fundamental problem in coding theory and practice is to efficiently decode a received word containing errors into the transmitted codeword. In this paper, we consider the decoding problem for Hyperderivative Reed-Solomon (HRS) codes with respect to the NRT metric. Specifically, we propose a Welch-Berlekamp algorithm for the unique decoding of NRT HRS codes.

2601.03982 · Jan 2026

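The HRS abstract above decodes with respect to the NRT (Niederreiter-Rosenbloom-Tsfasman) metric rather than the Hamming metric. A minimal sketch of that weight: a vector is split into blocks of length $s$, each block contributes the largest index of a nonzero coordinate, and block weights add up. The indexing convention (counting from the left, 1-based) is an assumption, since some authors index blocks from the other end.

```python
def nrt_weight(blocks):
    """NRT weight: sum over blocks of the largest 1-based index with a nonzero entry."""
    total = 0
    for block in blocks:
        total += max((i + 1 for i, x in enumerate(block) if x != 0), default=0)
    return total

def nrt_distance(u, v, q):
    """NRT distance between two block vectors over Z_q (q prime, so this is F_q)."""
    diff = [[(a - b) % q for a, b in zip(bu, bv)] for bu, bv in zip(u, v)]
    return nrt_weight(diff)
```

As in any metric, a code with minimum NRT distance $d$ can uniquely correct errors of NRT weight up to $\lfloor (d-1)/2 \rfloor$.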

Low-Complexity Planar Beyond-Diagonal RIS Architecture Design Using Graph Theory

Reconfigurable intelligent surfaces (RISs) enable programmable control of the wireless propagation environment and are key enablers for future networks. Beyond-diagonal RIS (BD-RIS) architectures enhance conventional RIS by interconnecting elements through tunable impedance components, offering greater flexibility at the cost of higher circuit complexity. However, excessive interconnections between BD-RIS elements require multi-layer printed circuit board (PCB) designs, increasing fabrication difficulty. In this letter, we use graph theory to characterize the BD-RIS architectures that can be realized on double-layer PCBs, denoted as planar-connected RISs. Among the possible planar-connected RISs, we identify the ones with the most degrees of freedom, expected to achieve the best performance under practical constraints.

2601.03831 · Jan 2026

Hermitian LCD $2$-Quasi Abelian Codes over Finite Chain Rings

This paper introduces a class of Hermitian LCD $2$-quasi-abelian codes over finite fields and presents a comprehensive enumeration of these codes in which relative minimum weights are small. We show that such codes are asymptotically good over finite fields. Furthermore, we extend our analysis to finite chain rings by characterizing $2$-quasi-abelian codes in this setting and proving the existence of asymptotically good Hermitian LCD $2$-quasi-abelian codes over finite chain rings as well.

2601.03492 · Jan 2026

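For the Hermitian LCD abstract above, a useful sanity check is the standard criterion for linear codes over $\mathbb{F}_{q^2}$: a code with generator matrix $G$ is Hermitian LCD if and only if $G\,\overline{G}^{T}$ is nonsingular, where the bar is the conjugation $x \mapsto x^q$. This sketch applies it over $\mathbb{F}_4$ only; quasi-abelian structure and finite chain rings are not modeled, and the element encoding below is a choice made for this example.

```python
# GF(4) = {0, 1, w, w^2} encoded as 0, 1, 2, 3 with w^2 = w + 1; addition is XOR.
EXP = [1, 2, 3]                 # w^0, w^1, w^2
LOG = {1: 0, 2: 1, 3: 2}

def gf4_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % 3]

def gf4_inv(a):
    return EXP[(-LOG[a]) % 3]

def conj(a):
    """Frobenius conjugation x -> x^2 on GF(4); swaps w and w^2."""
    return gf4_mul(a, a)

def hermitian_gram(g):
    """G * conj(G)^T over GF(4)."""
    k = len(g)
    out = [[0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            acc = 0
            for a, b in zip(g[i], g[j]):
                acc ^= gf4_mul(a, conj(b))
            out[i][j] = acc
    return out

def is_nonsingular(m):
    """Gaussian elimination over GF(4) (subtraction is XOR in characteristic 2)."""
    m = [row[:] for row in m]
    k = len(m)
    for col in range(k):
        pivot = next((r for r in range(col, k) if m[r][col]), None)
        if pivot is None:
            return False
        m[col], m[pivot] = m[pivot], m[col]
        inv = gf4_inv(m[col][col])
        for r in range(col + 1, k):
            factor = gf4_mul(m[r][col], inv)
            if factor:
                m[r] = [x ^ gf4_mul(factor, y) for x, y in zip(m[r], m[col])]
    return True

def is_hermitian_lcd(g):
    return is_nonsingular(hermitian_gram(g))
```

For instance, the code spanned by $(1, w)$ is Hermitian self-orthogonal, hence not LCD, while the length-3 repetition code is Hermitian LCD.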

LCPs of Subspace Codes

A subspace code is a nonempty collection of subspaces of the vector space $\mathbb{F}_q^{n}$. A pair of linear codes is called a linear complementary pair (LCP) of codes if their intersection is trivial and the sum of their dimensions equals the dimension of the ambient space. Equivalently, two codes form an LCP if their direct sum is the entire space. In this paper, we introduce the concept of LCPs of subspace codes. We first provide a characterization of subspace codes that form an LCP. Furthermore, we present a sufficient condition for the existence of an LCP of subspace codes based on a complement function on a subspace code. In addition, we give several constructions of LCPs of subspace codes using various techniques and provide an application to insertion error correction.

2601.03489 · Jan 2026

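The classical LCP definition recalled in the abstract above (trivial intersection and dimensions summing to $n$, equivalently $C \oplus D = \mathbb{F}_q^n$) can be checked with two rank computations. This sketch covers binary linear codes only, with generator rows encoded as bitmasks (an encoding chosen for this example); it does not model the paper's LCPs of subspace codes.

```python
def rank_f2(rows):
    """Rank over F_2 of bitmask row vectors via an XOR basis."""
    basis = []
    for v in rows:
        for b in basis:
            v = min(v, v ^ b)
        if v:
            basis.append(v)
    return len(basis)

def is_lcp(gen_c, gen_d, n):
    """(C, D) is an LCP in F_2^n iff dim C + dim D = n and C + D = F_2^n."""
    dim_c, dim_d = rank_f2(gen_c), rank_f2(gen_d)
    return dim_c + dim_d == n and rank_f2(gen_c + gen_d) == n
```

Since $\dim(C+D) = \dim C + \dim D - \dim(C \cap D)$, the two conditions together force $C \cap D = \{0\}$.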

On the Capacity Region of Individual Key Rates in Vector Linear Secure Aggregation

We provide new insights into an open problem recently posed by Yuan-Sun [ISIT 2025], concerning the minimum individual key rate required in the vector linear secure aggregation problem. Consider a distributed system with $K$ users, where each user $k\in [K]$ holds a data stream $W_k$ and an individual key $Z_k$. A server aims to compute a linear function $\mathbf{F}[W_1;\ldots;W_K]$ without learning any information about another linear function $\mathbf{G}[W_1;\ldots;W_K]$, where $[W_1;\ldots;W_K]$ denotes the row stack of $W_1,\ldots,W_K$. The open problem is to determine the minimum required length of $Z_k$, denoted as $R_k$, $k\in [K]$. In this paper, we characterize a new achievable region for the rate tuple $(R_1,\ldots,R_K)$. The region is polyhedral, with vertices characterized by a binary rate assignment $(R_1,\ldots,R_K) = (\mathbf{1}(1 \in \mathcal{I}),\ldots,\mathbf{1}(K\in \mathcal{I}))$, where $\mathcal{I}\subseteq [K]$ satisfies the \textit{rank-increment condition}: $\mathrm{rank}\left(\bigl[\mathbf{F}_{\mathcal{I}};\mathbf{G}_{\mathcal{I}}\bigr]\right) =\mathrm{rank}\bigl(\mathbf{F}_{\mathcal{I}}\bigr)+N$. Here, $\mathbf{F}_\mathcal{I}$ and $\mathbf{G}_\mathcal{I}$ are the submatrices formed by the columns indexed by $\mathcal{I}$. Our results reveal that not every user needs to hold a key, which strictly enlarges the best-known achievable region in the literature. Furthermore, we provide a converse analysis demonstrating the optimality of this region when the number of users holding keys is minimized.

2601.03241 · Jan 2026

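The vertices in the secure-aggregation abstract above are indexed by subsets $\mathcal{I}$ satisfying the rank-increment condition. The following sketch enumerates such subsets for small $K$. Two assumptions apply: ranks are computed over the rationals (the abstract does not state the field), and $N$ is taken as a caller-supplied parameter because its definition is not given in the abstract.

```python
from fractions import Fraction
from itertools import combinations

def rank(mat):
    """Rank over the rationals by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in mat]
    r = 0
    cols = len(m[0]) if m else 0
    for c in range(cols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def columns(mat, idx):
    """Submatrix formed by the columns in idx (one column per user)."""
    return [[row[j] for j in idx] for row in mat]

def rank_increment_sets(F, G, N):
    """All nonempty I (0-based user indices) with rank([F_I; G_I]) == rank(F_I) + N."""
    K = len(F[0])
    found = []
    for size in range(1, K + 1):
        for I in combinations(range(K), size):
            FI, GI = columns(F, I), columns(G, I)
            if rank(FI + GI) == rank(FI) + N:
                found.append(set(I))
    return found
```

With $\mathbf{F} = [1\ 1]$ (the sum), $\mathbf{G} = [1\ {-1}]$, and $N = 1$, only $\mathcal{I} = \{1, 2\}$ (indices 0 and 1 here) qualifies.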

On the Euclidean duals of the cyclic codes generated by cyclotomic polynomials

In this article, we determine the minimum distance of the Euclidean dual of the cyclic code $\mathcal{C}_n$ generated by the $n$th cyclotomic polynomial $Q_n(x)$ over $\mathbb{F}_q$, for every positive integer $n$ coprime to $q$. In particular, we prove that the minimum distance of $\mathcal{C}_{n}^{\perp}$ is a function of $n$, namely $2^{\omega(n)}$, where $\omega(n)$ denotes the number of distinct prime divisors of $n$. This proves the conjecture we posed in \cite{BHAGAT2025}.

2601.03165 · Jan 2026

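The $2^{\omega(n)}$ formula in the abstract above can be spot-checked numerically for small lengths. This brute-force sketch (not the paper's proof technique) computes $Q_n$ over the integers, reduces it modulo a prime $q$, builds the generator matrix from the shifts $x^i Q_n(x)$, and searches for the lightest nonzero vector orthogonal to every row. It is exponential in $n$ and assumes $q$ is prime so that arithmetic mod $q$ is $\mathbb{F}_q$.

```python
import math
from functools import lru_cache
from itertools import product

def int_poly_div(num, den):
    """Exact division of integer polynomials (coefficients lowest degree first); den monic."""
    num = list(num)
    quot = [0] * (len(num) - len(den) + 1)
    for k in range(len(quot) - 1, -1, -1):
        c = num[k + len(den) - 1]            # current leading coefficient
        quot[k] = c
        for j, d in enumerate(den):
            num[k + j] -= c * d
    assert not any(num), "division was not exact"
    return quot

@lru_cache(maxsize=None)
def cyclotomic(n):
    """Integer coefficients of Q_n(x), using x^n - 1 = prod over d | n of Q_d(x)."""
    poly = [-1] + [0] * (n - 1) + [1]        # x^n - 1
    for d in range(1, n):
        if n % d == 0:
            poly = int_poly_div(poly, cyclotomic(d))
    return tuple(poly)

def omega(n):
    """Number of distinct prime divisors of n."""
    count, p = 0, 2
    while p * p <= n:
        if n % p == 0:
            count += 1
            while n % p == 0:
                n //= p
        p += 1
    return count + (n > 1)

def dual_min_distance(n, q):
    """Brute-force minimum distance of the Euclidean dual of <Q_n(x)> in F_q[x]/(x^n - 1)."""
    assert math.gcd(n, q) == 1
    g = [c % q for c in cyclotomic(n)]
    k = n - (len(g) - 1)                     # dimension of C_n
    rows = [[0] * i + g + [0] * (k - 1 - i) for i in range(k)]  # rows x^i g(x)
    best = n + 1
    for v in product(range(q), repeat=n):
        w = sum(1 for x in v if x)
        if 0 < w < best and all(sum(a * b for a, b in zip(r, v)) % q == 0 for r in rows):
            best = w
    return best
```

For example, $n = 15$ over $\mathbb{F}_2$ has two distinct prime divisors, and the search finds minimum distance 4.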

Dualities for finite abelian groups and applications to coding theory

The choice of an isomorphism, a duality, between a finite abelian group $A$ and its character group allows one to define dual codes of additive codes over $A$. Properties of dualities and dual codes are studied, continuing work of Delsarte from 1973 and more recent work of Dougherty and his collaborators.

2601.03126 · Jan 2026

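To make the duality idea in the abstract above concrete, this sketch fixes one particular duality on $A = \mathbb{Z}_m$, namely $x \mapsto \chi_x$ with $\chi_x(y) = e^{2\pi i\, xy/m}$. Under this choice the dual of an additive code $C \subseteq \mathbb{Z}_m^N$ is $\{y : \sum_i x_i y_i \equiv 0 \pmod m \text{ for all } x \in C\}$. Other dualities studied in the paper are not covered. The example checks $|C|\,|C^\perp| = m^N$ and $(C^\perp)^\perp = C$ by brute force.

```python
from itertools import product

def span_additive(gens, m, length):
    """Subgroup of Z_m^length generated by gens (closure under addition)."""
    code = {tuple([0] * length)}
    frontier = list(code)
    while frontier:
        new = []
        for c in frontier:
            for g in gens:
                s = tuple((a + b) % m for a, b in zip(c, g))
                if s not in code:
                    code.add(s)
                    new.append(s)
        frontier = new
    return code

def dual_code(code, m, length):
    """Dual under the standard duality of Z_m: <x, y> = sum x_i y_i = 0 mod m for all x in code."""
    return {y for y in product(range(m), repeat=length)
            if all(sum(a * b for a, b in zip(x, y)) % m == 0 for x in code)}
```

Over $\mathbb{Z}_4$, the code generated by $(2, 0)$ has 2 codewords and its dual has 8, and $2 \cdot 8 = 4^2$.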

Context-aware Privacy Bounds for Linear Queries

Linear queries, which underlie a broad range of analysis tasks, are often released through privacy mechanisms based on differential privacy (DP), the most popular framework for privacy protection. However, DP adopts a context-free definition that operates independently of the data-generating distribution. In this paper, we revisit the privacy analysis of the Laplace mechanism through the lens of pointwise maximal leakage (PML). We demonstrate that the distribution-agnostic definition of the DP framework often mandates excessive noise. To address this, we incorporate an assumption about the prior distribution by lower-bounding the probability of any single record belonging to any specific class. With this assumption, we derive a tight, context-aware leakage bound for general linear queries, and prove that our derived bound is strictly tighter than the standard DP guarantee and converges to the DP guarantee as this probability lower bound approaches zero. Numerical evaluations demonstrate that by exploiting this prior knowledge, the required noise scale can be reduced while maintaining privacy guarantees.

2601.02855 · Jan 2026

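The baseline in the privacy abstract above is the standard $\varepsilon$-DP Laplace mechanism: release $q(x) + \mathrm{Lap}(b)$ with $b = \Delta/\varepsilon$, where $\Delta$ is the query's sensitivity. This sketch implements only that baseline and checks numerically that the log-density ratio between neighboring query values never exceeds $\varepsilon$; the paper's PML-based, context-aware bound is not implemented.

```python
import math
import random

def laplace_mechanism(true_answer, sensitivity, epsilon, rng=random):
    """Return true_answer + Laplace(0, b) noise with b = sensitivity / epsilon."""
    b = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with mean b is Laplace(0, b).
    noise = rng.expovariate(1 / b) - rng.expovariate(1 / b)
    return true_answer + noise

def laplace_log_density(y, mu, b):
    return -abs(y - mu) / b - math.log(2 * b)

def worst_case_log_ratio(a, a_neighbor, b, grid):
    """Largest |log p(y | a) - log p(y | a_neighbor)| over the output grid (the DP privacy loss)."""
    return max(abs(laplace_log_density(y, a, b) - laplace_log_density(y, a_neighbor, b))
               for y in grid)
```

With $\Delta = 1$ and $\varepsilon = 0.5$ (so $b = 2$), the worst-case log ratio for query values 0 and 1 is exactly 0.5.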

State-Dependent Fading Gaussian Channel with Common Reconstruction Constraints

The task of jointly communicating a message and reconstructing a common estimate of the channel state is examined for a fading Gaussian model with additive state interference. The state is an independent and identically distributed Gaussian sequence known noncausally at the transmitter, and the instantaneous fading coefficient is perfectly known at both the transmitter and the receiver. The receiver is required to decode the transmitted message and, in addition, reconstruct the state under a common reconstruction constraint ensuring that its estimate coincides with that at the transmitter. The main result of our work is a complete characterization of the optimal rate-distortion tradeoff region for this setting. The analytical results are also validated through numerical examples illustrating the rate-distortion and power-distortion tradeoffs.

2601.02802 · Jan 2026

Weights on finite fields and failures of the MacWilliams identities

In the 1960s, MacWilliams proved that the Hamming weight enumerator of a linear code over a finite field completely determines, and is determined by, the Hamming weight enumerator of its dual code. In particular, if two linear codes have the same Hamming weight enumerator, then their dual codes have the same Hamming weight enumerator. In contrast, there is a wide class of weights on finite fields whose weight enumerators have the opposite behavior: there exist two linear codes having the same weight enumerator, but their dual codes have different weight enumerators.

2601.02608 · Jan 2026

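The MacWilliams identity recalled in the abstract above can be verified numerically for Hamming weights: the dual weight distribution is $B_j = \frac{1}{|C|}\sum_i A_i K_j(i)$, where $K_j(i) = \sum_s (-1)^s \binom{i}{s}\binom{n-i}{j-s}$ is a binary Krawtchouk polynomial. The following brute-force sketch checks this on the binary [7, 4] Hamming code; it illustrates the Hamming-weight case that does hold, not the other weights the paper studies.

```python
from itertools import product
from math import comb

def codewords(gen, n):
    """All codewords of the binary code spanned by the 0/1 rows of gen."""
    return {tuple(sum(c * row[j] for c, row in zip(coeffs, gen)) % 2 for j in range(n))
            for coeffs in product((0, 1), repeat=len(gen))}

def weight_distribution(words, n):
    dist = [0] * (n + 1)
    for w in words:
        dist[sum(w)] += 1
    return dist

def dual_words(words, n):
    return {v for v in product((0, 1), repeat=n)
            if all(sum(a * b for a, b in zip(v, w)) % 2 == 0 for w in words)}

def krawtchouk(j, i, n):
    return sum((-1) ** s * comb(i, s) * comb(n - i, j - s) for s in range(j + 1))

def macwilliams_transform(a, n, size):
    """Dual weight distribution predicted from A_i: B_j = (1/|C|) sum_i A_i K_j(i)."""
    return [sum(a[i] * krawtchouk(j, i, n) for i in range(n + 1)) // size for j in range(n + 1)]
```

For the Hamming code the predicted dual distribution is that of the [7, 3] simplex code: one zero word and seven words of weight 4.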

Improved decoding algorithms for surface codes under independent bit-flip and phase-flip errors

We study exact decoding for the toric code and for planar and rotated surface codes under the standard independent $X/Z$ noise model, focusing on Separate Minimum Weight (SMW) decoding and Separate Most Likely Coset (SMLC) decoding. For the SMW decoding problem, we show that an $O(n^{3/2}\log n)$-time decoder is achievable for surface and toric codes, improving over the $O(n^{3}\log n)$ worst-case time of the standard approach based on complete decoding graphs. Our approach is based on a local reduction of SMW decoding to the minimum weight perfect matching problem using Fisher gadgets, which preserves planarity for planar and rotated surface codes and genus $1$ for the toric code. This reduction enables the use of Lipton–Tarjan planar separator methods and implies that SMW decoding lies in $\mathrm{NC}$. For SMLC decoding, we show that the planar surface code admits an exact decoder with $O(n^{3/2})$ algebraic complexity and that the problem lies in $\mathrm{NC}$, improving over the $O(n^{2})$ algebraic complexity of Bravyi et al. Our approach proceeds via a dual-cycle formulation of coset probabilities and an explicit reduction to planar Pfaffian evaluation using Fisher–Kasteleyn–Temperley constructions. The same complexity measures apply to SMLC decoding of the rotated surface code. For the toric code, we obtain an exact polynomial-time SMLC decoder with $O(n^{3})$ algebraic complexity. In addition, while the SMLC formulation is motivated by connections to statistical mechanics, we provide a purely algebraic derivation of the underlying duality based on MacWilliams duality and Fourier analysis. Finally, we discuss extensions of the framework to the depolarizing noise model and identify resulting open problems.

2601.00972 · Jan 2026

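The surface-code abstract above reduces SMW decoding to minimum weight perfect matching of syndrome defects. As a toy one-dimensional analogue (a repetition code on a ring under bit flips, not the paper's Fisher-gadget reduction or planar separators), this sketch pairs defects exactly by memoized exhaustive search, with the ring distance as the cost of the correction chain joining two defects.

```python
from functools import lru_cache

def ring_distance(a, b, length):
    """Length of the shorter correction chain between positions a and b on a ring."""
    d = abs(a - b) % length
    return min(d, length - d)

def min_weight_matching(defects, length):
    """Minimum total weight of a perfect matching of defects (exact, exponential time)."""
    defects = tuple(sorted(defects))
    assert len(defects) % 2 == 0, "defects come in pairs on a closed ring"

    @lru_cache(maxsize=None)
    def best(remaining):
        if not remaining:
            return 0
        first, rest = remaining[0], remaining[1:]
        # Match the first defect with each candidate partner and recurse on the rest.
        return min(ring_distance(first, rest[k], length) + best(rest[:k] + rest[k + 1:])
                   for k in range(len(rest)))

    return best(defects)
```

Defects at positions 0 and 9 on a ring of length 10 are joined through the wrap-around edge at cost 1.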

CoCo-Fed: A Unified Framework for Memory- and Communication-Efficient Federated Learning at the Wireless Edge

The deployment of large-scale neural networks within the Open Radio Access Network (O-RAN) architecture is pivotal for enabling native edge intelligence. However, this paradigm faces two critical bottlenecks: the prohibitive memory footprint required for local training on resource-constrained gNBs, and the saturation of bandwidth-limited backhaul links during the global aggregation of high-dimensional model updates. To address these challenges, we propose CoCo-Fed, a novel Compression and Combination-based Federated learning framework that unifies local memory efficiency and global communication reduction. Locally, CoCo-Fed breaks the memory wall by performing a double-dimension down-projection of gradients, adapting the optimizer to operate on low-rank structures without introducing additional inference parameters/latency. Globally, we introduce a transmission protocol based on orthogonal subspace superposition, where layer-wise updates are projected and superimposed into a single consolidated matrix per gNB, drastically reducing the backhaul traffic. Beyond empirical designs, we establish a rigorous theoretical foundation, proving the convergence of CoCo-Fed even under unsupervised learning conditions suitable for wireless sensing tasks. Extensive simulations on an angle-of-arrival estimation task demonstrate that CoCo-Fed significantly outperforms state-of-the-art baselines in both memory and communication efficiency while maintaining robust convergence under non-IID settings.

2601.00549 · Jan 2026

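The orthogonal subspace superposition idea in the CoCo-Fed abstract above can be illustrated on vectors: each layer's (already compressed) coefficients are mapped onto its own set of mutually orthogonal basis vectors, everything is summed into one signal, and the receiver separates the layers by projection. The use of normalized Sylvester-Hadamard columns as the orthonormal bases is an assumption of this sketch; the paper's projections and gradient down-projection are not reproduced.

```python
import math

def hadamard(n):
    """Sylvester Hadamard matrix of order n (a power of two); rows and columns are orthogonal."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h

def superimpose(layer_coeffs, n):
    """Place each layer on its own block of orthonormal columns and sum into one signal."""
    h, scale = hadamard(n), 1 / math.sqrt(n)
    signal, blocks, offset = [0.0] * n, [], 0
    for coeffs in layer_coeffs:
        cols = list(range(offset, offset + len(coeffs)))
        offset += len(coeffs)
        assert offset <= n, "not enough orthogonal dimensions"
        for c, j in zip(coeffs, cols):
            for i in range(n):
                signal[i] += c * h[i][j] * scale
        blocks.append(cols)
    return signal, blocks

def separate(signal, blocks, n):
    """Recover each layer by projecting onto its own columns; orthogonality cancels the others."""
    h, scale = hadamard(n), 1 / math.sqrt(n)
    return [[sum(signal[i] * h[i][j] * scale for i in range(n)) for j in cols] for cols in blocks]
```

Because the columns are orthonormal, one length-$n$ transmission carries up to $n$ coefficients from several layers without mutual interference.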

Random Multiplexing

As wireless communication applications evolve from traditional multipath environments to high-mobility scenarios like unmanned aerial vehicles, multiplexing techniques have advanced accordingly. Traditional single-carrier frequency-domain equalization (SC-FDE) and orthogonal frequency-division multiplexing (OFDM) have given way to emerging orthogonal time-frequency space (OTFS) and affine frequency-division multiplexing (AFDM). These approaches exploit specific channel structures to diagonalize or sparsify the effective channel, thereby enabling low-complexity detection. However, their reliance on these structures significantly limits their robustness in dynamic, real-world environments. To address these challenges, this paper studies a random multiplexing technique that is decoupled from the physical channels, enabling its application to arbitrary norm-bounded and spectrally convergent channel matrices. Random multiplexing achieves statistical fading-channel ergodicity for transmitted signals by constructing an equivalent input-isotropic channel matrix in the random transform domain. It guarantees the asymptotic replica MAP bit-error rate (BER) optimality of AMP-type detectors for linear systems with arbitrary norm-bounded, spectrally convergent channel matrices and signaling configurations, under the unique fixed point assumption. A low-complexity cross-domain memory AMP (CD-MAMP) detector is considered, leveraging the sparsity of the time-domain channel and the randomness of the equivalent channel. Optimal power allocations are derived to minimize the replica MAP BER and maximize the replica constrained capacity of random multiplexing systems. The optimal coding principle and the replica constrained-capacity optimality of the CD-MAMP detector are investigated for random multiplexing systems. Additionally, the versatility of random multiplexing in diverse wireless applications is explored.

2512.24087 · Dec 2025

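The random multiplexing abstract above transmits symbols in a random transform domain. The abstract does not specify the transform, so this sketch assumes a hypothetical construction $x = F D P s$: a random permutation $P$, random $\pm 1$ signs $D$, and the unitary DFT $F$. It shows that modulation is unitary (energy-preserving) and exactly invertible; no detector (AMP or CD-MAMP) is implemented.

```python
import cmath
import random

def random_transform(n, seed):
    """Hypothetical random unitary x = F D P s; returns (modulate, demodulate) functions."""
    rng = random.Random(seed)
    perm = list(range(n))
    rng.shuffle(perm)
    signs = [rng.choice((-1, 1)) for _ in range(n)]
    scale = 1 / n ** 0.5

    def dft(v, sign):
        # Unitary DFT (sign = -1) or its inverse (sign = +1).
        return [scale * sum(v[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
                for k in range(n)]

    def modulate(s):
        u = [signs[i] * s[perm[i]] for i in range(n)]   # permute, then flip signs
        return dft(u, -1)

    def demodulate(x):
        u = dft(x, +1)
        s = [0j] * n
        for i in range(n):
            s[perm[i]] = u[i] * signs[i]                # undo signs and permutation
        return s

    return modulate, demodulate
```

A unitary transform keeps white Gaussian noise white, so the randomization changes the effective channel seen by the detector without changing the noise statistics.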

Learning to Reconfigure: Using Device Status to Select the Right Constrained Coding Scheme

In the era of the data revolution, a modern storage or transmission system typically requires different levels of protection. For example, the coding technique used to fortify data in a modern storage system when the device is fresh cannot be the same as that used when the device ages. Therefore, providing reconfigurable coding schemes and devising an effective way to perform this reconfiguration are key to extending the device lifetime. We focus on constrained coding schemes for the emerging two-dimensional magnetic recording (TDMR) technology. Recently, we have designed efficient lexicographically-ordered constrained (LOCO) coding schemes for various stages of the TDMR device lifetime, focusing on the elimination of isolation patterns, and demonstrated remarkable gains by using them. LOCO codes are naturally reconfigurable, and we exploit this feature in our work. Reconfiguration based on predetermined time stamps, which is the approach the industry adopts, neglects the actual device status. Instead, we propose offline and online learning methods to perform this task based on the device status. In offline learning, training data is assumed to be available throughout the time span of interest, while in online learning, we only use training data at specific time intervals to make consequential decisions. We fit the training data to polynomial equations that give the bit error rate in terms of TD density, and then formulate an optimization problem whose solution gives the optimal decisions for switching from one coding scheme to another. The objective is to maximize the storage capacity and/or minimize the decoding complexity. The problem reduces to a linear programming problem. Based on the problem characteristics, we show that our solution is globally optimal, and we offer various experimental results that demonstrate the effectiveness of our approach in TDMR systems.

2512.21396 · Dec 2025

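The rate of a constrained code such as the LOCO codes in the abstract above is governed by how many length-$m$ sequences satisfy the constraint. This sketch counts them with a small dynamic program over suffix states. The default forbidden set {"010", "101"} (isolated bits in one dimension) is an assumption standing in for the two-dimensional isolation patterns of TDMR, and the lexicographic encoding itself is not implemented.

```python
def count_constrained(m, forbidden=("010", "101")):
    """Number of binary strings of length m that contain none of the forbidden patterns."""
    span = max(len(p) for p in forbidden) - 1   # suffix length needed to detect a pattern
    counts = {"": 1}
    for _ in range(m):
        nxt = {}
        for state, c in counts.items():
            for bit in "01":
                s = state + bit
                if any(s.endswith(p) for p in forbidden):
                    continue                     # appending this bit completes a forbidden pattern
                key = s[-span:] if span else ""
                nxt[key] = nxt.get(key, 0) + c
        counts = nxt
    return sum(counts.values())
```

A code mapping $\lfloor \log_2 N(m) \rfloor$ message bits to $m$ coded bits, where $N(m)$ is this count, has rate $\lfloor \log_2 N(m) \rfloor / m$. Switching between constraints trades this rate against protection.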

Breaking Rank -- A Novel Unscented Kalman Filter for Parameter Estimations of a Lumped-Parameter Cardiovascular Model

We make modifications to the unscented Kalman filter (UKF) that give almost complete practical identifiability to a lumped-parameter cardiovascular model with 10 parameters and 4 output observables, a highly nonlinear, stiff problem of clinical significance. The modifications overcome the challenging problem of rank deficiency when applying the UKF to parameter estimation. Rank deficiency usually means that only a small subset of parameters can be estimated. Traditionally, pragmatic compromises are made, such as selecting an optimal subset of parameters for estimation and fixing non-influential parameters. Kalman filters are typically used for dynamical state tracking, to facilitate the control u at every time step. However, for the purpose of parameter estimation, this constraint no longer applies. Our modification transforms the utility of the UKF for parameter estimation, including of minimally influential parameters, with excellent robustness (i.e., under severe noise corruption, challenging pathophysiology, and no prior knowledge of parameter distributions). The modified UKF algorithm robustly recovers almost all parameters to over 98% accuracy more than 90% of the time, on a challenging target data set of 50 ten-parameter samples. We compare this with the original implementation of the UKF algorithm for parameter estimation and demonstrate a significant improvement.

2601.02390 · Dec 2025

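The UKF in the abstract above rests on the unscented transform: deterministic sigma points are pushed through a nonlinearity and reweighted to estimate the output mean and variance. This is a minimal sketch of the standard scaled transform for a scalar state. It is not the authors' rank-deficiency modification, and the parameters $\alpha = 1$, $\beta = 2$, $\kappa = 2$ are choices made for this example.

```python
import math

def unscented_transform_1d(mean, var, f, alpha=1.0, beta=2.0, kappa=2.0):
    """Propagate N(mean, var) through f with the scaled unscented transform (n = 1)."""
    n = 1
    lam = alpha ** 2 * (n + kappa) - n
    spread = math.sqrt((n + lam) * var)
    sigma = [mean, mean + spread, mean - spread]          # sigma points
    wm = [lam / (n + lam)] + [1 / (2 * (n + lam))] * 2     # mean weights
    wc = [wm[0] + (1 - alpha ** 2 + beta)] + wm[1:]        # covariance weights
    ys = [f(x) for x in sigma]
    y_mean = sum(w * y for w, y in zip(wm, ys))
    y_var = sum(w * (y - y_mean) ** 2 for w, y in zip(wc, ys))
    return y_mean, y_var
```

For a linear map the result is exact, and for $f(x) = x^2$ the mean matches the true value $\mathbb{E}[x^2] = m^2 + P$.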

Learned Digital Codes for Over-the-Air Computation in Federated Edge Learning

Federated edge learning (FEEL) enables wireless devices to collaboratively train a centralised model without sharing raw data, but repeated uplink transmission of model updates makes communication the dominant bottleneck. Over-the-air (OTA) aggregation alleviates this by exploiting the superposition property of the wireless channel, enabling simultaneous transmission and merging communication with computation. Digital OTA schemes extend this principle by incorporating the robustness of conventional digital communication, but current designs remain limited in low signal-to-noise ratio (SNR) regimes. This work proposes a learned digital OTA framework that improves recovery accuracy, convergence behaviour, and robustness to challenging SNR conditions while maintaining the same uplink overhead as state-of-the-art methods. The design integrates an unsourced random access (URA) codebook with vector quantisation and AMP-DA-Net, an unrolled approximate message passing (AMP)-style decoder trained end-to-end with the digital codebook and parameter server local training statistics. The proposed design extends OTA aggregation beyond averaging to a broad class of symmetric functions, including trimmed means and majority-based rules. Experiments on highly heterogeneous device datasets and varying numbers of active devices show that the proposed design extends reliable digital OTA operation by more than 10 dB into low SNR regimes while matching or improving performance across the full SNR range. The learned decoder remains effective under message corruption and nonlinear aggregation, highlighting the broader potential of end-to-end learned design for digital OTA communication in FEEL.

2512.19777 · Dec 2025

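The digital OTA abstract above extends aggregation beyond averaging to symmetric functions such as trimmed means and majority rules. A symmetric function depends only on the multiset of device values, so this sketch assumes the server has already recovered a histogram {quantized value: number of devices} and applies the rule to it. The URA codebook and AMP-DA-Net decoder are not modeled.

```python
def trimmed_mean(values, trim):
    """Drop the `trim` smallest and `trim` largest values, then average the rest."""
    if 2 * trim >= len(values):
        raise ValueError("trim too large for the number of values")
    kept = sorted(values)[trim:len(values) - trim]
    return sum(kept) / len(kept)

def majority_sign(values):
    """Sign-majority rule: +1 if positives outnumber negatives, -1 if fewer, 0 on a tie."""
    pos = sum(1 for v in values if v > 0)
    neg = sum(1 for v in values if v < 0)
    return (pos > neg) - (pos < neg)

def aggregate_from_histogram(histogram, rule, **kwargs):
    """Apply a symmetric rule to the multiset given as {value: count}; device order is irrelevant."""
    values = [v for v, c in histogram.items() for _ in range(c)]
    return rule(values, **kwargs)
```

For example, a histogram of three devices at 0.5 and one outlier at -1.0 gives a trimmed mean of 0.5 with `trim=1`, so the outlier is discarded.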
