Probability theory, stochastic processes, and mathematical statistics
Elephant random walks were studied recently in \cite{mukherjee2025elephant} on the groups $\mathbb{Z}^{*d_1} * \mathbb{Z}_2^{*d_2}$ whose Cayley graphs are infinite $d$-regular trees with $d = 2d_1 + d_2$. It was found that for $d \ge 3$, the elephant walk is ballistic with the same asymptotic speed $\frac{d - 2}{d}$ as the simple random walk and the memory parameter appears only in the rate of convergence to the limiting speed. In the $d = 2$ case, there are two such groups, both having the bi-infinite path as their Cayley graph. For $(d_1, d_2) = (1, 0)$, the walk is the usual elephant random walk on $\mathbb{Z}$, which exhibits anomalous diffusion. In this article, we study the other case, namely $(d_1, d_2) = (0, 2)$, which corresponds to the infinite dihedral group $D_\infty \cong \mathbb{Z}_2 * \mathbb{Z}_2$. Unlike the classical ERW on $\mathbb{Z}$, which is a time-inhomogeneous Markov chain, the ERW on $D_{\infty}$ is non-Markovian. We show that the first and second order behaviours of the \emph{signed location} of the walker agree with those of the simple symmetric random walk on $\mathbb{Z}$, with the memory parameter essentially manifesting itself via a lower order correction term that can be written as an explicit functional of the elephant walk on $\mathbb{Z}$. Our result demonstrates that unlike the simple random walk, the elephant walk is sensitive to local algebraic relations. Indeed, although $D_{\infty}$ is virtually abelian, containing $\mathbb{Z}$ as a finite-index subgroup, the involutive nature of its generators effectively neutralises memory, thereby ruling out any potential superdiffusive behaviour, in contrast to the superdiffusion observed on its abelian cousin $\mathbb{Z}$.
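For orientation, the classical elephant random walk on $\mathbb{Z}$ mentioned above can be simulated in a few lines. The sketch below uses the standard definition (first step uniform on $\pm 1$; each later step repeats a uniformly chosen past step with probability $p$ and reverses it otherwise). It is not the walk on $D_\infty$ studied in the paper, and the function name `elephant_walk_z` and its parameters are illustrative assumptions.

```python
import random


def elephant_walk_z(n_steps, p, seed=None):
    """Simulate the classical elephant random walk on Z (illustrative sketch).

    n_steps : number of steps (>= 1)
    p       : memory parameter, probability of repeating the recalled step
    seed    : optional seed for reproducibility (hypothetical convenience argument)
    Returns the list of positions S_1, ..., S_n.
    """
    rng = random.Random(seed)
    # First step is +1 or -1 with equal probability.
    steps = [rng.choice((-1, 1))]
    for _ in range(n_steps - 1):
        # Recall one past step uniformly at random from the whole history.
        recalled = rng.choice(steps)
        # Repeat it with probability p, otherwise take the opposite step.
        steps.append(recalled if rng.random() < p else -recalled)
    positions = []
    current = 0
    for step in steps:
        current += step
        positions.append(current)
    return positions


if __name__ == "__main__":
    path = elephant_walk_z(10_000, p=0.9, seed=1)
    print("final position:", path[-1])
```

With $p = 1$ the walker repeats its first step forever, which gives a quick sanity check on the memory mechanism.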
2604.04882 We give counterexamples to a problem of M. Kac in the Scottish Book, which asks whether a certain nonlinear operation on two characteristic functions characterizes Laplace distributions, in analogy with the Cramér--Lévy theorem for Gaussian distributions. We then give an affirmative answer to a refined version of the problem. Finally, we develop a general framework for such characterization problems, construct generalized counterexamples, and pose some open questions.
2604.04848 In this addendum we extend Theorem 4.6 on the negative binomial distribution in `Bounds for survival probabilities in supercritical Galton-Watson processes and applications to population genetics' (Journal of Mathematical Biology 92:40, 2026; arXiv:2503.21403). We prove that the fractional linear lower bound to the negative binomial generating function derived there is indeed valid for every $x\in[0,1]$, and not only for $x\in[0,P^\infty_{\rm NB}]$, where $P^\infty_{\rm NB}$ is the extinction probability of the associated Galton-Watson process.
This paper derives a lower bound on the spacing between adjacent zeros of the confluent hypergeometric function $\Phi(a,b,z)$ when $a$ is variable and $b, z \in \mathbb{R}^+$ are known and fixed. Monotonicity of the bound is established, and the results are used to assess the accuracy of asymptotic approximations for the first passage probability of a Wiener process.
2604.04823 We study the spectral gaps of parallel and simulated tempering chains targeting multimodal Gibbs measures. In particular, we consider chains constructed from Metropolis random walks that preserve the Gibbs distributions at a sequence of harmonically spaced temperatures. We prove that their spectral gaps admit polynomial lower bounds of order $11$ and $12$ in terms of the low target temperature. The analysis applies to a broad class of potentials, beyond mixture models, without requiring explicit structural information on the energy landscape. The main idea is to decompose the state space and construct a Lyapunov function based on a suitably perturbed potential, which allows us to establish lower bounds on the local spectral gaps.
2604.04785 We study bootstrap inference for the $k$th largest coordinate of a normalized sum of independent high-dimensional random vectors. Existing second-order theory for maxima does not directly extend to order statistics, because the event $\{T_{n,[k]}\le t\}$ is not a rectangle and its local structure is governed by exceedance counts rather than by a single boundary. We develop an approach based on factorial moments and weighted inclusion--exclusion that reduces the problem to a collection of rare-orthant probabilities and allows high-dimensional Edgeworth and Cornish--Fisher expansions to be transferred to the order-statistic setting. Under moment, variance, and weak-dependence conditions, we derive a second-order coverage expansion for wild-bootstrap critical values of the $k$th order statistic. In particular, a third-moment matching wild bootstrap achieves coverage error of order $n^{-1}$ up to logarithmic factors, and the same second-order accuracy is obtained for a prepivoted double wild bootstrap. We also show that the maximal-correlation condition can be replaced by a stationary Gaussian exponential-mixing assumption at the price of an explicit dependence remainder $r_d$, and this remainder can itself be of order $n^{-1}$ when the dimension is sufficiently large relative to the sample size. These results extend recent second-order Gaussian and bootstrap approximation theory from maxima to the $k$th order statistic in high dimension.
2604.04747 We study driven-dissipative activated random walk with sleep probability $p$ on an $n$-vertex complete graph with a sink that traps jumping particles with probability $q_n$. We show that the number of sleeping particles $S_n$ left by the stationary distribution has a Gumbel scaling limit for $\exp(-n^{1/3}) \ll q_n \ll n^{-1/2}$. This implies that the stationary configuration law is not a product measure. We also prove that $S_n/n$ converges to $p$ if and only if $q_n = e^{-o(n)}$, and that, when $q_n=0$, the number of jumps to stabilization undergoes a phase transition at density $p$.
Bayesian neural networks (BNNs) offer a natural probabilistic formulation for inference in deep learning models. Despite their popularity, their optimality has received limited attention through the lens of statistical decision theory. In this paper, we study decision rules induced by deep, fully connected feedforward ReLU BNNs in the normal location model under quadratic loss. We show that, for fixed prior scales, the induced Bayes decision rule is not minimax. We then propose a hyperprior on the effective output variance of the BNN prior that yields a superharmonic square-root marginal density, establishing that the resulting decision rule is simultaneously admissible and minimax. We further extend these results from the quadratic loss setting to the predictive density estimation problem with Kullback--Leibler loss. Finally, we validate our theoretical findings numerically through simulation.
We study the topological structure of random geometric forests $G$ in the Euclidean plane under mild assumptions: non-crossing edges, stationarity, and finite edge intensity. The framework covers a broad range of constructions, including models based on stationary point processes as well as lattices, and encompasses many already well-studied examples among drainage networks, geodesic forests arising from first- and last-passage percolation, and minimal or uniform spanning trees. First, denoting by $N_k$ the number of $k$-ended connected components in $G$ for each $k\geq0$, we show that almost surely, all trees of $G$ have at most two topological ends, $N_0\in\{0,\infty\}$, $N_1\leq2$, and $N_1=2\implies N_2<\infty$. We then construct explicit examples realizing all possibilities compatible with these constraints, yielding a complete classification of the admissible topological structures for $G$. As a second result, we prove that under the additional assumptions that $G$ is non-empty, oriented, out-degree one, with all its directed paths going to infinity along a fixed deterministic direction, the situation reduces to a dichotomy: $G$ consists almost surely of either a unique one-ended tree, or infinitely many two-ended trees. Our proofs combine classical Burton-Keane type arguments with substantial new conceptual ideas using planar topology, resulting in a robust, unified approach.
2604.04661 We consider polynomial Bergman kernels with respect to exponentially varying weights $e^{-n \mathscr Q(z)}$ depending on a potential $\mathscr Q:\mathbb C^d\to\mathbb R$. We use these kernels to construct determinantal point processes on $\mathbb C^d$. Under mild conditions on the potential, the points are known to accumulate on a compact set $S_{\mathscr Q}$ called the droplet. We show that the local behavior of the kernel in the vicinity of the edge $\partial S_{\mathscr Q}$ is described in two different ways by universal limiting kernels. One of these limiting kernels is the error-function kernel, which is ubiquitous in random matrix theory, while the other limiting kernel is a new universal object: a multivariate version of the error-function kernel. We prove the universality in two qualitatively different settings: (i) the tensorized case where $\mathscr Q$ decomposes as a sum of planar potentials, and (ii) the case where $\mathscr Q$ is rotationally symmetric. We also explicitly identify the subspace of the Bargmann-Fock space where the multivariate error-function kernel is reproducing. To treat regular edge points that exhibit a certain type of bulk degeneracy, we also find the behavior of the planar kernel with number of terms of order $o(n)$ instead of $n$. Lastly, we prove an edge scaling limit for counting statistics.
In this paper, we study estimation of parameters in a two-parameter Potts model with $q$ colors and coupling matrix $A_N$. We characterize concrete sufficient conditions for existence of the pseudo-likelihood estimator of the Potts model, in terms of the local magnetic fields, and give sufficient conditions for the validity of the above characterization. We then provide sufficient criteria for estimation of both parameters at the optimal rate $\sqrt{N}$. In particular, if $A_N$ is the scaled adjacency matrix of a graph $G_N$, then we show that joint estimation is possible if either $G_N$ has bounded degree or is irregular. In contrast, we give an example of a graph sequence $G_N$ which is approximately regular and dense, where no consistent estimator exists. We also show that one-parameter estimation at the optimal rate $\sqrt{N}$ holds under much milder conditions when the other parameter is known. Along the way, we develop a concentration result for mean-field Potts models using the framework of nonlinear large deviations. Compared to the Ising case, our results for the Potts case require a novel analysis across multiple colors.
This paper studies the problem of recovering a hidden vertex correspondence between two correlated graphs when both edge weights and node features are observed. While most existing work on graph alignment relies primarily on edge information, many real-world applications provide informative node features in addition to graph topology. To capture this setting, we introduce the featured correlated Gaussian Wigner model, where two graphs are coupled through an unknown vertex permutation, and the node features are correlated under the same permutation. We characterize the optimal information-theoretic thresholds for exact recovery and partial recovery of the latent mapping. On the algorithmic side, we propose QPAlign, an algorithm based on a quadratic programming relaxation, and demonstrate its strong empirical performance on both synthetic and real datasets. Moreover, we also derive theoretical guarantees for the proposed procedure, supporting its reliability and providing convergence guarantees.
We develop a rigorous theoretical framework for principal manifold estimation that recovers a latent low-dimensional manifold from a point cloud observed in a high-dimensional ambient space. Our framework accommodates manifolds with general, potentially non-Euclidean topology, which can be inferred using tools from topological data analysis. Using the theory of Sobolev spaces on Riemannian manifolds, we establish that the proposed principal manifolds are well defined, prove convergence of the iterative algorithm used to compute them, and show consistency of the finite-sample estimator. Furthermore, we introduce a novel method for selecting the complexity level of a fitted manifold, which addresses the shortcomings of the classical fitting-error criterion. We also provide a detailed geometric interpretation of the penalty term in our framework. In addition to the theoretical developments, we present extensive numerical experiments supporting our results. This article provides theoretical foundations for approaches that have been used in applications such as robotics. More importantly, it extends these approaches to general topological settings with potential applications across a broad range of disciplines, including neuroimaging and shape data analysis.
Consider $n$ real or complex random variables $X_1,\ldots,X_n$, independent or dependent, with respective tail bounds, and let $g$ be a measurable function of them. Let $f$ be the "sharpest" tail bound of $g$ (sharpest in the sense that if $f$ were any smaller, then for some $X_1,\ldots,X_n$ satisfying the conditions, $g(X_1,\ldots,X_n)$ would not satisfy $f$). Significant research has been done to approximate $f$, often with high accuracy. These results typically state that, for $g$ in a given family and tail bounds of the $X_k$ in a given family, $f$ is bounded by some $f'$ with high accuracy. However, the question "what would it take to find $f$ exactly?" has received little attention, apparently even in simple cases. This is the question we try to answer. When $X_1,\ldots,X_n$ are required to be mutually independent, the $X_k$ are first taken, without loss of generality, to be monotone on $(0,1)$. This strengthens convergence in distribution to convergence a.e. (Skorokhod's representation theorem) and allows defining shift operators, which help reduce the space of random variables one searches to find $f$ and/or the maximum measure of a subset. We find $f$ in some special cases; however, $f$ rarely has a closed form. When $X_1,\ldots,X_n$ are not necessarily independent, we carry out another reduction of the space of random variables one searches to find $f$.
We study robust regression under a contamination model in which covariates are clean while the responses may be corrupted in an adaptive manner. Unlike the classical Huber's contamination model, where both covariates and responses may be contaminated and consistent estimation is impossible when the contamination proportion is a non-vanishing constant, it turns out that the clean-covariate setting admits strictly improved statistical guarantees. Specifically, we show that the additional information in the clean covariates can be carefully exploited to construct an estimator that achieves a better estimation rate than that attainable under Huber contamination. In contrast to the Huber model, this improved rate implies consistency even when the contamination is a constant. A matching minimax lower bound is established using Fano's inequality together with the construction of contamination processes that match $m> 2$ distributions simultaneously, extending the previous two-point lower bound argument in Huber's setting. Despite the improvement over the Huber model from an information-theoretic perspective, we provide formal evidence -- in the form of Statistical Query and Low-Degree Polynomial lower bounds -- that the problem exhibits strong information-computation gaps. Our results strongly suggest that the information-theoretic improvements cannot be achieved by polynomial-time algorithms, revealing a fundamental gap between information-theoretic and computational limits in robust regression with clean covariates.
2604.04118 We consider causal discovery in structural causal models driven by heavy-tailed noise, where extremes carry important information about causal direction. We introduce the Heavy-Tailed Homogeneous Structural Causal Model (HT-HSCM), a unified framework that generalizes heavy-tailed linear and max-linear models. We demonstrate that causal tail coefficients identify the complete ancestral partial order of the underlying directed acyclic graph. We also formulate a recursive algorithm for recovering quantities associated with the model called ancestral impulse-responses from the causal tail coefficients. Our results provide a general and theoretically justified framework for causal discovery in heavy-tailed systems.
We study higher-order small-noise fluctuation expansions for the overdamped Langevin dynamics in a quartic double-well potential. Assuming that the initial data admits a suitable expansion structure, we obtain a strong dynamical expansion of the trajectories, as well as an expansion of the laws with respect to smooth observables. We then investigate the long-time behavior of the expansion coefficients. In the scalar case $d=1$, each coefficient converges exponentially fast to a finite limit as $t\to\infty$. In contrast, for $d\ge 2$, the fluctuation expansion coefficients reflect the degeneracy of the manifold of minima, which in general prevents the existence of a finite long-time limit. Furthermore, by combining a multi-level induction with combinatorial arguments, we derive a recursive formula for the fluctuation expansion coefficients. This recursion shows that the long-time limits of these dynamical expansion coefficients coincide with those arising from the corresponding equilibrium expansions.
Many economic models feature monotone Markov dynamics on state spaces that may be noncompact. Establishing existence, uniqueness, and stability of stationary distributions in such settings has required a patchwork of sufficient conditions, each tailored to specific applications. We provide a single necessary and sufficient condition: a monotone Markov process has a globally stable stationary distribution if and only if it is asymptotically contractive and has a tight trajectory. This characterization covers both compact and noncompact state spaces, discrete and continuous time, and extends to nonlinear Markov operators that depend on aggregate state. We demonstrate the result through applications to wage dynamics, Bayesian learning with belief shocks, and income processes that generate Pareto tails.
2604.03882 A homogenization principle for total variation. We prove an inequality comparing the variational distance between pairs of product probability measures to its homogenized counterpart. If $P_1,\ldots,P_n,Q_1,\ldots,Q_n$ are arbitrary probability measures on a measurable space and $\bar P:=\frac1n\sum_{i=1}^n P_i, \bar Q:=\frac1n\sum_{i=1}^n Q_i$, we show that $$TV\!\left(\bigotimes_{i=1}^n P_i, \bigotimes_{i=1}^n Q_i\right) \;\ge\; c\,TV(\bar P^{\otimes n},\bar Q^{\otimes n}),$$ where $c>0$ is a universal constant. The proof is based on a one-dimensional representation of total variation between products. We embed pairs of probability distributions $P_i,Q_i$ into positive measures $\eta_i$ on $\mathbb{R}$. We then define a functional $T$ over measures on $\mathbb{R}$ that realizes TV over products via convolution: $TV\!\left(\bigotimes_{i=1}^n P_i, \bigotimes_{i=1}^n Q_i\right)=T(\eta_1*\cdots *\eta_n)$. Our main analytic discovery is that for the relevant class of positive measures $\eta_i$, the convolution inequality $T(\eta_1*\cdots*\eta_n) \ge c\,T\!\left(\bar\eta^{*n}\right)$ holds, where $\bar\eta=\frac1n\sum_{i=1}^n \eta_i$. Finally, a higher-dimensional lifting argument shows that $T\!\left(\bar\eta^{*n}\right)\ge TV(\bar P^{\otimes n},\bar Q^{\otimes n})$. To our knowledge, both the exact representation and the convolution inequality are new.
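The two sides of the inequality can be computed exactly in a small special case. The sketch below assumes every $P_i$ and $Q_i$ is a Bernoulli measure on $\{0,1\}$, enumerates all $2^n$ outcomes for the heterogeneous products, and uses the count of ones as a sufficient statistic for the homogenized products. The function names and the example parameters are illustrative assumptions, and the abstract does not state the value of the constant $c$, so the code only reports the ratio of the two sides.

```python
from itertools import product
from math import comb


def tv_product_bernoulli(ps, qs):
    """Exact TV between prod_i Bernoulli(ps[i]) and prod_i Bernoulli(qs[i])."""
    total = 0.0
    for outcome in product((0, 1), repeat=len(ps)):
        prob_p = 1.0
        prob_q = 1.0
        for x, p, q in zip(outcome, ps, qs):
            prob_p *= p if x else 1.0 - p
            prob_q *= q if x else 1.0 - q
        total += abs(prob_p - prob_q)
    return 0.5 * total


def tv_homogenized_bernoulli(ps, qs):
    """Exact TV between Bernoulli(pbar)^n and Bernoulli(qbar)^n.

    For i.i.d. Bernoulli products the likelihood ratio depends only on the
    number of ones, so the TV reduces to a TV between two binomial laws.
    """
    n = len(ps)
    pbar = sum(ps) / n
    qbar = sum(qs) / n
    return 0.5 * sum(
        comb(n, k)
        * abs(pbar**k * (1.0 - pbar) ** (n - k) - qbar**k * (1.0 - qbar) ** (n - k))
        for k in range(n + 1)
    )


if __name__ == "__main__":
    # Hypothetical example parameters, chosen only for illustration.
    ps = [0.1, 0.5, 0.9, 0.3]
    qs = [0.2, 0.4, 0.8, 0.6]
    lhs = tv_product_bernoulli(ps, qs)
    rhs = tv_homogenized_bernoulli(ps, qs)
    print("TV of products:      ", lhs)
    print("TV of homogenization:", rhs)
    print("ratio lhs / rhs:     ", lhs / rhs)
```

The enumeration costs $2^n$ evaluations, so this check is only practical for small $n$; it illustrates the statement and is not a substitute for the proof.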
In the present paper, we study the equilibrium fluctuations of a particle system in infinite volume with two conserved quantities and long-range dependence. More specifically, the model of interest is the so-called ABC model, in which three types of particles (A, B and C) exchange their locations between $x\in\mathbb{Z}$ and $x+z\in\mathbb{Z}$ at a rate that depends on the type of particles involved and is proportional to $|z|^{-\gamma-1}$ for $\gamma>0$. After rigorously identifying the normal modes associated to the conserved quantities (the density of particles of types $A$ and $B$, say), we prove that their fluctuations converge to independent fractional stochastic partial differential equations (SPDEs), which are either Gaussian or the Stochastic Burgers equation, and whose nature is determined by the microscopic range of dependence and the strength of the asymmetry.