Gaussian Cooling and Dikin Walks: The Interior-Point Method for Logconcave Sampling

Yunbum Kook; Santosh S. Vempala

TL;DR

This work generalizes the Interior-Point Method for convex optimization based on self-concordant barriers by developing and adapting IPM machinery together with the Dikin walk for poly-time sampling algorithms and illustrates the approach on important special cases.

Abstract

The connections between (convex) optimization and (logconcave) sampling have been considerably enriched in the past decade with many conceptual and mathematical analogies. For instance, the Langevin algorithm can be viewed as a sampling analogue of gradient descent and has condition-number-dependent guarantees on its performance. In the early 1990s, Nesterov and Nemirovski developed the Interior-Point Method (IPM) for convex optimization based on self-concordant barriers, providing efficient algorithms for structured convex optimization, often faster than the general method. This raises the following question: can we develop an analogous IPM for structured sampling problems? In 2012, Kannan and Narayanan proposed the Dikin walk for uniformly sampling polytopes, and an improved analysis was given in 2020 by Laddha-Lee-Vempala. The Dikin walk uses a local metric defined by a self-concordant barrier for linear constraints. Here we generalize this approach by developing and adapting IPM machinery together with the Dikin walk for poly-time sampling algorithms. Our IPM-based sampling framework provides an efficient warm start and goes beyond uniform distributions and linear constraints. We illustrate the approach on important special cases, in particular giving the fastest algorithms to sample uniform, exponential, or Gaussian distributions on a truncated PSD cone. The framework is general and can be applied to other sampling algorithms.

Paper Structure (12 sections, 5 theorems, 11 equations, 3 figures, 1 table)


Introduction
Warm-up: Dikin walk and self-concordance
Dikin walk.
Dikin metrics and self-concordance.
Results
Dikin walk ($\S$\ref{['sec:mixing-Dikin']})
Sampling IPM: Gaussian cooling with the Dikin walk ($\mathsf{GCDW}$) ($\S$\ref{['sec:IPM-framework']})
Derivation of the algorithm.
Self-concordance theory for combining barriers ($\S$\ref{['sec:sc-theory-rules']})
Metrics for well-known structured instances ($\S$\ref{['sec:handbook-barrier']})
(1) Linear constraints.
(2) Quadratic potentials and constraints.

Key Result

Theorem 0

Let $K\subset\mathbb{R}^{d}$ be convex and $0\leq\alpha\leq\beta<\infty$. Then for any $\varepsilon>0$, it holds that $d_{\textrm{TV}}(\pi_{0}P^{(T)},\pi)\leq\varepsilon$ for $T\gtrsim d\,\max(1,\beta)\,\min(\bar{\nu},1/\alpha)\,\log\frac{{\|\pi_{0}/\pi\|}}{\varepsilon}$.

Figures (3)

Figure 1.1: Iterates of the $\mathsf{Dikin\ walk}$ (Algorithm \ref{['alg:DikinWalk']}). Solid lines centered at $X_{i}$ indicate Dikin ellipsoids, $\mathcal{D}_{g}^{r}(X_{i})$.
Figure 1.2: (a) Self-concordance of barrier/metric (Definition \ref{['def:sc']}) ensures that the Hessian (so Dikin ellipsoids) changes smoothly. (b) $\bar{\nu}$-symmetry (Definition \ref{['def:symm-param']}) indicates how well a Dikin ellipsoid $\mathcal{D}_{g}^{r}(X)$ approximates the locally symmetrized convex body, $K\cap(2X-K)$.
Figure 1.3: Outline

Theorems & Definitions (7)

Definition 1.1: Self-concordance (brief version of Definition \ref{['def:sc']})
Definition 1.2: $\bar{\nu}$-symmetry
Theorem 0
Theorem 0
Theorem 0
Theorem: Linear constraints
Theorem 1.3: [Quadratic] Let $K_{1}=\{x\in\mathbb{R}^{d}:\frac{1}{2} x^{\mathsf{T}}Qx+p^{\mathsf{T}}x+l\leq0\}$ with $p\in\mathbb{R}^{d}$ and $0\neq Q\in\mathbb{S}_{+}^{d}$. Let $K_{2}=\{(x,t)\in\mathbb{R}^{d+1}:\frac{1}{2}{\|x-\mu\|}_{\Sigma}^{2}\leq t\}$ and $K_{3}=\{(x,t)\in\mathbb{R}^{d+1}:{\|x-\mu\|}_{\Sigma}\leq t\}$ with $\mu\in\mathbb{R}^{d}$ and $\Sigma\in\mathbb{S}_{++}^{d}$. Let $x\in\textup{int}(K_{i})$ and $h\in\mathbb{R}^{\dim(K_{i})}$.
Ellipsoid $\phi_{\textup{ellip}}(x)=-\log(-l-p^{\mathsf{T}}x-\frac{1}{2} x^{\mathsf{T}}Qx)$ for $K_{1}$: $g=d\,\nabla^{2}\phi_{\textup{ellip}}$ satisfies $\nu,\bar{\nu}=\mathcal{O}(d)$, SSC when $Q\in\mathbb{S}_{++}^{d}$, $\mathrm{D}^{2}g(x)[h,h]\succeq0$ (so SLTSC), and SASC.
Gaussian $\phi_{\textup{Gauss}}(x,t)=-\log(t-\frac{1}{2}{\|x-\mu\|}_{\Sigma}^{2})$ for $K_{2}$: $g=d\,\nabla^{2}\phi_{\textup{Gauss}}$ satisfies $\nu,\bar{\nu}=\mathcal{O}(d)$, SSC, $\mathrm{D}^{2}g(x,t)[h,h]\succeq0$ (so SLTSC), and SASC.
Second-order cone $\phi_{\textup{SOC}}(x,t)=-\log(t^{2}-{\|x-\mu\|}_{\Sigma}^{2})$ for $K_{3}$: $g=d\,\nabla^{2}\phi_{\textup{SOC}}$ satisfies $\nu,\bar{\nu}=\mathcal{O}(d)$, SSC, SLTSC, and SASC.

Another fundamental constraint is the PSD cone. This convex region admits a $d$-self-concordant barrier $\phi_{\textup{PSD}}(\cdot)=-\log\det(\cdot)$. We show that it satisfies SLTSC, while the $d$-scaling further guarantees SSC and ASC. In establishing ASC, we find an interesting connection to the Gaussian orthogonal ensemble (GOE), one of the main objects studied in random matrix theory. However, we cannot prove SASC, so we need the $\frac{d(d+1)}{2}$-scaling for SASC (due to HSC of $\phi_{\textup{PSD}}$).

Let $K=\mathbb{S}_{+}^{d}$, $X\in\textup{int}(K)$, and $H\in\mathbb{S}^{d}$. Then, $d\,\nabla^{2}\phi_{\textup{PSD}}$ satisfies $\nu,\bar{\nu}=\mathcal{O}(d^{2})$, SSC, $\mathrm{D}^{2}g(X)[H,H]\succeq0$ (so SLTSC), and ASC. $\frac{d(d+1)}{2}\,\nabla^{2}\phi_{\textup{PSD}}$ is SASC.

It is sometimes more convenient to introduce $d$ new variables, as seen in the following: Let $K_{1}=\prod_{i=1}^{d}\{(x_{i},t_{i})\in\mathbb{R}^{2}:x_{i}\geq0,\,t_{i}\geq x_{i}\log x_{i}\}$ and $K_{2}=\prod_{i=1}^{d}\{(x_{i},t_{i})\in\mathbb{R}^{2}:\left\lvert x_{i}\right\rvert ^{p}\leq t_{i}\}$.
Entropy $\phi_{\textup{ent}}(x,t)=-\sum_{i=1}^{d}{\bigl(\log(t_{i}-x_{i}\log x_{i})+36\log x_{i}\bigr)}$ for $K_{1}$: $g=d\,\nabla^{2}\phi_{\textup{ent}}$ satisfies $\nu,\bar{\nu}=\mathcal{O}(d^{2})$, SSC, SLTSC, and SASC.
The $p$-th power of $\ell_{p}$-norm $\phi_{\textup{power}}(x,t)=-\sum_{i=1}^{d}{\bigl(\log(t_{i}^{2/p}-x_{i}^{2})+72\log t_{i}\bigr)}$ for $K_{2}$: $g=d\,\nabla^{2}\phi$ satisfies $\nu,\bar{\nu}=\mathcal{O}(d^{2})$, SSC, SLTSC, and SASC.

Our theory (Theorem \ref{['thm:Dikin-annealing']} and \ref{['thm:IPM-sampling']}) with the study of barriers (Table \ref{['tab:scaling-table']}) proposes local metrics for structured instances. $\mathsf{GCDW}$ with them mixes in poly-time faster than the $\mathsf{Ball\ walk}$. For fair comparison, the complexity of the $\mathsf{Ball\ walk}$ refers to that of isotropic rounding (see §\ref{['sec:examples']}).

Let us introduce a variable for each of ${\|X-B\|}_{F}$ and ${\|X-C\|}_{F}^{2}$. Then our theory suggests the following barrier: $4(\phi_{\textup{log}}+d^{2}\phi_{\textup{Gaussian}}+d^{2}\phi_{\textup{SOC}}+d^{2}\phi_{\textup{PSD}})$, which is $\mathcal{O}(1)\,(m+d^{3},m+d^{3})$-self-concordant, SSC, LTSC, and ASC. By Theorem \ref{['thm:Dikin-annealing']} with $\alpha=0$ and $\beta=1$ (due to $\phi_{\textup{PSD}}$ in the potential), we need $\widetilde{\mathcal{O}}{\bigl(d^{2}(m+d^{3})\bigr)}$ iterations of the $\mathsf{Dikin\ walk}$ in total.

Let us first consider uniform sampling over linear constraints given by $Ax\geq b$ for $A\in\mathbb{R}^{m\times d}$ and $b\in\mathbb{R}^{m}$.
Recall that for uniform sampling the $\mathsf{Ball\ walk}$ mixes in $\widetilde{\mathcal{O}}(d^{3})$ iterations (including isotropic rounding). On the other hand, $\widetilde{\mathcal{O}}(md)$ queries are enough for $\mathsf{GCDW}$ with the $(m,m)$-Dikin amenable metric induced by $\phi_{\textup{log}}$. This recovers the mixing time of kannan2012random without warmness. If we use the $(\sqrt{md},\sqrt{md})$-Dikin-amenable Vaidya or $(d^{3/2},d^{3/2})$-Dikin-amenable Lewis-weight metric instead, then $\mathsf{GCDW}$ with each metric recovers the $\widetilde{\mathcal{O}}(m^{1/2}d^{3/2})$ and $\widetilde{\mathcal{O}}(d^{5/2})$ mixing of the $\mathsf{Vaidya\ walk}$ and $\mathsf{Approximate\ John\ walk}$ chen2018fast without warmness. For a second-order cone with linear constraints, we can use the Hessian of $2(\phi_{\textup{log}}+d\phi_{\textup{SOC}})$ that is $(m+d,m+d)$-Dikin-amenable, with which $\mathsf{GCDW}$ mixes in $\widetilde{\mathcal{O}}(d\,(m+d))$ iterations in total. Lastly, for the PSD cone with linear constraints, we can use the $(m+d^{3},m+d^{3})$-Dikin-amenable $2\nabla^{2}(\phi_{\textup{log}}+d^{2}\phi_{\textup{PSD}})$. $\mathsf{GCDW}$ with this needs $\widetilde{\mathcal{O}}(d^{2}(m+d^{3}))$ queries. For large $m$, we use the $(d^{3},d^{3})$-Dikin-amenable $2(dg_{\textup{Lw}}+d^{2}\nabla^{2}\phi_{\textup{PSD}})$, with which $\mathsf{GCDW}$ mixes in $\widetilde{\mathcal{O}}(d^{5})$ iterations. In the same setting, the $\mathsf{Ball\ walk}$ needs $\widetilde{\mathcal{O}}(d^{6})$ queries. For exponential sampling, $\mathsf{GCDW}$ requires the same number of iterations of the $\mathsf{Dikin\ walk}$ for each case (i.e., polytope, second-order cone, PSD), while the $\mathsf{Ball\ walk}$ needs $\widetilde{\mathcal{O}}(d^{4})$ iterations for the polytope and second-order cone, and $\widetilde{\mathcal{O}}(d^{8})$ iterations for the PSD cone. 
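The polytope bounds quoted above for uniform sampling can be compared by plain arithmetic. The following is a minimal sketch, not part of the paper: it drops all constants and logarithmic factors, and the function names `uniform_polytope_bounds` and `best_method` are hypothetical helpers introduced only for illustration.

```python
def uniform_polytope_bounds(m: int, d: int) -> dict:
    """Iteration bounds quoted in the text for uniform sampling over a
    polytope with m constraints in dimension d.
    Assumption: constants and poly-log factors are simply dropped."""
    return {
        "ball_walk": float(d ** 3),               # includes isotropic rounding
        "gcdw_log_barrier": float(m * d),         # (m, m)-Dikin-amenable metric
        "gcdw_vaidya": (m ** 0.5) * (d ** 1.5),   # Vaidya metric
        "gcdw_lewis": d ** 2.5,                   # Lewis-weight metric
    }


def best_method(m: int, d: int) -> str:
    """Name of the smallest bound under the simplification above."""
    bounds = uniform_polytope_bounds(m, d)
    return min(bounds, key=bounds.get)


if __name__ == "__main__":
    for m, d in [(200, 100), (100_000, 100)]:
        print(m, d, best_method(m, d), uniform_polytope_bounds(m, d))
```

Under this simplification the Vaidya bound is the smallest while $d<m<d^{2}$, and the Lewis-weight bound takes over once $m>d^{2}$. Hidden constants and per-step costs, which this sketch ignores, can change the picture in practice.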
Detailed statements on the mixing times and efficient per-step implementation can be found in §\ref{['subsec:PSD-cone-sampling']}. narayanan2016randomized went beyond linear constraints and analyzed the $\mathsf{Dikin\ walk}$ for uniform sampling over a convex region given as the intersection of (1) linear constraints, (2) a hyperbolic cone with a $\nu_{h}$-SC hyperbolic barrier $\phi_{h}$, and (3) a general convex set with a $\nu_{s}$-SC barrier $\phi_{s}$. Using $\nabla^{2}(\phi_{\textup{log}}+d\phi_{h}+d^{2}\phi_{s})$ as a local metric, this work shows that the $\mathsf{Dikin\ walk}$ mixes in $\mathcal{O}{\bigl(d{\bigl(m+d\nu_{h}+(d\nu_{s})^{2}\bigr)}\bigr)}$ steps from a warm start. The term $d(d\nu_{s})^{2}$ induced by self-concordance alone is typically the largest one in the provable guarantee. Interesting results of this work arise when $K$ is the intersection of (1) and (2). Since a hyperbolic barrier is HSC guler1997hyperbolic, the $d$-scaling of a HSC barrier makes it SSC, SLTSC, and SASC. Also, as a $\nu_{h}$-SC hyperbolic barrier is $\mathcal{O}(\nu_{h})$-symmetric (implied in guler1997hyperbolic), it follows that $d\phi_{h}$ is $(d\nu_{h},d\nu_{h})$-Dikin-amenable. Hence, $\phi_{\log}+d\phi_{h}$ induces an $(m+d\nu_{h},m+d\nu_{h})$-Dikin-amenable metric, and the $\mathsf{Dikin\ walk}$ with this metric mixes in $\mathcal{O}(d\,(m+d\nu_{h}))$ iterations from a warm start by Theorem \ref{['thm:Dikin']}. Without warmness, narayanan2016randomized showed that the $\mathsf{Dikin\ walk}$ started at $x\in K$, where $s\geq|p|/|q|$ for any chord $\overline{pq}$ of $K$ passing through $x$, mixes in $\mathcal{O}{\bigl(d(m+d\nu_{h}){\bigl[d\log{\bigl(s(m+d\nu_{h})\bigr)}+\log\frac{1}{\varepsilon}\bigr]}\bigr)}$ steps. On the other hand, $\mathsf{GCDW}$ requires only $\mathcal{O}{\bigl(d(m+d\nu_{h})\log\frac{d(m+d\nu_{h})}{\varepsilon}\bigr)}$ iterations. Going forward, we consider only logarithmic barriers for linear constraints. 
The $\mathsf{Ball\ walk}$ for general log-concave distributions mixes in $\widetilde{\mathcal{O}}(d^{4})$ iterations. As per our reduction, we first replace a quadratic potential (coming from the Gaussian distribution) by a new variable, adding its epigraph to a constraint. For a polytope, one can use the $(m+d,m+d)$-Dikin-amenable $2\nabla^{2}(\phi_{\textup{log}}+d\phi_{\textup{Gauss}})$, so $\mathsf{GCDW}$ needs $\widetilde{\mathcal{O}}(d\,(m+d))$ iterations of the $\mathsf{Dikin\ walk}$. For the second-order cone with linear constraints, $\mathsf{GCDW}$ with the $(m+d,m+d)$-Dikin-amenable metric $3\nabla^{2}(\phi_{\textup{log}}+d\phi_{\textup{SOC}}+d\phi_{\textup{Gauss}})$ requires $\widetilde{\mathcal{O}}(d\,(m+d))$ iterations. For the PSD cone with linear constraints, $\mathsf{GCDW}$ with the $(m+d^{3},m+d^{3})$-Dikin-amenable metric $3\nabla^{2}(\phi_{\textup{log}}+d^{2}\phi_{\textup{PSD}}+d^{2}\phi_{\textup{Gauss}})$ mixes in $\widetilde{\mathcal{O}}(d^{2}(m+d^{3}))$ iterations. The $\mathsf{Ball\ walk}$ is much slower, requiring $\widetilde{\mathcal{O}}(d^{8})$ iterations.

For a polytope, we use the $(m+d^{2},m+d^{2})$-Dikin-amenable $2\nabla^{2}(\phi_{\textup{log}}+d\phi_{\textup{ent}})$ in $2d$-dimensional space. Thus, $\mathsf{GCDW}$ needs $\widetilde{\mathcal{O}}(d\,(m+d^{2}))$ iterations of the $\mathsf{Dikin\ walk}$. For the second-order cone with linear constraints, $\mathsf{GCDW}$ with the $(m+d^{2},m+d^{2})$-Dikin-amenable $3\nabla^{2}(\phi_{\textup{log}}+d\phi_{\textup{SOC}}+d\phi_{\textup{ent}})$ requires $\widetilde{\mathcal{O}}(d\,(m+d^{2}))$ iterations. Lastly, for the PSD cone with linear constraints, $\mathsf{GCDW}$ with the $(m+d^{4},m+d^{4})$-Dikin-amenable $3\nabla^{2}(\phi_{\textup{log}}+d^{2}\phi_{\textup{PSD}}+d^{2}\phi_{\textup{ent}})$ mixes in $\widetilde{\mathcal{O}}(d^{2}(m+d^{4}))$ iterations. The $\mathsf{Ball\ walk}$ mixes in $\widetilde{\mathcal{O}}(d^{8})$ iterations in this setting.

The inner loop of the sampling IPM samples from a distribution whose potential is of the form $c^{\mathsf{T}}x+\alpha\phi(x)$. Thus, the study of other non-Euclidean samplers for relatively convex and smooth potentials will be interesting future work. Next, one question unanswered is if the $d^{2}$-scaling of $\phi_{\textup{PSD}}$ can be improved, which is mathematically interesting in its own right. The $d$-scaling for ASC is shown through the random matrix theory, which is challenging to extend to SASC (see Remark \ref{['rem:challenge-extension-SASC']}). Our problem \ref{['eq:problem']} is a special case of logconcave sampling: sample from a distribution $\pi$ with density proportional to $\exp(-V)$ for a convex function $V$ on $\mathbb{R}^{d}$. This problem has spawned a long line of research in several communities, as it captures various important distributions, including uniform distributions over convex bodies and Gaussians. A large body of recent work in machine learning and statistics makes the assumption of $0\prec\alpha I\preceq\nabla^{2} V\preceq\beta I$ on $\mathbb{R}^{d}$ (i.e., $\alpha$-strong convexity and $\beta$-smoothness of the potential $V$), where the strong-convexity assumption is sometimes relaxed to isoperimetry assumptions such as log-Sobolev inequalities (LSI), Poincaré inequality (PI), and Cheeger isoperimetry. See chewi2023log for a survey on this topic. The guarantees provided on the mixing time of samplers under this assumption have polynomial dependence on the condition number defined as $\beta/\alpha$ (or $\alpha$ is replaced by the isoperimetric constant). These guarantees do not apply to constrained sampling. For example, in uniform sampling, the simplest constrained sampling problem, $V$ is set to be a constant within the convex body and infinity outside the body, which leads to discontinuity of $V$ and $\beta=\infty$. 
The sudden change of $V$ around the boundary requires special consideration, such as small step size, use of a Metropolis filter, projection, etc., making it a more challenging problem.

Uniform sampling can be accomplished through the $\mathsf{Ball\ walk}$ (lovasz1993random; kannan1997random) and $\mathsf{Hit\text{-}and\text{-}Run}$ (smith1984efficient), both of which only require access to a function proportional to the density. When a convex body $K\subset\mathbb{R}^{d}$ satisfies $B_{r}(x_{0})\subset K\subset B_{R}(x_{0})$ for some $x_{0}$, the $\mathsf{Ball\ walk}$ mixes in $\widetilde{\mathcal{O}}{\bigl(d^{2}(R/r)^{2}\bigr)}$ steps from a warm start (kannan1997random) and $\textsf{Hit-and-Run}$ mixes in $\widetilde{\mathcal{O}}{\bigl(d^{2}(R/r)^{2}\bigr)}$ steps from any start (lovasz1999hit; lovasz2006hit). lovasz2007geometry further extended these results to general logconcave distributions. These algorithms need to use a "step size" of $\Omega(1/\sqrt{d})$, and their mixing is affected by the skewed geometry of the convex body (i.e., when $R/r\gg1$). The latter can be addressed by first rounding the body, after which the $\mathsf{Ball\ walk}$ and the $\textsf{Hit-and-Run}$ mix in $\widetilde{\mathcal{O}}(d^{2})$ steps from a warm start, due to bounds on the KLS constant by chen2021almost and klartag2023logarithmic and stochastic localization by chen2022hit. The fastest rounding algorithm by jia2021reducing requires $\widetilde{\mathcal{O}}(d^{3})$ queries to a membership oracle, and uses the $\mathsf{Ball\ walk}$.

The $\mathsf{Ball\ walk}$ uses the same radius ball for every point in the convex body. One might want to use a different radius depending on the distance to the boundary. This by itself does not work as it simply makes the current point converge to the boundary. However, replacing balls with ellipsoids whose shape changes based on the proximity to the boundary does work.
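To make the idea of boundary-adapted ellipsoids concrete, here is a minimal sketch of a lazy, Metropolis-filtered Dikin walk for the uniform distribution on the unit box $[0,1]^{d}$. The box, the logarithmic barrier $\phi(x)=-\sum_{i}(\log x_{i}+\log(1-x_{i}))$ with its diagonal Hessian, the step radius `r`, and all function names are assumptions chosen for illustration. The proposal $\mathcal{N}(x,\frac{r^{2}}{d}g(x)^{-1})$ follows the standard Dikin-walk template and is not taken verbatim from the paper's algorithm.

```python
import math
import random


def hessian_diag(x):
    """Diagonal of the log-barrier Hessian g(x) for the box [0,1]^d."""
    return [1.0 / xi ** 2 + 1.0 / (1.0 - xi) ** 2 for xi in x]


def log_proposal_density(center, point, r):
    """Log-density (up to an additive constant) at `point` of the Gaussian
    N(center, (r^2/d) * g(center)^{-1}) used as the proposal."""
    d = len(center)
    g = hessian_diag(center)
    quad = sum(gi * (pi - ci) ** 2 for gi, pi, ci in zip(g, point, center))
    return 0.5 * sum(math.log(gi) for gi in g) - d * quad / (2.0 * r ** 2)


def dikin_step(x, r, rng):
    """One lazy Dikin-walk step targeting the uniform distribution."""
    if rng.random() < 0.5:            # laziness: stay put with probability 1/2
        return x
    d = len(x)
    g = hessian_diag(x)
    # Gaussian proposal shaped by the Dikin ellipsoid at x.
    z = [xi + r / math.sqrt(d * gi) * rng.gauss(0.0, 1.0) for xi, gi in zip(x, g)]
    if not all(0.0 < zi < 1.0 for zi in z):
        return x                      # proposals outside the body are rejected
    # Metropolis filter: for a uniform target the ratio is p_z(x) / p_x(z).
    log_ratio = log_proposal_density(z, x, r) - log_proposal_density(x, z, r)
    if rng.random() < math.exp(min(0.0, log_ratio)):
        return z
    return x


def run_dikin_walk(x0, r, steps, seed=0):
    """Run the chain from an interior point x0 and return all iterates."""
    rng = random.Random(seed)
    x, samples = list(x0), []
    for _ in range(steps):
        x = dikin_step(x, r, rng)
        samples.append(x)
    return samples


if __name__ == "__main__":
    chain = run_dikin_walk([0.5, 0.5], r=0.5, steps=2000)
    print(chain[-1])
```

Near the boundary the Hessian entries blow up, so the proposal shrinks automatically in the directions that would leave the box. This is the behaviour a fixed-radius ball cannot provide.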
Several sampling algorithms are motivated by the use of local metrics: the $\mathsf{Dikin\ walk}$ (kannan2012random), $\mathsf{Riemannian\ Hamiltonian\ Monte\ Carlo}$ (RHMC), $\mathsf{Riemannian\ Langevin\ algorithm}$ (girolami2011riemann), etc. Which local metrics would be suitable candidates? It turns out that a suitable metric can be derived from self-concordant barriers, a concept dating back to the development of the interior-point method in the convex-optimization literature (nesterov1994interior). It is well known that any convex body admits a $d$-self-concordant barrier, such as the universal barrier (nesterov1994interior; lee2021universal) or the entropic barrier (bubeck2014entropic; chewi2021entropic), but these are computationally expensive. Moreover, as noted in laddha2020strong, the symmetry parameter of these general barriers is $\Omega(d^{2})$ for $d$-dimensional bodies (even for second-order cones), and so the resulting complexity for the $\mathsf{Dikin\ walk}$ on a PSD cone is $\Omega(d^{2}\cdot d^{4})=\Omega(d^{6})$. Thus, there is a need to find barriers that are more closely aligned with the structure of the sets we wish to sample.

Samplers such as the $\mathsf{Ball\ walk}$ and $\mathsf{Hit\text{-}and\text{-}Run}$ can be used to sample polytopes, but they do not use any special properties of polytopes. For polytopes with $m$ linear constraints in dimension $d$ ($m>d$), the first theoretical result via self-concordant barriers dates back to kannan2012random, which proposed the $\mathsf{Dikin\ walk}$ with the $m$-self-concordant logarithmic barrier and established the mixing rate of $\widetilde{\mathcal{O}}(md)$ for uniform sampling. chen2018fast revisited the idea of vaidya1996new using the $\mathcal{O}(\sqrt{md})$-self-concordant hybrid barrier, which is a hybrid of the volumetric barrier and the log barrier and leads to a faster interior-point method.
They presented the $\mathsf{Dikin\ walk}$ with the hybrid barrier giving an $\widetilde{\mathcal{O}}(\sqrt{m}d^{3/2})$-mixing guarantee. Lastly, laddha2020strong proposed the $\mathsf{Dikin\ walk}$ with a variant of the $\mathcal{O}^{*}(d)$-self-concordant LS barrier based on Lewis weights, developed by lee2019solving, and showed a mixing rate of $\widetilde{\mathcal{O}}(d^{2})$. While the next point proposed by all these Markov chains is obtained by a Euclidean straight line step, the $\mathsf{Geodesic\ walk}$ and RHMC use curves (geodesics and Hamiltonian-preserving curves respectively). lee2017geodesic and lee2018convergence showed that for uniform sampling, the $\mathsf{Geodesic\ walk}$ and RHMC with the log barrier mix in $\widetilde{\mathcal{O}}(md^{3/4})$ and $\widetilde{\mathcal{O}}(md^{2/3})$ steps respectively. kook2022condition extended theoretical analysis of RHMC to truncated exponential distributions and showed that discretization of Hamilton's equations by practical numerical integrators maintains a fast mixing rate. gatmiry2023sampling showed that just as the $\mathsf{Dikin\ walk}$ enjoys faster mixing via a barrier with a better self-concordance parameter, RHMC with a hybrid barrier consisting of the Lewis weights and log barrier mixes in $\widetilde{\mathcal{O}}(m^{1/3}d^{4/3})$ steps. Their proof is based on developing suitable properties and algorithmic bounds for Riemannian manifolds. Extending these non-Euclidean methods to general domains (e.g., $\mathbb{S}_{+}^{d}$) and to more general densities (e.g., Gaussian, relatively strong convex and smooth) to potentially improve the complexity of the problem significantly beyond the bounds that follow from general convex body sampling, have been open research directions and motivate our paper. 
narayanan2016randomized explored the first direction, analyzing the $\mathsf{Dikin\ walk}$ for uniform sampling over the intersection of linear constraints, a hyperbolic cone with a hyperbolic barrier, and a general convex set with an SC barrier. Our current understanding of the second direction is rather limited. A line of work has focused on the analysis of first-order non-Euclidean samplers, such as the discretized $\mathsf{Mirror\ Langevin\ algorithm}$ (MLA) or $\mathsf{Riemannian\ Langevin\ algorithm}$ (RLA), but under strong assumptions. For example, li2022mirror provided mixing-rate guarantees of MLA under the modified self-concordance of $\phi$ in the setting $\alpha\nabla^{2}\phi\preceq\nabla^{2} f\preceq\beta\nabla^{2}\phi$. However, the modified self-concordance is not affine-invariant, so it does not correctly capture affine-invariance of the algorithm. ahn2021efficient and gatmiry2022convergence avoid the modified self-concordance, analyzing MLA and RLA under an alternative discretization scheme that requires an exact simulation of the Brownian motion $\nabla^{2}\phi(X_{t})^{-1/2}\,\mathrm{d} W_{t}$, which is not known to be achievable algorithmically. gopi2023algorithmic proposed a non-Euclidean version of the proximal sampler based on the log-Laplace transformation (LLT) and analyzed its mixing when a potential is strongly convex and Lipschitz (not smooth) relative to $\nabla^{2}\phi$. However, the LLT has no closed form in general. Recently, srinivasan2023fast analyzed the Metropolis-adjusted MLA under the relative Lipschitzness of the potential (i.e., ${\|\nabla f\|}_{[\nabla^{2}\phi]^{-1}}<\infty$) in addition to relative convexity and smoothness. Our study of the $\mathsf{Dikin\ walk}$ for general cones and general densities provides a rather complete picture of zeroth-order non-Euclidean samplers. It also provides a general framework and improved bounds as well as a "handbook" for structured sampling.

For $n\in\mathbb{N}$, let $[n]:=\{1,\cdots,n\}$.
We use $f\lesssim g$ to denote $f\leq cg$ for some universal constant $c>0$. The $\widetilde{\mathcal{O}}$ complexity notation suppresses poly-logarithmic factors and dependence on error parameters. For $a,b\in\mathbb{R}^{d}$, we denote $a\wedge b:=\min(a,b)$ and $a\vee b:=\max(a,b)$. For $v\in\mathbb{R}^{d}$, the Euclidean norm (or $\ell_{2}$-norm) is denoted by ${\|v\|}_{2}\stackrel{\mathrm{def}}{=}\sqrt{\sum_{i\in[d]}v_{i}^{2}}$, and the infinity norm is denoted by ${\|v\|}_{\infty}\stackrel{\mathrm{def}}{=}\max_{i\in[d]}|v_{i}|$. A Gaussian distribution with mean $\mu\in\mathbb{R}^{d}$ and covariance $\Sigma\in\mathbb{R}^{d\times d}$ is denoted by $\mathcal{N}(\mu,\Sigma)$.

We use $\mathbb{S}^{d}$ to denote the set of symmetric matrices of size $d\times d$. For $X\in\mathbb{S}^{d}$, we call it positive semidefinite (PSD) (resp. positive definite (PD)) if $h^{\mathsf{T}}Xh\geq0$ (resp. $>0$) for any nonzero $h\in\mathbb{R}^{d}$. We use $\mathbb{S}_{+}^{d}$ to denote the set of positive semidefinite matrices of size $d\times d$. Note that their effective dimension is $d_{s}:=d(d+1)/2$ due to symmetry. For a positive (semi) definite matrix $X$, its square root is denoted as $X^{\frac{1}{2}}$, and is the unique positive (semi) definite matrix satisfying $X^{\frac{1}{2}}X^{\frac{1}{2}}=X$. For $A,B\in\mathbb{S}^{d}$, we use $A\preceq B$ ($A\prec B$) to indicate that $B-A$ is PSD (PD). For a matrix $A\in\mathbb{R}^{d\times d}$, its trace is denoted by $\textup{Tr}(A)=\sum_{i=1}^{d}A_{ii}$. The operator norm and Frobenius norm are denoted by ${\|A\|}_{2}\stackrel{\mathrm{def}}{=}\sup_{x\in\mathbb{R}^{d}}{\|Ax\|}_{2}/{\|x\|}_{2}$ and ${\|A\|}_{F}\stackrel{\mathrm{def}}{=}{\bigl(\sum_{i,j=1}^{d}A_{ij}^{2}\bigr)}^{1/2}=\sqrt{\textup{Tr}(A^{\mathsf{T}}A)}$, respectively. For $X\in\mathbb{S}^{d}$, its vectorization $\textup{vec}(X)\in\mathbb{R}^{d^{2}}$ is obtained by stacking each column of $X$ vertically.
Its symmetric vectorization $\textup{svec}(X)\in\mathbb{R}^{d_{s}}$ is obtained by stacking the lower triangular part in vertical direction. For a matrix $A\in\mathbb{R}^{d\times d}$ and vector $x\in\mathbb{R}^{d}$, we use $\textsf{diag}(A)$ to denote the vector in $\mathbb{R}^{d}$ with $[\textsf{diag}(A)]_{i}=A_{ii}$ for $i\in[d]$, $\textup{Diag}(A)$ to denote the diagonal matrix with $[\textup{Diag}(A)]_{ii}=A_{ii}$ for $i\in[d]$ and $\textup{Diag}(x)$ to denote the diagonal matrix in $\mathbb{R}^{d\times d}$ with $[\textup{Diag}(x)]_{ii}=x_{i}$ for $i\in[d]$. For matrices $A,B\in\mathbb{R}^{d\times d}$, their inner product is defined as the inner product of $\textup{vec}(A)$ and $\textup{vec}(B)$, denoted by $\langle A,B\rangle=\textup{Tr}(A^{\mathsf{T}}B)$. Their Hadamard product $A\circ B$ is the matrix of size $d\times d$ defined by $(A\circ B)_{ij}=A_{ij}B_{ij}$ (i.e., obtained by element-wise multiplication). For $A\in\mathbb{R}^{p\times q}$ and $B\in\mathbb{R}^{r\times s}$, their Kronecker product $A\otimes B$ is the matrix of size $pr\times qs$ defined by
$$A\otimes B=\begin{bmatrix}A_{11}B & \cdots & A_{1q}B\\ \vdots & & \vdots\\ A_{p1}B & \cdots & A_{pq}B\end{bmatrix}\,,$$
where $A_{ij}B$ is the matrix of size $r\times s$ obtained by multiplying each entry of $B$ by the scalar $A_{ij}$.

For a full-rank matrix $A\in\mathbb{R}^{m\times d}$ with $m\geq d$, we recall that $P(A):=A(A^{\mathsf{T}}A)^{-1}A^{\mathsf{T}}$ is the orthogonal projection matrix onto the column space of $A$. The leverage scores of $A$ are denoted by $\sigma(A):=\textsf{diag}{\bigl(P(A)\bigr)}\in\mathbb{R}^{m}$. We let $\Sigma(A):=\textup{Diag}{\bigl(\sigma(A)\bigr)}=\textup{Diag}{\bigl(P(A)\bigr)}$ and $P^{(2)}(A):=P(A)\circ P(A)$. The $\ell_{p}$-Lewis weights of $A$ are denoted by $w(A)$, the solution $w$ to the equation $w(A)=\textsf{diag}{\bigl(W^{1/2-1/p}A(A^{\mathsf{T}}W^{1-2/p}A)^{-1}A^{\mathsf{T}}W^{1/2-1/p}\bigr)}\in\mathbb{R}^{m}$ for $W=\textup{Diag}(w)$.
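As a small numerical illustration of the leverage scores $\sigma(A)=\textsf{diag}(P(A))$ defined above, the following sketch computes them in pure Python for a full-rank matrix with $m\geq d$. The helper names (`transpose`, `matmul`, `inverse`, `leverage_scores`) are hypothetical, and the Gauss-Jordan inverse is only meant for tiny, well-conditioned examples.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def inverse(M):
    """Gauss-Jordan inverse with partial pivoting (small matrices only)."""
    n = len(M)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def leverage_scores(A):
    """sigma_i = a_i^T (A^T A)^{-1} a_i, i.e. the diagonal of P(A)."""
    gram_inv = inverse(matmul(transpose(A), A))
    scores = []
    for row in A:
        tmp = [sum(g * a for g, a in zip(grow, row)) for grow in gram_inv]
        scores.append(sum(a * t for a, t in zip(row, tmp)))
    return scores


if __name__ == "__main__":
    A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(leverage_scores(A))
```

Because $P(A)$ is an orthogonal projection of rank $d$, the scores lie in $[0,1]$ and sum to $d$, which gives a convenient sanity check.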
When $m<d$ or $A$ is not full rank, both leverage scores and Lewis weights can be generalized via the Moore-Penrose inverse in place of the inverse in the definitions.

For a function $f:\mathbb{R}^{d}\to\mathbb{R}$, let $\nabla f(x)\in\mathbb{R}^{d}$ denote the gradient of $f$ at $x$ (i.e., $[\nabla f(x)]_{i}=\frac{\partial f}{\partial x_{i}}(x)$) and $\nabla^{2} f(x)\in\mathbb{R}^{d\times d}$ denote the Hessian of $f$ at $x$ (i.e., $[\nabla^{2} f(x)]_{ij}=\frac{\partial^{2}f}{\partial x_{i}\partial x_{j}}(x)$). For a matrix function $g:\mathbb{R}^{d}\to\mathbb{R}^{d\times d}$ in $x$, we use $\mathrm{D} g$ and $\mathrm{D}^{2}g$ to denote the third-order and fourth-order tensors defined by $[\mathrm{D} g(x)]_{ijk}=\frac{\partial[g(x)]_{ij}}{\partial x_{k}}$ and $[\mathrm{D}^{2}g(x)]_{ijkl}=\frac{\partial^{2}[g(x)]_{ij}}{\partial x_{k}\partial x_{l}}$. We use the following shorthand notation: $g_{x,h}':=\mathrm{D} g(x)[h]$ and $g_{x,h}'':=\mathrm{D}^{2}g(x)[h,h]$, where $\mathrm{D}^{i}g(x)[h_{1},\dotsc,h_{i}]=\mathrm{D}^{i}g(x)[h_{1}\otimes\cdots\otimes h_{i}]$ denotes the $i$-th directional derivative of $g$ at $x$ in directions $h_{1},\dotsc,h_{i}\in\mathbb{R}^{d}$, i.e.,
$$\mathrm{D}^{i}g(x)[h_{1},\dotsc,h_{i}]=\frac{\mathrm{d}^{i}}{\mathrm{d} t_{1}\cdots\mathrm{d} t_{i}}g{\Bigl(x+\sum_{j=1}^{i}t_{j}h_{j}\Bigr)}\Big|_{t_{1},\dotsc,t_{i}=0}\,.$$

At each point $x$ in a set $K\subset\mathbb{R}^{d}$, a local metric $g$, denoted as $g_{x}$ or $g(x)$, is a positive-definite inner product $g_{x}:\mathbb{R}^{d}\times\mathbb{R}^{d}\to\mathbb{R}$, which induces the local norm as ${\|v\|}_{g(x)}:=\sqrt{g_{x}(v,v)}$. We use ${\|v\|}_{x}$ to refer to ${\|v\|}_{g(x)}$ when the context is clear. When an ambient space has an orthonormal basis as in our setting (e.g., $\{e_{1},\dots,e_{d}\}$), the local metric $g_{x}$ can be represented as a positive-definite matrix of size $d\times d$.
In this case, we abuse notation by using $g(x)$ to indicate the $d\times d$ positive-definite matrix represented with respect to such an orthonormal basis. Also, the inner product can be written as $g_{x}(v,w)=v^{\mathsf{T}}g(x)w$. Going forward, we use $g_{x}=g(x)$ to denote a local metric (or positive definite matrix of size $\dim(x)\times\dim(x)$) at each point $x\in K$. The local metric $g$ is assumed to be at least twice differentiable. We use the same symbol for a distribution and its density with respect to the Lebesgue measure. Many sampling algorithms are based on Markov chains. A transition kernel $P:\mathbb{R}^{d}\times\mathcal{B}(\mathbb{R}^{d})\to\mathbb{R}_{\geq0}$ (or one-step distribution) for the Borel $\sigma$-algebra $\mathcal{B}(\mathbb{R}^{d})$ quantifies the probability of the Markov chains transitioning from one point to another measurable set. The next-step distribution is defined by $P_{x}(A):=P(x,A)$, which is the probability of a step from $x$ landing in the set $A$. The transition kernel characterizes the Markov chain in the sense that if a current distribution is $\mu$, then the distribution after $n$ steps can be expressed as $\mu P^{(n)}$, where $\mu P^{(i)}(x):=\int_{\mathbb{R}^{d}}P(\cdot,x)\,\mu P^{(i-1)}$ is defined recursively for $i\in[n]$ with the convention $\mu P^{(0)}=\mu$. We call $\pi$ a stationary distribution of the Markov chain if $\pi=\pi P$. If the stationary distribution further satisfies $\int_{A}P(x,B)\,\pi(\mathrm{d} x)=\int_{B}P(x,A)\,\pi(\mathrm{d} x)$ for any two measurable subsets $A,B$, then the Markov chain is said to be reversible with respect to $\pi$. It is expected that the Markov chain approaches the stationary distribution. 
We measure this with the total variation distance (TV-distance): for two distributions $\mu$ and $\pi$ on $\mathbb{R}^{d}$, the TV-distance is defined as $d_{\textrm{TV}}(\mu,\pi)\stackrel{\mathrm{def}}{=}\sup_{A\in\mathcal{B}(\mathbb{R}^{d})}|\mu(A)-\pi(A)|=\frac{1}{2}\int_{\mathbb{R}^{d}}\Bigl|\frac{\mathrm{d}\mu}{\mathrm{d} x}-\frac{\mathrm{d}\pi}{\mathrm{d} x}\Bigr|\,\mathrm{d} x$, where the last equality holds when the two distributions admit densities with respect to the Lebesgue measure on $\mathbb{R}^{d}$. We also recall other probabilistic distances: when $\mu\ll\nu$, the chi-squared divergence and the $L^{2}$-distance are
$$\chi^{2}(\mu\mathbin{\|}\nu)\stackrel{\mathrm{def}}{=}\int{\Bigl(\frac{\mathrm{d}\mu}{\mathrm{d}\nu}-1\Bigr)}^{2}\,\mathrm{d}\nu\,,\qquad{\|\mu/\nu\|}\stackrel{\mathrm{def}}{=}\int\frac{\mathrm{d}\mu}{\mathrm{d}\nu}\,\mathrm{d}\mu=\chi^{2}(\mu\mathbin{\|}\nu)+1\,.$$
Moreover, the rate of convergence can be quantified by the mixing time: for an error parameter $\varepsilon\in(0,1)$ and an initial distribution $\pi_{0}$, the mixing time is defined as the smallest $n\in\mathbb{N}$ such that $d_{\textrm{TV}}(\pi_{0}P^{(n)},\pi)\leq\varepsilon$. In this paper, we consider a lazy Markov chain, which does not move with probability $\frac{1}{2}$ at each step, in order to avoid a uniqueness issue of a stationary distribution. Note that this change worsens the mixing time by at most a factor of $2$.
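The distances and the mixing time defined above can be checked on a finite state space, where integrals become sums. The sketch below is an illustration only: the three-state reversible chain `P`, its stationary distribution `pi`, and all function names are assumptions made for this example.

```python
def tv_distance(mu, pi):
    """Total variation distance between two probability vectors."""
    return 0.5 * sum(abs(a - b) for a, b in zip(mu, pi))


def chi_squared(mu, nu):
    """chi^2(mu || nu) = sum_x (mu(x)/nu(x) - 1)^2 nu(x)."""
    return sum((a / b - 1.0) ** 2 * b for a, b in zip(mu, nu))


def l2_ratio(mu, nu):
    """||mu/nu|| = sum_x (mu(x)/nu(x)) mu(x), which equals chi^2 + 1."""
    return sum((a / b) * a for a, b in zip(mu, nu))


def step(mu, P):
    """One step of the chain: (mu P)(j) = sum_i mu(i) P(i, j)."""
    n = len(mu)
    return [sum(mu[i] * P[i][j] for i in range(n)) for j in range(n)]


def make_lazy(P):
    """Lazy version of P: stay put with probability 1/2, else move by P."""
    n = len(P)
    return [[0.5 * P[i][j] + (0.5 if i == j else 0.0) for j in range(n)]
            for i in range(n)]


def mixing_time(mu0, P, pi, eps, max_steps=10_000):
    """Smallest n with d_TV(mu0 P^n, pi) <= eps."""
    mu = list(mu0)
    for n in range(max_steps + 1):
        if tv_distance(mu, pi) <= eps:
            return n
        mu = step(mu, P)
    raise RuntimeError("chain did not mix within max_steps")


# A reversible random walk on a 3-point path and its stationary distribution.
P = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]
pi = [0.25, 0.5, 0.25]

if __name__ == "__main__":
    mu = [0.5, 0.3, 0.2]
    print(tv_distance(mu, pi), chi_squared(mu, pi), l2_ratio(mu, pi))
    print(mixing_time([1.0, 0.0, 0.0], make_lazy(P), pi, eps=0.01))
```

The identity ${\|\mu/\nu\|}=\chi^{2}(\mu\mathbin{\|}\nu)+1$ holds exactly here because both sides expand to $\sum_{x}\mu(x)^{2}/\nu(x)$.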
One of the standard tools to control the progress made by each iterate is the conductance $\Phi$ of the Markov chain with its stationary distribution $\pi$, defined by $\Phi\stackrel{\mathrm{def}}{=}\inf_{\text{measurable }S}\frac{\int_{S}P(x,S^{c})\,\pi(\mathrm{d} x)}{\pi(S)\wedge\pi(S^{c})}\,.$ Another crucial factor affecting the convergence rate is the geometry of the stationary distribution $\pi$, as measured by the Cheeger isoperimetry $\psi_{\pi}\stackrel{\mathrm{def}}{=}\inf_{\text{measurable }S}\frac{\lim_{\delta\to0^{+}}\frac{1}{\delta}\pi{\bigl(\{x:\,0<d(S,x)\leq\delta\}\bigr)}}{\pi(S)\wedge\pi(S^{c})}\,,$ where $d(S,x)$ is some distance between $x$ and the set $S$.
For convex $K\subset\mathbb{R}^{d}$, let $\phi:\textup{int}(K)\to\mathbb{R}$ be a convex function, $g:\textup{int}(K)\to\mathbb{S}_{+}^{d}$ a PSD matrix function, and $\mathcal{N}_{g}^{r}(x):=\mathcal{N}{\bigl(x,\frac{r^{2}}{d}g(x)^{-1}\bigr)}$.
Self-concordance (SC): A $C^{3}$-function $\phi$ is called a self-concordant barrier if $|\mathrm{D}^{3}\phi(x)[h,h,h]|\leq2{\|h\|}_{\nabla^{2}\phi(x)}^{3}$ for any $x\in\textup{int}(K)$ and $h\in\mathbb{R}^{d}$, and $\lim_{x\to\partial K}\phi(x)=\infty$. The first condition is equivalent to $-2{\|h\|}_{\nabla^{2}\phi(x)}\nabla^{2}\phi(x)\preceq\mathrm{D}^{3}\phi(x)[h]\preceq2{\|h\|}_{\nabla^{2}\phi(x)}\nabla^{2}\phi(x)$. We call it a $\nu$-self-concordant barrier for $K$ if $\sup_{h\in\mathbb{R}^{d}}(2\langle\nabla\phi(x),h\rangle-{\|h\|}_{\nabla^{2}\phi(x)}^{2})\leq\nu$ for any $x\in\textup{int}(K)$ in addition to self-concordance. A $C^{1}$-PSD matrix function $g:\textup{int}(K)\to\mathbb{S}_{+}^{d}$ is called self-concordant if $-2{\|h\|}_{g(x)}g(x)\preceq\mathrm{D} g(x)[h]\preceq2{\|h\|}_{g(x)}g(x)$ for any $x\in\textup{int}(K)$ and $h\in\mathbb{R}^{d}$, and there exists a self-concordant function $\phi:\textup{int}(K)\to\mathbb{R}$ such that $\nabla^{2}\phi\asymp g$ on $\textup{int}(K)$. We call it a $\nu$-self-concordant barrier for $K$ if its counterpart $\phi$ is $\nu$-self-concordant.
Highly self-concordant function (HSC): A $C^{4}$-function $\phi$ is called highly self-concordant if $|\mathrm{D}^{4}\phi(x)[h,h,h,h]|\leq6{\|h\|}_{\nabla^{2}\phi(x)}^{4}$ for any $x\in\textup{int}(K)$ and $h\in\mathbb{R}^{d}$, and $\lim_{x\to\partial K}\phi(x)=\infty$.
Strong self-concordance (SSC): A SC matrix function $g$ is called strongly self-concordant if $g$ is PD on $\textup{int}(K)$ and ${\|g(x)^{-1/2}\mathrm{D} g(x)[h]\,g(x)^{-1/2}\|}_{F}\leq2{\|h\|}_{g(x)}$ for any $x\in\textup{int}(K)$ and $h\in\mathbb{R}^{d}$. We call a SC function $\phi$ strongly self-concordant if $\nabla^{2}\phi(x)$ is strongly self-concordant.
Lower trace self-concordant matrix (LTSC): A SC matrix function $g$ is called lower trace self-concordant if $g$ is PD on $\textup{int}(K)$ and $\textup{Tr}{\bigl(g(x)^{-1}\mathrm{D}^{2}g(x)[h,h]\bigr)}\geq-{\|h\|}_{g(x)}^{2}$ for any $x\in\textup{int}(K)$ and $h\in\mathbb{R}^{d}$. We call it strongly lower trace self-concordant (SLTSC) if for any PSD matrix function $\bar{g}$ on $\textup{int}(K)$ it holds that $\textup{Tr}{\bigl({\bigl(\bar{g}(x)+g(x)\bigr)}^{-1}\mathrm{D}^{2}g(x)[h,h]\bigr)}\geq-{\|h\|}_{g(x)}^{2}$ for any $x\in\textup{int}(K)$ and $h\in\mathbb{R}^{d}$.
Average self-concordance (ASC): A matrix function $g$ is called average self-concordant if for any $\varepsilon>0$ there exists $r_{\varepsilon}>0$ such that $\mathbb{P}_{z\sim\mathcal{N}_{g}^{r}(x)}{\bigl({\|z-x\|}_{g(z)}^{2}-{\|z-x\|}_{g(x)}^{2}\leq\frac{2\varepsilon r^{2}}{d}\bigr)}\geq1-\varepsilon$ for $r\leq r_{\varepsilon}$. We call it strongly average self-concordant (SASC) if for any $\varepsilon>0$ and any PSD matrix function $\bar{g}$ on $\textup{int}(K)$ it holds that $\mathbb{P}_{z\sim\mathcal{N}_{g+\bar{g}}^{r}(x)}{\bigl({\|z-x\|}_{g(z)}^{2}-{\|z-x\|}_{g(x)}^{2}\leq\frac{2\varepsilon r^{2}}{d}\bigr)}\geq1-\varepsilon$ for $r\leq r_{\varepsilon}$.
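To make the first self-concordance inequality concrete, here is a minimal numerical sketch for the logarithmic barrier $\phi(x)=-\sum_{i}\log(b_{i}-a_{i}^{\mathsf{T}}x)$ of a polytope $\{x:Ax\leq b\}$. This barrier is an assumed example chosen for illustration. For it, writing $t_{i}=a_{i}^{\mathsf{T}}h/(b_{i}-a_{i}^{\mathsf{T}}x)$, one has ${\|h\|}_{\nabla^{2}\phi(x)}^{2}=\sum_{i}t_{i}^{2}$ and $\mathrm{D}^{3}\phi(x)[h,h,h]=2\sum_{i}t_{i}^{3}$. The helper names are hypothetical.

```python
import random


def slack_ratios(A, b, x, h):
    """Return t_i = <a_i, h> / (b_i - <a_i, x>) for each constraint a_i^T x <= b_i."""
    ratios = []
    for a_i, b_i in zip(A, b):
        slack = b_i - sum(a * xj for a, xj in zip(a_i, x))
        if slack <= 0:
            raise ValueError("x must lie in the interior of the polytope")
        ratios.append(sum(a * hj for a, hj in zip(a_i, h)) / slack)
    return ratios


def hessian_norm_sq(A, b, x, h):
    """||h||^2 in the local norm of the log-barrier Hessian: sum_i t_i^2."""
    return sum(t * t for t in slack_ratios(A, b, x, h))


def third_derivative(A, b, x, h):
    """D^3 phi(x)[h, h, h] for the log barrier: 2 * sum_i t_i^3."""
    return 2.0 * sum(t ** 3 for t in slack_ratios(A, b, x, h))


def is_self_concordant_at(A, b, x, h, tol=1e-12):
    """Check |D^3 phi(x)[h,h,h]| <= 2 ||h||^3 at a single pair (x, h)."""
    lhs = abs(third_derivative(A, b, x, h))
    rhs = 2.0 * hessian_norm_sq(A, b, x, h) ** 1.5
    return lhs <= rhs * (1.0 + tol) + tol


if __name__ == "__main__":
    # Unit square [0, 1]^2 written as A x <= b.
    A = [[1, 0], [-1, 0], [0, 1], [0, -1]]
    b = [1, 0, 1, 0]
    rng = random.Random(0)
    x = [rng.uniform(0.05, 0.95) for _ in range(2)]
    h = [rng.uniform(-1.0, 1.0) for _ in range(2)]
    print(is_self_concordant_at(A, b, x, h))
```

The check succeeds at every interior point because $\sum_{i}|t_{i}|^{3}\leq(\sum_{i}t_{i}^{2})^{3/2}$; it only samples points and is not a proof for other barriers.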
We follow a standard conductance-based argument (see e.g., lovasz1993random, vempala2005geometric). A lower bound on the conductance of a Markov chain provides an upper bound on its mixing time due to the following result. Let $\pi_{T}$ be the distribution obtained after $T$ steps of a lazy reversible Markov chain of conductance at least $\Phi$ with stationary distribution $\pi$ and initial distribution $\pi_{0}$. For ${\|\pi_{0}/\pi\|}=\mathbb{E}_{\pi_{0}}{\bigl[\frac{\mathrm{d}\pi_{0}}{\mathrm{d}\pi}\bigr]}$ and any $\varepsilon>0$, we have $d_{\textrm{TV}}(\pi_{T},\pi)\leq\varepsilon+\sqrt{\frac{{\|\pi_{0}/\pi\|}}{\varepsilon}}{\bigl(1-\frac{\Phi^{2}}{2}\bigr)}^{T}$. A lower bound on the conductance follows from two ingredients: (i) one-step coupling and (ii) isoperimetry. The first refers to showing that the one-step distributions of the $\mathsf{Dikin\ walk}$ from two nearby points have TV-distance bounded away from one. The second is a purely geometric property about the expansion of the target distribution. Combining these two leads to a lower bound on the conductance: Let $\pi$ be the stationary distribution of a lazy reversible Markov chain on $\mathcal{M}$ with a transition kernel $P_{x}$. Assume the isoperimetry $\psi_{\mathcal{M}}$ under a Riemannian distance $d_{g}$ and the following one-step coupling: if ${\|x-y\|}_{g(x)}\leq\Delta<1$ for $x,y\in\mathcal{M}$, then $d_{\textrm{TV}}(P_{x},P_{y})\leq0.9$. Then the conductance $\Phi$ of the Markov chain is bounded below by $\Omega(\psi_{\mathcal{M}}\Delta)$. Recall that a $\bar{\nu}$-Dikin-amenable metric is $\bar{\nu}$-symmetric, SSC, LTSC, and ASC. laddha2020strong was the first to attempt characterizing essential properties of $g$ (or $\phi$) that determine mixing times of $\mathsf{Dikin\ walks}$ for uniform sampling. Their framework requires that $g$ be $\bar{\nu}$-symmetric and SSC, that $\log\det g(x)$ be convex, and that $x\in\mathcal{D}_{g}^{r}(z)$ w.h.p.
(where $z\sim\text{Unif}{\bigl(\mathcal{D}_{g}^{r}(x)\bigr)}$). However, their framework encounters a challenge when further incorporating the work of narayanan2016randomized, which analyzes the $\mathsf{Dikin\ walk}$ for uniform sampling over a convex region given as the intersection of various convex sets. The challenge arises from the difficulty of verifying the convexity of $\log\det(g_{1}+g_{2})$ when $\log\det g_{i}$ is convex for each $i=1,2$. To address this challenge and succinctly characterize the essential properties of a metric for one-step coupling, we relax the convexity of $\log\det$ to (S)LTSC and introduce the notion of ASC to account for the condition "$x\in\mathcal{D}_{g}^{r}(z)$ w.h.p.". We show that the one-step coupling lemma below, one of the main proof ingredients in obtaining a mixing-time guarantee for the $\mathsf{Dikin\ walk}$, can be established under Dikin-amenability of a metric. Our characterization of a metric for achieving one-step coupling is general and unifies previous work on $\mathsf{Dikin\ walks}$ (kannan2012random, narayanan2016randomized, chen2018fast, laddha2020strong). We now proceed to establish one-step coupling under relative smoothness in $\phi$. For convex $K\subset\mathbb{R}^{d}$, let $g:\textup{int}(K)\to\mathbb{S}_{++}^{d}$ be SSC, ASC, LTSC, and $\phi:\textup{int}(K)\to\mathbb{R}$ be its function counterpart. Suppose that the potential $f$ of the target distribution $\pi$ is $\beta$-relatively smooth in $\phi$. Then there exist constants $s_{1},s_{2}>0$ such that if ${\|x-y\|}_{g(x)}\leq s_{1}r/\sqrt{d}$ with $r=s_{2}\,(1\wedge1/\sqrt{\beta})$ for $x,y\in\textup{int}(K)$, then $d_{\textrm{TV}}(P_{x},P_{y})\leq\frac{3}{4}+0.01$. We provide a sketch of the proof (see §\ref{['proof:onestep']} for the full proof). A key distinction when extending beyond uniform distributions lies in establishing a lower bound for the ratio $\frac{\exp(f(x))}{\exp(f(z))}$ to ensure a high acceptance probability.
To tackle this issue, we use the symmetry of the proposal distribution, claiming $\exp(f(x))/\exp(f(z))\geq1$ at the expense of $\frac{1}{2}$ probability. However, this $\frac{1}{2}$ probability loss is incompatible with previous proof techniques based on the triangle inequality: for a transition kernel $T$ and proposal kernel $P$, the triangle inequality leads to $d_{\textrm{TV}}(T_{x},T_{y})\leq d_{\textrm{TV}}(T_{x},P_{x})+d_{\textrm{TV}}(P_{x},P_{y})+d_{\textrm{TV}}(P_{y},T_{y})\,,$ and one then bounds the second term on the RHS by Pinsker's inequality, making it arbitrarily small by taking $r=\mathcal{O}(1)$ small enough. However, this approach yields a bound of $\frac{1}{2}+\varepsilon$ for both $d_{\textrm{TV}}(T_{x},P_{x})$ and $d_{\textrm{TV}}(T_{y},P_{y})$, making the RHS vacuous. We instead work with the exact formula for $d_{\textrm{TV}}(T_{x},T_{y})$: for the Gaussian $p_{x}=\mathcal{N}(x,\frac{r^{2}}{d}g(x)^{-1})$ and $R_{x}(z)=\frac{p_{z}(x)}{p_{x}(z)}\frac{\pi(z)}{\pi(x)}=\sqrt{\frac{\det g(z)}{\det g(x)}}\,\frac{\exp(f(x))}{\exp(f(z))},\qquad A_{x}(z)=\min{\bigl(1,R_{x}(z)\,\mathbf{1}_{K}(z)\bigr)}\,,$ the transition kernel $T_{x}$ of the $\mathsf{Dikin\ walk}$ started at $x$ can be written as $T_{x}(\mathrm{d} z)=\underbrace{{\bigl(1-\mathbb{E}_{p_{x}}[A_{x}(\cdot)]\bigr)}}_{\eqqcolon r_{x}}\,\delta_{x}(\mathrm{d} z)+A_{x}(z)\,p_{x}(\mathrm{d} z)\,.$ Then, $d_{\textrm{TV}}(T_{x},T_{y})=\frac{r_{x}+r_{y}}{2}+\frac{1}{2}\int|A_{x}(z)\,p_{x}(z)-A_{y}(z)\,p_{y}(z)|\,\mathrm{d} z\,.$ As for $r_{x}$ and $r_{y}$, we bound $\sqrt{\det g(z)/\det g(x)}$ below by $1-\varepsilon$ at the cost of $\varepsilon$-probability through SSC, LTSC, and ASC of $g$, following laddha2020strong with convexity of $\log\det$ replaced by LTSC. As mentioned earlier, we also deduce $\exp(f(x))/\exp(f(z))\geq1$ through the symmetry of Gaussian distributions at the cost of $\frac{1}{2}$ probability.
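The transition kernel $T_{x}$ above (Gaussian proposal $p_{x}$ followed by acceptance with probability $A_{x}(z)$) can be simulated directly. The following is a minimal one-dimensional sketch, assuming $K=(0,1)$ with the log barrier $-\log x-\log(1-x)$ as the source of the metric and a user-supplied potential `f`. These choices, and all function names, are illustrative assumptions rather than the paper's algorithm listing. The ratio $p_{z}(x)/p_{x}(z)$ is computed from the two Gaussian densities themselves, and the laziness of the chain is included as a coin flip.

```python
import math
import random


def metric(x):
    """g(x): Hessian of the assumed log barrier -log(x) - log(1 - x) on K = (0, 1)."""
    return 1.0 / x ** 2 + 1.0 / (1.0 - x) ** 2


def proposal_density(x, z, r):
    """Density at z of the proposal p_x = N(x, r^2 / g(x)) in dimension d = 1."""
    var = r * r / metric(x)
    return math.exp(-(z - x) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def acceptance(x, z, f, r):
    """A_x(z) = min(1, R_x(z) 1_K(z)), R_x(z) = p_z(x) pi(z) / (p_x(z) pi(x))."""
    if not 0.0 < z < 1.0:
        return 0.0  # proposals outside K are always rejected
    density_ratio = proposal_density(z, x, r) / proposal_density(x, z, r)
    return min(1.0, density_ratio * math.exp(f(x) - f(z)))


def dikin_step(x, f, r, rng):
    """One lazy step: stay with probability 1/2, else propose and accept/reject."""
    if rng.random() < 0.5:
        return x
    z = rng.gauss(x, r / math.sqrt(metric(x)))
    return z if rng.random() < acceptance(x, z, f, r) else x


if __name__ == "__main__":
    rng = random.Random(0)
    x = 0.5
    for _ in range(1000):
        x = dikin_step(x, lambda t: 3.0 * t, 0.5, rng)  # target prop. to exp(-3x) on (0, 1)
    print(x)
```

Because the proposal variance scales with $g(x)^{-1}$, steps shrink near the boundary of $K$, which is the local-geometry awareness that the coupling argument exploits.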
Combining these results, we obtain upper bounds of $\frac{1}{2}+\varepsilon$ for small $\varepsilon>0$ on $r_{x}$ and $r_{y}$. Establishing a bound of $1/4+\varepsilon$ on the second term is a more involved task. It requires the closeness of the acceptance probabilities $A_{x}(z)$ and $A_{y}(z)$ as well as of the proposal densities $p_{x}(z)$ and $p_{y}(z)$. This closeness can only be achieved through sophisticated conditioning on high-probability events due to ASC, SSC, and symmetry of Gaussian proposals. To be precise, define good events $G_{x}=\cap_{i=0,2,3}B_{x,i}^{c}$ and $G_{y}=\cap_{i=0,2,3}B_{y,i}^{c}$ such that $\mathbb{P}_{\mathcal{N}_{g}^{r}(x)}(G_{x}^{c})\leq3\varepsilon$ and $\mathbb{P}_{\mathcal{N}_{g}^{r}(y)}(G_{y}^{c})\leq3\varepsilon$, where
$B_{x,0}=\{{\|z-x\|}_{x}\geq cr\}$ with $c\geq1+\frac{2}{\sqrt{d}}\,\log\frac{1}{\varepsilon}$ (tail bound for Gaussian),
$B_{x,1}=\{-\langle\nabla f(x),x-z\rangle\leq0\}$ (symmetry of Gaussian),
$B_{x,2}=\{{\|z-x\|}_{z}^{2}-{\|z-x\|}_{x}^{2}>2\varepsilon\frac{r^{2}}{d}\}$ (ASC of $g$),
$B_{x,3}=\bigl\{\langle\nabla\varphi(x),z-x\rangle\leq-2\frac{r}{\sqrt{d}}\,{\|g(x)^{-1/2}\nabla\varphi(x)\|}_{2}\,\log\frac{1}{\varepsilon}\bigr\}$ (SSC and tail bound for Gaussian).
We further denote $G:=G_{x}\cup G_{y}$ and a partition of $G$ by $G_{x\backslash y}:=G_{x}\backslash G_{y},\qquad G_{x,y}:=G_{x}\cap G_{y},\qquad G_{y\backslash x}:=G_{y}\backslash G_{x}\,.$ Then, $\frac{1}{2}\int\underbrace{|A_{x}(z)\,p_{x}(z)-A_{y}(z)\,p_{y}(z)|}_{\eqqcolon Q}\,\mathrm{d} z\leq3\varepsilon+\underbrace{\frac{1}{2}\int_{G_{x\backslash y}}Q\,\mathrm{d} z}_{\eqqcolon\mathcal{A}}+\underbrace{\frac{1}{2}\int_{G_{y\backslash x}}Q\,\mathrm{d} z}_{\eqqcolon\mathcal{B}}+\underbrace{\frac{1}{2}\int_{G_{x,y}}Q\,\mathrm{d} z}_{\eqqcolon\mathcal{C}}\,.$
We can bound $\mathcal{A}$ and $\mathcal{B}$ by $\mathcal{O}(\varepsilon)$ using Pinsker's inequality and a well-known formula for the $\mathsf{KL}$ divergence between two Gaussians. As for $\mathcal{C}$, conditioning on $B_{x,1}$ and using the triangle inequality lead to $\mathcal{C}\leq\frac{1}{4}+2\varepsilon+\frac{1}{2}\int_{G_{x}\cap G_{y}\cap B_{x,1}^{c}}|\min{\Bigl(1,\underbrace{\frac{\exp f(x)}{\exp f(z)}\,\frac{p_{z}(x)}{p_{x}(z)}}_{\eqqcolon\mathsf{U}}\Bigr)}-\min{\Bigl(\underbrace{\frac{p_{y}(z)}{p_{x}(z)}}_{\eqqcolon\mathsf{V}},\underbrace{\frac{\exp f(y)}{\exp f(z)}\,\frac{p_{z}(y)}{p_{x}(z)}}_{\eqqcolon\mathsf{W}}\Bigr)}|\,p_{x}(z)\,\mathrm{d} z\,.$ The bound $\log\mathsf{U}\ge-4\varepsilon$ was already obtained when bounding $r_{x}$. We then show that $\lvert\log\mathsf{V}\rvert\le5\varepsilon$ and $\log\mathsf{W}\ge-7\varepsilon$ conditioned on $G_{x}\cap G_{y}\cap B_{x,1}^{c}$ via closeness of SSC (Lemma \ref{['lem:strongSC-closeness']}). Using these, $\int_{G_{x}\cap G_{y}\cap B_{x,1}^{c}}|1\wedge\mathsf{U}-\mathsf{V}\wedge\mathsf{W}|\,p_{x}(z)\,\mathrm{d} z\leq e^{5\varepsilon}-e^{4\varepsilon}\,,$ which results in $\mathcal{C}\le1/4+\mathcal{O}(\varepsilon)$. Putting the bounds on $r_{x},r_{y},\mathcal{A},\mathcal{B}$, and $\mathcal{C}$ together, we conclude that the TV-distance is bounded by $3/4+\mathcal{O}(\varepsilon)$. We further note that ${\|x-y\|}_{x}$ can be replaced by the Riemannian distance $d_{\phi}(x,y)$ with the metric defined by $\nabla^{2}\phi$, since these two distances are within a constant factor of each other: Let $\phi:\textup{int}(K)\to\mathbb{R}$ be self-concordant, and $x,y\in\textup{int}(K)$ with $\delta:={\|x-y\|}_{x}<1$. Then, $\delta-\frac{1}{2}\delta^{2}\leq d_{\phi}(x,y)\leq-\log(1-\delta)\,.$ Next, we present two isoperimetric inequalities derived from distinct sources: the first comes from the symmetry of a barrier, while the second arises from strong convexity in a local metric.
The first one states that isoperimetry of log-concave distributions under distance $d_{g}(x,y)$ (or ${\|x-y\|}_{g(x)}$ due to Lemma \ref{['lem:Riemann-Dikin-close']}) is $\Omega(1/\sqrt{\bar{\nu}})$. The following lemma is an extension of laddha2020strong from uniform distributions (over a convex body) to general log-concave distributions. We defer the proof to §\ref{['proof:isoperimetry']}. Let $\phi$ be self-concordant and $d_{\phi}$ be the Riemannian distance induced by the Hessian metric $\nabla^{2}\phi$. For a log-concave distribution $\pi$, isoperimetry $\psi_{\pi}$ under distance $d_{\phi}$ is $\Omega(1/\sqrt{\bar{\nu}})$. Another kind of isoperimetry comes from relative strong-convexity of the potential of a distribution. For a scalar $\alpha>0$, isoperimetry of $e^{-\alpha\phi}$ on a Hessian manifold equipped with the metric $\nabla^{2}\phi$ is $\Omega(\sqrt{\alpha})$ if $\mathrm{D}^{4}\phi(x)\left[h^{\otimes4}\right]\geq0$ for all $x\in K$ and $h\in\mathbb{R}^{d}$ (see lee2018convergence). gopi2023algorithmic further generalizes this to show that if $\phi$ is self-concordant and the potential $f$ is $\alpha$-relatively strong convex, then its isoperimetry is $\Omega(\sqrt{\alpha})$. We can adapt this lemma by restricting this to a convex set $K$ (not necessarily bounded). See §\ref{['proof:isoperimetry']} for the proof. For a closed convex set $K\subset\mathbb{R}^{d}$, let a convex function $\phi:\textup{int}(K)\to\mathbb{R}$ be self-concordant on $K$, $f:\textup{int}(K)\to\mathbb{R}$ $\alpha$-relatively strongly convex in $\phi$, and $\pi$ a log-concave distribution with $\pi\propto\exp(-f)\cdot\mathbf{1}_{K}$. 
For a partition $\{S_{1},S_{2},S_{3}\}$ of $K$ and the Riemannian distance $d_{\phi}$ induced by the inner product $\langle a,b\rangle_{x}:=a^{\mathsf{T}}\nabla^{2}\phi(x)\,b$, it holds that $\pi(S_{3})\gtrsim\sqrt{\alpha}\,d_{\phi}(S_{1},S_{2})\,\pi(S_{1})\,\pi(S_{2})\,.$ Putting all these components together, we obtain the following mixing-time bounds for the $\mathsf{Dikin\ walk}$. Let $K\subset\mathbb{R}^{d}$ be convex and $0\leq\alpha\leq\beta<\infty$.
(Local metric) Assume that a $C^{1}$-matrix function $g:\textup{int}(K)\to\mathbb{S}_{++}^{d}$ is $\bar{\nu}$-Dikin-amenable.
(Distribution) Let $\pi_{0}$ and $\pi\propto e^{-f}\cdot\mathbf{1}_{K}$ be an initial and target distribution respectively, where $f$ is $\alpha$-relatively strongly convex and $\beta$-relatively smooth in $g$.
Let ${\|\pi_{0}/\pi\|}=\mathbb{E}_{\pi_{0}}[\frac{\mathrm{d}\pi_{0}}{\mathrm{d}\pi}]$ and $P$ be the transition kernel of the $\mathsf{Dikin\ walk}$ (Algorithm \ref{['alg:DikinWalk']}) with the local metric $g$ and step size $r=\mathcal{O}(\min(1,\beta^{-1/2}))$. Then for any $\varepsilon>0$, it holds that $d_{\textrm{TV}}(\pi_{0}P^{(T)},\pi)\leq\varepsilon$ for $T\gtrsim d\,\max(1,\beta)\,\min(\bar{\nu},1/\alpha)\,\log\frac{{\|\pi_{0}/\pi\|}}{\varepsilon}$. Lemma \ref{['lem:conductance']} ensures that $\Phi\gtrsim\frac{r}{\sqrt{d}}\psi$ due to the one-step coupling in Lemma \ref{['lem:one-step']}. Lemma \ref{['lem:symmetry-iso']} leads to $\psi\gtrsim\frac{1}{\sqrt{\bar{\nu}}}$, while Lemma \ref{['lem:sc-iso']} implies $\psi\gtrsim\sqrt{\alpha}$ due to $\nabla^{2}\phi\asymp g$.
Thus, with $r\asymp1\wedge\frac{1}{\sqrt{\beta}}$, $\Phi\gtrsim\frac{1}{\sqrt{d}}\,{\bigl(\sqrt{\alpha}\vee\frac{1}{\sqrt{\bar{\nu}}}\bigr)}{\bigl(1\wedge\frac{1}{\sqrt{\beta}}\bigr)}\,,$ and using Lemma \ref{['lem:conductanceBound']} with $\Lambda:={\|\pi_{0}/\pi\|}$, we can enforce $d_{\textrm{TV}}(\pi_{T},\pi)\leq\varepsilon$ by solving $\sqrt{\Lambda}e^{-T\Phi^{2}/2}\leq\varepsilon$ and $\frac{\varepsilon}{2}+\sqrt{\frac{\Lambda}{\varepsilon/2}}e^{-T\Phi^{2}/2}\leq\varepsilon$ for $T$, which results in $T\gtrsim d\,(1\vee\beta)\,{\bigl(\bar{\nu}\wedge\frac{1}{\alpha}\bigr)}\log\frac{\Lambda}{\varepsilon}\,.$
We derive a sampling analogue of the Interior-Point Method through comparison with IPM in optimization, by extending Gaussian cooling on manifolds introduced in cousins2018gaussian, lee2018convergence. Combining the sampling IPM framework with the $\mathsf{Dikin\ walk}$ efficiently generates a warm start for a target distribution $\pi\propto e^{-f}\cdot\mathbf{1}_{K}$ with finite second moment. Let us recall our setup. Let $K\subset\mathbb{R}^{d}$ be a closed convex set, $g:\textup{int}(K)\to\mathbb{S}_{++}^{d}$ a $(\nu,\bar{\nu})$-SC matrix function, and $\phi:\textup{int}(K)\to\mathbb{R}$ its (strictly convex) SC counterpart. We assume $\min_{x}\phi(x)=0$ by considering $\phi-\min_{x}\phi(x)$ (here, $\arg\min\phi(x)$ can be efficiently found by the optimization IPM). We assume that $f$ is $\alpha$-relatively strongly convex and $\beta$-relatively smooth in $\phi$ for $0\leq\alpha\leq\beta<\infty$, i.e., $0\preceq\alpha\nabla^{2}\phi\preceq\nabla^{2} f\preceq\beta\nabla^{2}\phi$ on $\textup{int}(K)$. We define $\bar{f}(\cdot):=\frac{\nu}{d}\,f(\cdot)$ and $g_{\phi}(\cdot):=\nabla^{2}\phi(\cdot)$.
Interior-Point Method
Input: A $\nu$-self-concordant barrier $\phi$ for a constraint
Output: $y_{\lambda}$
Denote $f_{\lambda}(y):=c^{\mathsf{T}}y+\frac{1}{\lambda}\,\phi(y)$.
Phase 1: Starting feasible point. Find $y_{0}=\arg\min\phi(y)$, set $\lambda=\frac{1}{6}\,{\|c\|}_{[\nabla^{2}\phi(y_{0})]^{-1}}^{-1}$, and $\bar{y}_{\lambda}\gets y_{0}$.
Phase 2: Increasing $\lambda$. While $\lambda\leq\frac{\nu+1}{\varepsilon}$, repeat:
$\bar{y}_{\lambda}\gets\bar{y}_{\lambda}-[\nabla^{2} f_{\lambda}(\bar{y}_{\lambda})]^{-1}\nabla f_{\lambda}(\bar{y}_{\lambda})$ ("Opt. step", e.g., the Newton step)
$\lambda\gets(1+r)\,\lambda$ with $r=\frac{1}{9\sqrt{\nu}}$ (increase $\lambda$)
A structural convex optimization problem is formulated as $\min_{x\in K}f(x)$, where $f:\mathbb{R}^{d}\to\mathbb{R}$ is a convex function, and $K\subset\mathbb{R}^{d}$ is a closed convex set. Also, both $K$ and $\{(x,t):f(x)\leq t\}$ admit efficiently computable self-concordant barriers denoted by $\phi_{1}$ and $\phi_{2}$, respectively. We can simplify the problem by equivalently solving $\min_{x\in K,\,\{(x,t):f(x)\leq t\}}t$ and in general focus on $\min_{x\in K,\,\{(x,t):f(x)\leq t\}}c^{\mathsf{T}}(x,t)$ for a constant $c\in\mathbb{R}^{d+1}$. IPM then regularizes $c^{\mathsf{T}}(x,t)$ by adding $\frac{1}{\lambda}\,\phi(x,t)=\frac{1}{\lambda}{\bigl(\phi_{1}(x)+\phi_{2}(x,t)\bigr)}$ for $\lambda>0$. This regularization removes the hard constraint of $K\cap\{f(x)\leq t\}$, and the resulting formulation becomes $\min_{y=(x,t)\in\mathbb{R}^{d+1}}f_{\lambda}(y):=c^{\mathsf{T}}y+\frac{1}{\lambda}\,\phi(y)\,,$ where $\phi(y)$ blows up as $y$ approaches the boundary of the constraint. For each fixed $\lambda>0$, there exists a minimum $y_{\lambda}$ of the convex function $f_{\lambda}(y)$. Intuitively, as $\lambda\to\infty$ the regularization term $\frac{1}{\lambda}\,\phi(y)$ vanishes, so $y_{\lambda}$ converges to $\arg\min_{y\in K\cap\{f(x)\le t\}}c^{\mathsf{T}}y$. The path followed by $\{y_{\lambda}\}_{\lambda>0}$ is called the central path, and IPM aims to approximately follow this central path as $\lambda$ increases.
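The two phases of the optimization IPM can be traced on a toy instance. The sketch below is a minimal illustration under stated assumptions: the feasible region is the interval $[0,1]$, the barrier is $\phi(y)=-\log y-\log(1-y)$ with $\nu=2$, the objective is $c\,y$ for a nonzero scalar $c$, and the "Opt. step" is a single Newton step per value of $\lambda$. The function names are hypothetical; the constants $\frac{1}{6}$, $\frac{1}{9\sqrt{\nu}}$, and $\frac{\nu+1}{\varepsilon}$ are the ones stated in the algorithm.

```python
import math


def barrier_grad(y):
    """First derivative of the assumed barrier phi(y) = -log(y) - log(1 - y)."""
    return -1.0 / y + 1.0 / (1.0 - y)


def barrier_hess(y):
    """Second derivative of the assumed barrier."""
    return 1.0 / y ** 2 + 1.0 / (1.0 - y) ** 2


def ipm_1d(c, eps, nu=2.0):
    """Path-following IPM for min c*y over [0, 1]; returns (iterate, #iterations)."""
    if c == 0:
        raise ValueError("c must be nonzero")
    # Phase 1: start at the minimizer of the barrier (the analytic center).
    y = 0.5
    lam = 1.0 / (6.0 * abs(c) / math.sqrt(barrier_hess(y)))
    # Phase 2: alternate one Newton step on f_lam(y) = c*y + phi(y)/lam
    # with a multiplicative increase of lam.
    r = 1.0 / (9.0 * math.sqrt(nu))
    steps = 0
    while lam <= (nu + 1.0) / eps:
        grad = c + barrier_grad(y) / lam
        hess = barrier_hess(y) / lam
        y -= grad / hess
        lam *= 1.0 + r
        steps += 1
    return y, steps


if __name__ == "__main__":
    y, steps = ipm_1d(c=1.0, eps=1e-3)
    print(y, steps)
```

For $c>0$ the minimizer is $y^{*}=0$, so the returned iterate should satisfy $c\,y\leq\varepsilon$, and the iteration count grows like $\sqrt{\nu}\log(1/\varepsilon)$ because $\lambda$ increases by a factor of $1+\frac{1}{9\sqrt{\nu}}$ per step.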
To be precise, suppose that for $\lambda_{1}>0$, an approximate solution $\bar{y}_{\lambda_{1}}$ maintained by IPM is close enough to $y_{\lambda_{1}}$. Then IPM takes an optimization step (e.g., a Newton step), which takes into account the local geometry induced by the Hessian of the barrier $\phi$, to find an approximate solution $\bar{y}_{\lambda_{2}}$ when $\lambda_{2}>\lambda_{1}$. As long as $\bar{y}_{\lambda_{1}}$ is sufficiently close to $y_{\lambda_{1}}$, this approximate solution $\bar{y}_{\lambda_{1}}$ serves as a good starting point for the non-Euclidean optimizer, which takes $\bar{y}_{\lambda_{1}}$ to $\bar{y}_{\lambda_{2}}$. IPM alternates between increasing $\lambda$ and updating $\bar{y}_{\lambda}$, until $\lambda$ reaches $\nu/\varepsilon$. This is described formally in Algorithm \ref{['alg:IPM']}. The ideas behind IPM are justified by the following theoretical guarantee: Algorithm \ref{['alg:IPM']} returns $y$ in $\mathcal{O}{\bigl(\sqrt{\nu}\,\log{\bigl(\frac{\nu}{\varepsilon}{\|c\|}_{[\nabla^{2}\phi(y_{0})]^{-1}}\bigr)}\bigr)}$ iterations such that $c^{\mathsf{T}}y\leq c^{\mathsf{T}}y^{*}+\varepsilon$ for $y^{*}=\arg\min_{y\in K\cap\{f(x)\le t\}}c^{\mathsf{T}}y$. Now let us adapt each step of IPM into the sampling context with the conceptual analogy between convex optimization and logconcave sampling in mind: for convex $K\subset\mathbb{R}^{d}$ and convex function $f:K\to\mathbb{R}$, $\min f(x)\ \text{ s.t. }x\in K\quad\longleftrightarrow\quad\text{sample }x\sim\exp(-f)\ \text{ s.t. }x\in K\,.$ Similar to the optimization IPM, we first replace $f(x)$ by a new variable $t$ and add the constraint $\{f(x)\leq t\}$ (which is convex due to convexity of $f$), resulting in the following sampling problem: sample $(x,t)$ from a distribution with density proportional to $e^{-t}$ subject to $x\in K$ and $\{(x,t)\in\mathbb{R}^{d+1}:f(x)\leq t\}$.
We note that this is indeed an equivalent sampling problem, since the $x$-marginal of the distribution is $\exp(-f)\cdot\mathbf{1}_{K}$: $\int_{\{(x,t)\in\mathbb{R}^{d+1}:f(x)\leq t\}}\exp(-t)\cdot\mathbf{1}_{K}(x)\,\mathrm{d} t=\int_{f(x)}^{\infty}\exp(-t)\cdot\mathbf{1}_{K}(x)\,\mathrm{d} t=\exp(-f)\cdot\mathbf{1}_{K}\,.$ Comparison between the optimization IPM and the sampling IPM. Now assume that $K\cap\{f(x)\leq t\}$ admits a barrier $\phi$. Thus, this motivates our focus on sampling from distributions of the form $\exp(-c^{\mathsf{T}}y)$ subject to a convex region $K$ with a barrier $\phi$, where $y:=(x,t)\in\mathbb{R}^{d+1}$ is a variable in the augmented space and $c\in\mathbb{R}^{d+1}$ is a vector. Regularizing the potential $c^{\mathsf{T}}y$ of the distribution by adding $\frac{1}{\sigma^{2}}\,\phi(y)$ for some $\sigma^{2}>0$, we can ignore the hard constraint $K$ and obtain the following formulation: for $f_{\sigma^{2}}:=\langle c,\cdot\rangle+\frac{1}{\sigma^{2}}\,\phi$, $\text{sample }y\sim\mu_{\sigma^{2}}\propto\exp(-f_{\sigma^{2}}(y))=\exp{\Bigl(-{\bigl(c^{\mathsf{T}}y+\frac{1}{\sigma^{2}}\,\phi(y)\bigr)}\Bigr)}\,,$ where $\phi(y)$ goes to infinity as it approaches the boundary of $K$. The regularization $\frac{1}{\sigma^{2}}\,\phi$ vanishes as $\sigma^{2}\to\infty$, so we can expect $\mu_{\sigma^{2}}\to\pi\propto\exp(-\langle c,\cdot\rangle)\cdot\mathbf{1}_{K}$. Comparing this with the optimization IPM, the path of measures $\{\mu_{\sigma^{2}}\}_{\sigma^{2}>0}$ can be viewed as the central path in the space of measures. In an ideal scenario, a sampling IPM should closely follow this central path while increasing $\sigma^{2}$ along the path. To this end, we update the current distribution $\bar{\mu}_{\sigma^{2}}$, which is already close to $\mu_{\sigma^{2}}$ on the central path. 
This update should leverage a sampling step that is aware of the local geometry induced by $\nabla^{2}\phi$, which may involve running a non-Euclidean sampler such as the $\mathsf{Dikin\ walk}$. This update brings $\bar{\mu}_{\sigma^{2}}$ to a new distribution $\bar{\mu}_{\sigma^{2}+\delta}$ that should be close to $\mu_{\sigma^{2}+\delta}$ for small $\delta>0$, while $\bar{\mu}_{\sigma^{2}}$ serves as a good starting point for this sampling step to find $\bar{\mu}_{\sigma^{2}+\delta}$. This procedure is repeated until $\sigma^{2}$ becomes large enough. To use this sampling IPM, we further refine the framework via Gaussian cooling on manifolds (GCM).
Figure: We refine the derived sampling IPM to obtain the $\mathsf{Gaussian\ cooling}$ on manifolds. The red dashed line indicates a central path of measures. The red dots are target probability measures appearing in the sampling IPM, while blue dots are probability measures given by a non-Euclidean sampler, which are approximately close to those target measures (red dots). Closeness of two dots (bounded by the green dashed boxes) is quantified by the TV-distance.
Gaussian Cooling introduced in cousins2018gaussian was extended to manifolds by lee2018convergence. It was initially proposed for volume computation but shares remarkable similarities with our sampling IPM. In fact, GCM can be identified with the sampling IPM with $c=0$ (i.e., uniform sampling) and the Riemannian Hamiltonian Monte Carlo employed for the non-Euclidean sampling step. Returning to the comparison with the optimization IPM, we note that the two algorithms use different rules for updating $\sigma^{2}$.
While the optimization IPM updates $\sigma^{2}\gets{\bigl(1+\frac{1}{\sqrt{\nu}}\bigr)}\sigma^{2}$, GCM utilizes two distinct annealing schemes: $\sigma^{2}\gets\begin{cases}\sigma^{2}\,{\bigl(1+\frac{1}{\sqrt{d}}\bigr)} & \text{if }\sigma^{2}\leq\frac{\nu}{d}\,,\\ \sigma^{2}\,{\bigl(1+\frac{\sigma}{\sqrt{\nu}}\bigr)} & \text{o.w.}\end{cases}$ While the first type of update in the small regime of $\sigma^{2}$ relies on a property of logconcavity of regularized distributions $\mu_{\sigma^{2}}\propto\exp{\bigl(-{\bigl(s\phi(y)+c^{\mathsf{T}}y\bigr)}\bigr)}$, the second type of update in the large regime of $\sigma^{2}$ is justified by concentration of measure $e^{-s\phi}$ in a thin shell for $s>0$. We note that the second type in fact accelerates the annealing process. However, significant challenges remain for the sampling IPM. First, we need to extend this annealing scheme to exponential distributions (recall that GCM was proposed for uniform sampling). To be precise, we must account for the linear term $c^{\mathsf{T}}y$ (in addition to the $\phi$ term) when designing the annealing scheme. Unfortunately, the previous update scheme (which is applied only to the $\phi$ part), together with its analysis, does not go through for this purpose. To address this issue, we introduce a further generalization of the GCM annealing scheme in the small regime of $\sigma^{2}$, enabling us to leverage logconcavity of $\mu_{\sigma^{2}}$. In the large regime of $\sigma^{2}$, we use the same annealing scheme but employ a different analytical approach, utilizing a functional inequality with no need to quantify the thin-shell phenomenon of $\mu_{\sigma^{2}}$. To discuss another remaining issue, we note that a non-Euclidean sampler used in the sampling step must have a provable mixing-time guarantee for $\mu_{\sigma^{2}}$.
We already provided this through Theorem \ref{['thm:Dikin']} in §\ref{['sec:mixing-Dikin']} for the $\mathsf{Dikin\ walk}$, since the target potential is $s$-relatively strongly convex and $s$-relatively smooth in $\phi$. Our algorithm consists of four phases, where each phase updates a current distribution in a different way. For generality, we present this annealing process for a general potential $f$ instead of linear functions, where $\alpha\nabla^{2}\phi\preceq\nabla^{2} f\preceq\beta\nabla^{2}\phi$.
Interior-Point Method for sampling
Input: Target accuracy $\varepsilon$, local metric $g$, its counterpart $\phi$, non-Euclidean sampler $\textsf{NE-Sampler}(g,\varepsilon)$, target distribution $\pi\propto\exp(-f)$.
Output: $x'$
Let $\bar{f}=\frac{\nu}{d}\,f$ and $\mu_{\sigma^{2}}\propto\exp(-V_{\sigma^{2}})$, where $V_{\sigma^{2}}:=\begin{cases}\frac{\bar{f}+\phi}{\sigma^{2}} & \text{if }\sigma^{2}\leq\frac{\nu}{d}\,,\\ f+\frac{1}{\sigma^{2}}\,\phi & \text{o.w.}\end{cases}$
Phase 1: Initial distribution. Find $x^{*}=\arg\min_{x\in K}(\bar{f}+\phi)$ and let $D:=\mathcal{D}_{g}^{3\sigma_{0}\sqrt{d}}(x^{*})$ for $\sigma_{0}^{2}:=10^{-5}/d^{3}$. Draw $x_{0}\sim\textsf{NE-Sampler}{\bigl(g,\frac{\varepsilon}{\sqrt{d}}\bigr)}$ with initial dist. $\mathcal{N}{\bigl(x^{*},\frac{\sigma_{0}^{2}}{1+\nu\beta/d}\,g(x^{*})^{-1}\bigr)}\cdot\mathbf{1}_{D}$ and target $\mu_{\sigma_{0}^{2}}$.
Phase 2 & 3: Annealing. While $\sigma^{2}\leq\nu$, repeat:
Update $\sigma^{2}$ by $\sigma^{2}\gets\begin{cases}\sigma^{2}\,{\bigl(1+\frac{1}{\sqrt{d}}\bigr)} & \text{if }\sigma^{2}\leq\frac{\nu}{d}\text{ (Phase 2)}\,,\\ \sigma^{2}\,{\bigl(1+\frac{\sigma}{\sqrt{\nu}}\bigr)} & \text{if }\frac{\nu}{d}\leq\sigma^{2}\leq\nu\text{ (Phase 3)}\,.\end{cases}$
Draw $x_{i+1}\sim\textsf{NE-Sampler}{\bigl(g,\frac{\varepsilon}{\sqrt{d}}\bigr)}$ started at $x_{i}$ with target dist. $\mu_{\sigma^{2}}$, and increment $i$.
Phase 4: Sampling from $e^{-f}$. Draw $x'\sim\textsf{NE-Sampler}{\bigl(g,\frac{\varepsilon}{\sqrt{d}}\bigr)}$ started at $x_{i}$ with target dist. $\pi$.
Going forward, we use the following notation: for $\bar{f}(x):=\frac{\nu}{d}\,f(x)$, $F(\sigma^{2}):=\begin{cases}\int_{K}\exp{\bigl(-\frac{\bar{f}(x)+\phi(x)}{\sigma^{2}}\bigr)}\,\mathrm{d} x & \text{if }\sigma^{2}\leq\frac{\nu}{d}\,,\\ \int_{K}\exp{\bigl(-f(x)-\frac{\phi(x)}{\sigma^{2}}\bigr)}\,\mathrm{d} x & \text{if }\frac{\nu}{d}\leq\sigma^{2}\leq\nu\,.\end{cases}$ We can show that $x^{*}=\arg\min_{K}(\bar{f}+\phi)$ exists in Line \ref{['line:min']} of Algorithm \ref{['alg:IPM-sampling']} and that all distributions involved in the algorithm are indeed integrable. We defer the proof to §\ref{['proof:IPM-welldefined']}. Each probability density involved in the algorithm is integrable. In this section, we demonstrate that within each phase a probability distribution $\mu_{\sigma_{i}^{2}}$ serves as a good warm start for sampling the subsequent distribution $\mu_{\sigma_{i+1}^{2}}$. While Algorithm \ref{['alg:IPM-sampling']} uses as an initial distribution $\bar{\mu}_{\sigma^{2}}$ that is approximately close to $\mu_{\sigma^{2}}$, we resolve this discrepancy through a coupling argument. We refer readers to Remark \ref{['rem:divine-intervention']} and to lovasz2006simulated for fuller details. For the first two phases, closeness of consecutive distributions follows purely from a property of log-concave distributions, which is independent of local metrics. For a log-concave function $g:\mathbb{R}^{d}\to\mathbb{R}$, the function $a\mapsto a^{d}\int g(x)^{a}\,\mathrm{d} x$ is log-concave in $a$. In Phase 1, we leverage another fundamental property of log-concave distributions. It allows us to establish that the Gaussian distribution truncated over a small Dikin ellipsoid in Phase 1 provides an $\mathcal{O}{\bigl({\bigl(\frac{\nu\beta+d}{\nu\alpha+d}\bigr)}^{d}\bigr)}$-warm start for $\mu_{\sigma_{0}^{2}}$. Thus, the $\mathsf{Dikin\ walk}$, which has a log-dependency on the warmness parameter, introduces an additional factor of $d$.
Let $X$ be a random point drawn from a log-concave distribution with a density $g:\mathbb{R}^{d}\to\mathbb{R}$. If $\gamma\geq2$, then $\mathbb{P}{\bigl(g(X)\leq e^{-\gamma(d-1)}\,\max g\bigr)}\leq(\gamma\,e^{1-\gamma})^{d-1}\,.$ If we can show that the $\mathsf{Dikin\ walk}$ has a $\log\log$-dependency through the blocking conductance or Gaussian isoperimetry, or if we utilize a non-Euclidean sampler with a double-log dependency, we can avoid the additional factor of $d$. We defer the proofs for closeness to §\ref{['proof:IPM-closeness']}. Let $x^{*}=\arg\min_{K}(\bar{f}+\phi)$. For $\sigma^{2}=10^{-5}/d^{3}$ and $g=\nabla^{2}\phi$, let $\mu$ be the Gaussian distribution $\mathcal{N}{\bigl(x^{*},\frac{\sigma^{2}}{1+\nu\beta/d}\,g(x^{*})^{-1}\bigr)}$ truncated over $\mathcal{D}_{g}^{3\sigma\sqrt{d}}(x^{*})$, and $\mu_{0}$ the initial distribution used in Phase 2 such that $\mu_{0}\propto\exp{\bigl(-\frac{\bar{f}+\phi}{\sigma^{2}}\bigr)}\cdot\mathbf{1}_{K}$. Then ${\|\mu/\mu_{0}\|}\lesssim{\bigl(\frac{\nu\beta+d}{\nu\alpha+d}\bigr)}^{d}$. In the following lemmas, we show that within each phase of our algorithm $\mu_{\sigma_{i}^{2}}$ serves as an $\mathcal{O}(1)$-warm start for the following distribution $\mu_{\sigma_{i+1}^{2}}$. In Phase 2, for $1/d^{3}\lesssim\sigma^{2}\leq\nu/d$ the multiplicative update of $(1+1/\sqrt{d})$ allows us to achieve an $\mathcal{O}(1)$-warm start. In Phase 2 (i.e., $\sigma_{i}^{2}\leq\nu/d$ with the update $\sigma_{i+1}^{2}=(1+1/\sqrt{d})\,\sigma_{i}^{2}$), a previous distribution $\mu_{i}$ serves as an $\mathcal{O}(1)$-warm start for the next distribution $\mu_{i+1}$, i.e., ${\|\mu_{i}/\mu_{i+1}\|}=\mathcal{O}(1)$. In the large regime of $\nu/d\leq\sigma^{2}\leq\nu$ during Phase 3, we leverage the Brascamp-Lieb inequality to show that the accelerated update of $(1+\sigma/\sqrt{\nu})$ ensures an $\mathcal{O}(1)$-warm start. 
Moreover, we employ the same technique along with a limiting argument to show that in Phase 4 the final distribution $\mu_{\nu}$ is an $\mathcal{O}(1)$-warm start for the target distribution $\pi$. In Phase 3 (i.e., $\nu/d\leq\sigma_{i}^{2}\leq\nu$ with the update $\sigma_{i+1}^{2}=\sigma_{i}^{2}(1+\sigma_{i}/\sqrt{\nu})$), a previous distribution $\mu_{i}$ serves as an $\mathcal{O}(1)$-warm start for the next distribution $\mu_{i+1}$, i.e., ${\|\mu_{i}/\mu_{i+1}\|}=\mathcal{O}(1)$. In Phase 4, the distribution $\mu\propto\exp{\bigl(-(f+\phi/\nu)\bigr)}\cdot\mathbf{1}_{K}$ is an $\mathcal{O}(1)$-warm start for the target distribution $\pi\propto\exp(-f)\cdot\mathbf{1}_{K}$. We now prove Theorem \ref{['thm:Dikin-annealing']}, the guarantee for Algorithm \ref{['alg:IPM-sampling']} with the $\mathsf{Dikin\ walk}$ employed as the non-Euclidean sampler. For convex $K\subset\mathbb{R}^{d}$, suppose that $g:\textup{int}(K)\to\mathbb{S}_{++}^{d}$ is $(\nu,\bar{\nu})$-Dikin-amenable and $\phi$ is its function counterpart such that $\min_{K}\phi$ exists. $\mathsf{Gaussian\ cooling}$ with the $\mathsf{Dikin\ walk}$ (Algorithm \ref{['alg:IPM-sampling']} with the $\mathsf{Dikin\ walk}$ serving as a non-Euclidean sampler) generates a sample that is $\varepsilon$-close to $\exp(-f)\cdot\mathbf{1}_{K}$ in TV-distance using $\mathcal{O}{\bigl(d\,\max(d\frac{\nu\beta+d}{\nu\alpha+d},\nu,\bar{\nu})\log\frac{d\nu}{\varepsilon}\bigr)}$ iterations of the $\mathsf{Dikin\ walk}$ with $g$, where a $C^{2}$-function $f:\textup{int}(K)\to\mathbb{R}$ satisfies $\alpha\nabla^{2}\phi\preceq\nabla^{2} f\preceq\beta\nabla^{2}\phi$ on $K$ for $0\leq\alpha\leq\beta<\infty$. In particular, when $f(x)=\alpha^{\mathsf{T}}x$ or $c\phi(x)$ for $\alpha\in\mathbb{R}^{d}$ and $c\in\mathbb{R}_{+}$, the algorithm uses $\widetilde{\mathcal{O}}(d\,\max(d,\nu,\bar{\nu}))$ iterations of the $\mathsf{Dikin\ walk}$.
By Theorem \ref{['thm:Dikin']}, if the potential $V$ of a target distribution satisfies $\alpha\nabla^{2}\phi\preceq\nabla^{2} V\preceq\beta\nabla^{2}\phi$, the mixing time of the $\mathsf{Dikin\ walk}$ is $d\,(1\vee\beta)\,(\bar{\nu}\wedge1/\alpha)\,\log\frac{\Lambda}{\varepsilon}$. Let $\bar{\kappa}=\frac{\nu\beta+d}{\nu\alpha+d}$.
Phase 1: When the target distribution is $\exp{\bigl(-\frac{\bar{f}+\phi}{\sigma^{2}}\bigr)}$ with $\sigma^{2}=10^{-5}/d^{3}$, the number of iterations of the $\mathsf{Dikin\ walk}$ is bounded by $d^{2}{\bigl(1+\frac{\nu\beta d^{-1}+1}{\sigma^{2}}\bigr)}\,\min{\bigl(\bar{\nu},\frac{\sigma^{2}}{1+\nu\alpha d^{-1}}\bigr)}\,\log{\bigl(\frac{\nu\beta+d}{\nu\alpha+d}\bigr)}\leq d^{2}\bar{\kappa}\log\bar{\kappa}\,.$
Phase 2 ($1/d^{3}\lesssim\sigma^{2}\leq\nu/d$): Note that we need $\mathcal{O}^{*}(\sqrt{d})$-many iterations to double $\sigma^{2}$. Hence, in this phase the number of iterations of the $\mathsf{Dikin\ walk}$ with a target $\exp{\bigl(-\frac{\bar{f}+\phi}{\sigma^{2}}\bigr)}$ adds up to $d\,{\bigl(1+\frac{\nu\beta d^{-1}+1}{\sigma^{2}}\bigr)}\,\min{\bigl(\bar{\nu},\frac{\sigma^{2}}{1+\nu\alpha d^{-1}}\bigr)}\cdot\sqrt{d}\leq d^{1.5}\bar{\kappa}+\sqrt{d}\nu\,.$
Phase 3 ($\nu/d\leq\sigma^{2}\leq\nu$): We need $\mathcal{O}^{*}{\bigl(\frac{\sqrt{\nu}}{\sigma}\bigr)}$-many iterations to double $\sigma^{2}$. Hence, in this phase the total number of iterations of the $\mathsf{Dikin\ walk}$ with a target $\exp{\bigl(-{\bigl(f+\frac{\phi}{\sigma^{2}}\bigr)}\bigr)}$ is $d\,{\bigl(1+\beta+\frac{1}{\sigma^{2}}\bigr)}\,\min{\bigl(\bar{\nu},\frac{1}{\alpha+\sigma^{-2}}\bigr)}\cdot\frac{\sqrt{\nu}}{\sigma}\leq\frac{d\sqrt{\nu}}{\sigma}{\bigl(\bar{\kappa}+\sigma^{2}\bigr)}\leq(d^{1.5}\bar{\kappa}+\sqrt{d}\nu)\vee(d\bar{\kappa}+d\nu)\,.$
Phase 4: The $\mathsf{Dikin\ walk}$ takes $\mathcal{O}(d\bar{\nu})$ iterations.
Adding up all iterations, we need $\widetilde{\mathcal{O}}(d\,(d\bar{\kappa}\vee\nu\vee\bar{\nu}))$ iterations of the $\mathsf{Dikin\ walk}$ in total. 
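The variance schedule behind the phase-by-phase count can be sketched in a few lines of Python. This is only an illustration of the update rules stated above (multiply $\sigma^{2}$ by $1+1/\sqrt{d}$ up to $\nu/d$, then by $1+\sigma/\sqrt{\nu}$ up to $\nu$). The function name `cooling_schedule`, the clamping of the last step of each phase to the phase boundary, and the sample values of `d` and `nu` are assumptions made for the sketch, not part of the algorithm as specified.

```python
import math

def cooling_schedule(d, nu):
    """Variances sigma^2 visited in Phases 2 and 3 (illustrative sketch only)."""
    sigma2 = 1e-5 / d**3          # starting variance used in Phase 1
    schedule = [sigma2]
    # Phase 2: sigma^2 <- sigma^2 * (1 + 1/sqrt(d)) until sigma^2 reaches nu/d.
    # Clamping to nu/d on the last step is an assumption of this sketch.
    while sigma2 < nu / d:
        sigma2 = min(sigma2 * (1 + 1 / math.sqrt(d)), nu / d)
        schedule.append(sigma2)
    phase2_steps = len(schedule) - 1
    # Phase 3: sigma^2 <- sigma^2 * (1 + sigma/sqrt(nu)) until sigma^2 reaches nu.
    while sigma2 < nu:
        sigma = math.sqrt(sigma2)
        sigma2 = min(sigma2 * (1 + sigma / math.sqrt(nu)), nu)
        schedule.append(sigma2)
    phase3_steps = len(schedule) - 1 - phase2_steps
    return schedule, phase2_steps, phase3_steps

# Hypothetical parameters chosen only to exercise the schedule.
schedule, n2, n3 = cooling_schedule(d=10, nu=20)
print(n2, n3, schedule[-1])
```

In Phase 3 the multiplicative step grows with $\sigma$, which is why doubling $\sigma^{2}$ takes on the order of $\sqrt{\nu}/\sigma$ updates there, compared with on the order of $\sqrt{d}$ updates in Phase 2.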
Theorem \ref{['thm:Dikin-annealing']} shows that $\mathsf{GCDW}$ running with a $(\nu,\bar{\nu})$-Dikin-amenable metric for exponential distributions mixes in $\widetilde{\mathcal{O}}(d\max\left(d,\nu,\bar{\nu}\right))$ iterations. Since every log-concave sampling problem can be reduced to an exponential sampling problem (as shown in \ref{['eq:reduced-problem']}), Theorem \ref{['thm:Dikin-annealing']} ensures a poly-time mixing algorithm that utilizes local geometry if we have a $(\nu,\bar{\nu})$-Dikin-amenable metric for the reduced sampling problem. This poses a natural question of how to construct such an efficiently computable Dikin-amenable metric for structured sampling problems. Suppose that a structured sampling problem admits a Dikin-amenable metric for each constraint and for each epigraph of the potentials. Motivated by the self-concordance theory of the optimization IPM, we consider the sum of the barriers (and thus the sum of the metrics) as a candidate for the metric of the reduced sampling problem. In fact, this choice aligns seamlessly with the $\mathsf{Dikin\ walk}$. However, obtaining a provable guarantee of the sampling IPM with the $\mathsf{Dikin\ walk}$ necessitates a comprehensive understanding not only of self-concordance but also of SSC, SLTSC, SASC, and $\bar{\nu}$-symmetry under the addition of barriers (or metrics). In this section, we develop a "calculus" for combining metrics for multiple constraints and epigraphs, deriving the resulting theoretical guarantees (Theorem \ref{['thm:IPM-sampling']}). This leads to a consistent analogy with the work of nesterov1994interior for the optimization IPM. Self-concordance is a central notion in the theory of interior-point methods for optimization (we refer interested readers to nesterov1994interior, nesterov2018lectures). We first recall basic properties of self-concordance and then investigate those of strong self-concordance and lower trace self-concordance, which are crucial to our analysis. 
Let $f_{i}$ be a $\nu_{i}$-self-concordant function on a convex set $K_{i}\subset\mathbb{R}^{d}$ for $i\in[2]$, and $\alpha>0$ be a scalar.
(1) (Theorem 4.1.1 and 4.2.2) $f_{1}+f_{2}$ is $(\nu_{1}+\nu_{2})$-self-concordant on $K_{1}\cap K_{2}$.
(2) (Corollary 4.1.2) $g=\nabla^{2}(\alpha f_{1})$ satisfies ${\|g(x)^{-1/2}\mathrm{D} g(x)[h]\,g(x)^{-1/2}\|}_{2}\leq\frac{2}{\sqrt{\alpha}}\,{\|h\|}_{g(x)}$ for $x\in\textup{int}(K_{1}\cap K_{2})$ and $h\in\mathbb{R}^{d}$.
(3) If $f_{1}$ is $\nu$-self-concordant, then $cf_{1}$ is $(c\nu)$-self-concordant for $c>1$.
We can extend this to self-concordant matrices as well. Let $g_{i}:\textup{int}(K_{i})\to\mathbb{S}_{+}^{d}$ be a PSD matrix function on a convex set $K_{i}\subset\mathbb{R}^{d}$ for $i\in[2]$, and $\alpha>0$ be a scalar.
(1) $g_{1}+g_{2}$ is $(\nu_{1}+\nu_{2})$-self-concordant on $K_{1}\cap K_{2}$.
(2) If $g_{1}$ is self-concordant, then $\alpha g_{1}$ satisfies $\mathrm{D}(\alpha g_{1})(x)[h]\preceq\frac{2}{\sqrt{\alpha}}\,{\|h\|}_{\alpha g_{1}}(\alpha g_{1})$ for $x\in\textup{int}(K_{1}\cap K_{2})$ and $h\in\mathbb{R}^{d}$.
(3) If $g_{1}$ is $\nu$-self-concordant, then $cg_{1}$ is $(c\nu)$-self-concordant for $c>1$.
Let $\phi_{i}$ be a $\nu_{i}$-self-concordant function counterpart of $g_{i}$ on $K_{i}$ for $i\in[2]$. Then for $x\in\textup{int}(K_{1}\cap K_{2})$ and $h\in\mathbb{R}^{d}$, $\mathrm{D}(g_{1}+g_{2})(x)[h]\preceq2\,{\bigl({\|h\|}_{g_{1}}g_{1}+{\|h\|}_{g_{2}}g_{2}\bigr)}\preceq2\,{\bigl({\|h\|}_{g_{1}+g_{2}}g_{1}+{\|h\|}_{g_{1}+g_{2}}g_{2}\bigr)}=2\,{\|h\|}_{g_{1}+g_{2}}(g_{1}+g_{2})\,.$ Clearly, $\phi_{1}+\phi_{2}$ is a function counterpart of $g_{1}+g_{2}$. Thus, $g_{1}+g_{2}$ is a $(\nu_{1}+\nu_{2})$-self-concordant matrix function on $K_{1}\cap K_{2}$. For $c>1$, if $g_{1}$ is self-concordant, then $\mathrm{D}(cg_{1})(x)[h]\preceq\frac{2}{\sqrt{c}}\,{\|h\|}_{cg_{1}}(cg_{1})\preceq2\,{\|h\|}_{cg_{1}}(cg_{1})$, and its function counterpart $c\phi_{1}$ is $(c\nu)$-self-concordant by Lemma \ref{['lem:sc-addition']}. 
Hence, $cg_{1}$ is $(c\nu)$-self-concordant. The following lemma ensures that the $\mathsf{Dikin\ walk}$ stays inside the convex body. This lemma was proven only for self-concordant function in nesterov2018lectures, but it can be straightforwardly extended to self-concordant matrices as well. $\mathcal{D}_{g}^{1}(x)\subset K$ for a convex set $K$ and self-concordant matrix function $g$ on $K$. Consider a matrix function $g_{\varepsilon}$ from $\textup{int}(K)$ to $\mathbb{S}_{++}^{d}$ defined by $g_{\varepsilon}(x):=g(x)+\varepsilon I$. It is self-concordant with a function counterpart $\phi(x)+\frac{\varepsilon}{2}\,{\|x\|}^{2}$, where $\phi:\textup{int}(K)\to\mathbb{R}$ is a function counterpart of $g$. For fixed $x\in\textup{int}(K)$ and $h\in\mathbb{R}^{d}$, let us define a function defined by $\psi(t):={\bigl(h^{\mathsf{T}}g_{\varepsilon}(x+th)\,h\bigr)}^{-1/2}$ for any feasible $t$. Then, $\psi'(t)=-\frac{\mathrm{D} g_{\varepsilon}(x+th)[h^{\otimes3}]}{2{\|h\|}_{g_{\varepsilon}(x+th)}^{3}}\,,$ and the definition of self-concordance leads to $|\psi'(t)|\leq1$. This function can be defined on the interval ${\bigl(-\psi(0),\psi(0)\bigr)}$ due to $\psi(t)\geq\psi(0)-|t|$ (see nesterov2018lectures). This implies that $K$ contains the set $\bigl\{x+th:|t|\leq\psi(0)={\|h\|}_{g_{\varepsilon}(x)}^{-1}\bigr\}=\{x+th:{\|th\|}_{g_{\varepsilon}(x)}\leq1\}\,.$ By sending $\varepsilon\to0$, the claim follows. The following lemma states that self-concordant metrics are similar for nearby points. Given any self-concordant matrix function $g$ on $K\subset\mathbb{R}^{d}$ and $x,y\in K$ with ${\|x-y\|}_{g(x)}<1$, we have $(1-{\|x-y\|}_{g(x)})^{2}g(x)\preceq g(y)\preceq(1-{\|x-y\|}_{g(x)})^{-2}g(x)\,.$ Strong self-concordance is additive up to a constant scaling. See §\ref{['proof:ssc-basic']} for the proof. If $g_{i}$ is a SSC matrix function on $K_{i}$ for $i\in[2]$, then $2\,(g_{1}+g_{2})$ is strongly self-concordant on $K_{1}\cap K_{2}$. 
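As a sanity check on the two facts above (the unit Dikin ellipsoid stays inside $K$, and the metric changes by at most a factor $(1-{\|x-y\|}_{g(x)})^{\pm2}$ between nearby points), the following Python sketch tests them numerically for the logarithmic-barrier metric $g(x)=A_{x}^{\mathsf{T}}A_{x}$ of a small polytope. The triangle, the test points, and all function names are hypothetical choices for illustration, and the closeness bound is only tested along random directions.

```python
import math
import random

# Hypothetical example polytope K = {x : A x >= b}: a triangle in R^2.
A = [(1.0, 0.0), (0.0, 1.0), (-1.0, -1.0)]
b = [0.0, 0.0, -1.0]

def slack(x):
    """Slack vector s_x = A x - b."""
    return [a[0] * x[0] + a[1] * x[1] - bi for a, bi in zip(A, b)]

def quad_form(x, v):
    """v^T g(x) v for the log-barrier metric g(x) = A_x^T A_x."""
    return sum(((a[0] * v[0] + a[1] * v[1]) / s) ** 2 for a, s in zip(A, slack(x)))

def check(x, trials=1000, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        # Random point y with ||y - x||_{g(x)} = r < 1.
        u = (rng.gauss(0, 1), rng.gauss(0, 1))
        r = 0.99 * rng.random()
        scale = r / math.sqrt(quad_form(x, u))
        y = (x[0] + scale * u[0], x[1] + scale * u[1])
        # Containment: the unit Dikin ellipsoid lies in K.
        if min(slack(y)) <= 0:
            return False
        # Closeness: (1-r)^2 g(x) <= g(y) <= (1-r)^{-2} g(x), tested along v.
        v = (rng.gauss(0, 1), rng.gauss(0, 1))
        gx, gy = quad_form(x, v), quad_form(y, v)
        lo, hi = (1 - r) ** 2 * gx, gx / (1 - r) ** 2
        if not (lo * (1 - 1e-9) <= gy <= hi * (1 + 1e-9)):
            return False
    return True

ok = check((0.2, 0.3))
print(ok)
```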
Note that if we add $k$-many strongly self-concordant metrics, then we need the scaling of $2^{\log_{2}k}=k$. We remark that the factor of $2$ above might be redundant. Next, we recall an analogue of Lemma \ref{['lem:scCloseness']} for strong self-concordance. Given a strongly self-concordant matrix function $g$ on $K$, and any $x,y\in K$ with ${\|x-y\|}_{g(x)}<1$, ${\|g(x)^{-1/2}{\bigl(g(y)-g(x)\bigr)}\,g(x)^{-1/2}\|}_{F}\leq(1-{\|x-y\|}_{g(x)})^{-2}{\|x-y\|}_{g(x)}\,.$ Recall that $\bar{\nu}$-symmetry requires two-sided inclusion: the first part is $\mathcal{D}_{g}^{1}(x)\subset K\cap(2x-K)$, and the second part is $K\cap(2x-K)\subset\mathcal{D}_{g}^{\sqrt{\bar{\nu}}}(x)$. The first part immediately follows when a metric is induced by a self-concordant function. If $\phi$ is a self-concordant function on $K$, then $\mathcal{D}_{g}^{1}(x)\subset K\cap(2x-K)$ for $g=\nabla^{2}\phi$ and $x\in K$. Lemma \ref{['lem:dikin-in-body']} ensures that $y\in K$ whenever $y\in\mathcal{D}_{g}^{1}(x)$. Then $2x-y\in\mathcal{D}_{g}^{1}(x)$ and thus $2x-y\in K$. It implies that $y\in2x-K$. When a metric is induced by a self-concordant barrier with a barrier parameter $\nu$, it holds that $\bar{\nu}=\mathcal{O}(\nu^{2})$. For a self-concordant barrier $\phi$ with a barrier parameter $\nu$ on $K$ and $g=\nabla^{2}\phi$, it follows that $\bar{\nu}=\mathcal{O}(\nu^{2})$. By nesterov2003introductory, for any $x,y\in K$ with $\nabla\phi(x)\cdot(y-x)\geq0$ it follows that ${\|y-x\|}_{g(x)}\leq\nu+2\sqrt{\nu}$. Now, let $x\in K$ and $y\in K\cap(2x-K)$. The latter implies that $y-x=x-z$ for some $z\in K$. If $\nabla\phi(x)\cdot(y-x)\geq0$, then ${\|y-x\|}_{g(x)}\leq\nu+2\sqrt{\nu}.$ If $\nabla\phi(x)\cdot(y-x)<0$, then $\nabla\phi(x)\cdot(z-x)>0$ and thus ${\|y-x\|}_{g(x)}={\|z-x\|}_{g(x)}\leq\nu+2\sqrt{\nu}$. From these two cases, it holds in general that ${\|y-x\|}_{g(x)}\leq\nu+2\sqrt{\nu}$ and thus $K\cap(2x-K)\subset\mathcal{D}_{g}^{\nu+2\sqrt{\nu}}(x)$. 
By Lemma \ref{['lem:symmetricLeftpart']}, $\mathcal{D}_{g}^{1}(x)\subset K\cap(2x-K)$ and thus $\bar{\nu}=\mathcal{O}(\nu^{2})$. For affine constraints $Ax\geq b$, the first inclusion above has a useful equivalent description as follows: Let $x\in K=\{Ax>b\}$. It holds that $y\in K\cap(2x-K)$ if and only if ${\|A_{x}(y-x)\|}_{\infty}\leq1$. For $y\in K$, we have $Ay>b$ and thus $s_{x}=Ax-b>A(x-y)$ (elementwise inequality). As $s_{x}>0$, we have $A_{x}(x-y)\leq1$. When $y\in(2x-K)$, we can write $y=2x-z$ for some $z\in K$. Note that $A(x-y)=A(z-x)>b-Ax=-s_{x}\,,$ and thus $A_{x}(x-y)\geq-1$. Therefore, ${\|A_{x}(y-x)\|}_{\infty}\leq1$. For $\alpha\geq1$, if $g$ is $\bar{\nu}$-symmetric, then $\alpha g$ is $\alpha\bar{\nu}$-symmetric. Symmetry parameters and self-concordance parameters are additive. If a PSD matrix function $g_{i}$ is $\bar{\nu}_{i}$-symmetric on $K_{i}$ for $i\in[2]$, then $g_{1}+g_{2}$ is $(\bar{\nu}_{1}+\bar{\nu}_{2})$-symmetric on $K_{1}\cap K_{2}$. For $g:=g_{1}+g_{2}$ and $K:=K_{1}\cap K_{2}$, let $y\in\mathcal{D}_{g}^{1}(x)$. This implies $y\in\mathcal{D}_{g_{1}}^{1}(x)\cap\mathcal{D}_{g_{2}}^{1}(x)$ and so $y\in K_{i}\cap(2x-K_{i})$ for $i=1,2$. Due to $\cap_{i}{\bigl(K_{i}\cap(2x-K_{i})\bigr)}=K\cap(2x-K)$, we have $y\in K\cap(2x-K)$ and so $\mathcal{D}_{g}^{1}(x)\subset K\cap(2x-K)$. Now let $y\in K\cap(2x-K)$. It is obvious that $y\in K_{i}\cap(2x-K_{i})$ for $i=1,2$, and thus $(y-x)^{\mathsf{T}}g_{1}(x)(y-x)\leq\bar{\nu}_{1}\,,\qquad\text{and}\qquad(y-x)^{\mathsf{T}}g_{2}(x)(y-x)\leq\bar{\nu}_{2}\,.$ By adding up these two, it follows that ${\|y-x\|}_{g(x)}^{2}\leq\bar{\nu}_{1}+\bar{\nu}_{2}$. It readily follows that (strongly) LTSC holds under scaling by a scalar greater than or equal to $1$. We provide a useful sufficient condition under which the sum of PSD matrix functions is LTSC. For a PSD matrix function $g_{i}$ on $K_{i}$, let $g:=\sum_{i}g_{i}$ be PD on $\bigcap_{i}K_{i}$. If $g_{i}$ is SLTSC on $K_{i}$, then $g$ is LTSC on $\bigcap_{i}K_{i}$. 
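The affine-constraint characterization of $K\cap(2x-K)$ stated above lends itself to a quick numerical check. The Python sketch below compares direct membership in $K\cap(2x-K)$ with the test on ${\|A_{x}(y-x)\|}_{\infty}$ at random points. The triangle and the helper names are hypothetical. Because $K$ is taken open here, the sketch compares against the strict inequality and skips points numerically on the boundary.

```python
import random

# Hypothetical example: open triangle K = {x : A x > b} in R^2.
A = [(1.0, 0.0), (0.0, 1.0), (-1.0, -1.0)]
b = [0.0, 0.0, -1.0]

def slack(x):
    """Slack vector s_x = A x - b."""
    return [a[0] * x[0] + a[1] * x[1] - bi for a, bi in zip(A, b)]

def in_K(y):
    return all(s > 0 for s in slack(y))

def in_symmetrized_body(x, y):
    """Direct test of y in K and y in 2x - K (i.e., 2x - y in K)."""
    z = (2 * x[0] - y[0], 2 * x[1] - y[1])
    return in_K(y) and in_K(z)

def normalized_sup_norm(x, y):
    """||A_x (y - x)||_inf with A_x = S_x^{-1} A."""
    h = (y[0] - x[0], y[1] - x[1])
    return max(abs(a[0] * h[0] + a[1] * h[1]) / s for a, s in zip(A, slack(x)))

def agree(x, trials=5000, seed=1):
    rng = random.Random(seed)
    for _ in range(trials):
        y = (rng.uniform(-0.5, 1.5), rng.uniform(-0.5, 1.5))
        t = normalized_sup_norm(x, y)
        if abs(t - 1.0) < 1e-9:   # skip numerically ambiguous boundary points
            continue
        if in_symmetrized_body(x, y) != (t < 1.0):
            return False
    return True

print(agree((0.2, 0.3)))
```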
We note that $\mathrm{D}^{2}g_{i}(x)[h,h]\succeq0$ is a stronger condition than $\textup{Tr}{\bigl(g(x)^{-1}\mathrm{D}^{2}g_{i}(x)[h,h]\bigr)}\geq-{\|h\|}_{g_{i}(x)}^{2}$. Thus, a special case of the lemma is that if $\mathrm{D}^{2}g_{1}[h,h]\succeq0$ and $\mathrm{D}^{2}g_{2}[h,h]\succeq0$, then $g_{1}+g_{2}$ is LTSC. Note that this condition is additive. We also find that high self-concordance is a handy sufficient condition for establishing strongly lower trace self-concordance; the proof is deferred to §\ref{['proof:ltsc-basic']}. For $K\subset\mathbb{R}^{d}$, let $\bar{g}:\textup{int}(K)\to\mathbb{S}_{+}^{d}$ be a HSC matrix function, and define another matrix function by $g:=d\bar{g}$ on $K$. Then $g$ is SLTSC. Just like (S)LTSC, (S)ASC still holds under scaling by a scalar greater than or equal to $1$. Also, the definition of SASC immediately leads to the following additive condition: For a PSD matrix function $g_{i}$ on $K_{i}$ for $i\in[m]$, let $m=\mathcal{O}(1)$ and $g:=\sum_{i=1}^{m}g_{i}$ be PD on $\bigcap_{i}K_{i}$. If $g_{i}$ is SASC on $K_{i}$, then $g$ is ASC on $\bigcap_{i}K_{i}$. Fix $\varepsilon>0$. For each $g_{i}$ there exists $r_{i}(\varepsilon/m)$ such that if $r\leq r_{i}(\varepsilon/m)$, then $\mathbb{P}_{z}{\Bigl({\|z-x\|}_{g_{i}(z)}^{2}-{\|z-x\|}_{g_{i}(x)}^{2}\leq\frac{2\varepsilon}{m}\,\frac{r^{2}}{d}\Bigr)}\geq1-\frac{\varepsilon}{m}\,.$ If $r\leq\bar{r}(\varepsilon):=\min_{i}\,r_{i}(\varepsilon/m)$, then the union bound leads to ASC of $\sum g_{i}$ on $\bigcap_{i}K_{i}$. When does SASC hold? It is implicit in narayanan2016randomized that HSC implies SASC. For completeness, we provide the proof in §\ref{['proof:sasc-basic']}. If $\phi:\textup{int}(K)\to\mathbb{R}$ is HSC, then $d\phi$ is SASC. SSC, (S)LTSC, and (S)ASC of a local metric do not carry over into an extended space in the reduced sampling problem. For instance, SSC assumes the invertibility of the local metric, which may become singular in the extended space. 
To address this challenge, we introduce the notions of collapse and embedding, based on which we can pass those properties from the original sampling problem to the reduced problem. Let $K$ and $K'$ be convex sets in $\mathbb{R}^{d}$ and in $\mathbb{R}^{m}$ with $d\leq m$, respectively. Let $g:\textup{int}(K)\to\mathbb{S}_{+}^{d}$ be a PSD matrix function.
(1) We say $g$ is collapsed onto a linear subspace $W\subset\mathbb{R}^{d}$ if $\langle u,v\rangle_{g(x)}=\langle P_{W}u,P_{W}v\rangle_{g(x)}$ for any $x\in\textup{int}(K)$ and $u,v\in\mathbb{R}^{d}$ where $P_{W}$ is the orthogonal projection onto $W$. In other words, for an orthonormal basis $\{u_{1},\dots,u_{k}\}$ of $W$ there exists the PSD matrix function $g_{W}:\textup{int}(K)\to\mathbb{S}_{+}^{k}$ such that $\langle e_{i},e_{j}\rangle_{g_{W}(x)}=\langle u_{i},u_{j}\rangle_{g(x)}$ for $i,j\in[k]$ (i.e., $g_{W}(x)=U^{\mathsf{T}}g(x)U$ where the columns of $U\in\mathbb{R}^{d\times k}$ are $\left\{ u_{1},\dots,u_{k}\right\}$).
(2) For $g$ collapsed onto $W$, we say $g$ is PD along $W$ if $g_{W}$ is PD. In other words, ${\|h\|}_{g(x)}=0$ implies $h\perp W$.
(3) $g$ is SSC along $W$ if $g$ is a self-concordant matrix function and $g_{W}\succ0$ satisfies ${\|g_{W}(x)^{-1/2}\mathrm{D} g_{W}(x)[h]\,g_{W}(x)^{-1/2}\|}_{F}\leq2{\|h\|}_{g}\quad\text{for any }x\in\textup{int}(K)\ \text{and}\ h\in\mathbb{R}^{d}\,.$
(4) Embedding $\bar{g}$ of $g$ into $K'$: Let $P:\mathbb{R}^{m}\to\mathbb{R}^{d}$ be the projection onto the set of coordinates appearing in the variable $x$ of $g$. The embedding of $g$ into $K'$ is a PSD matrix function $\bar{g}:\textup{int}(K')\to\mathbb{S}_{+}^{m}$ such that $\langle u,v\rangle_{\bar{g}(y)}=\langle Pu,Pv\rangle_{g(P(y))}$.
We note that these notions are well-defined independently of the choice of an orthonormal basis of $W$. The proof can be found in §\ref{['proof:collapse-embedding-welldefined']}. 
Let $K\subset\mathbb{R}^{d}$ be convex and $g:\textup{int}(K)\to\mathbb{S}_{+}^{d}$ a PSD matrix function collapsed onto a subspace $W\subset\mathbb{R}^{d}$. Then PD and SSC along $W$ are well-defined (i.e., the condition for each property holds for any orthonormal basis of $W$). Using these notions, we can make it precise that an inverse mapping of affine transformations preserves SSC. We begin with a barrier version and subsequently extend it to a matrix-function version. The detailed proofs are deferred to §\ref{['proof:collap-affine']}. Let $T:\mathbb{R}^{d}\to\mathbb{R}^{m}$ be an affine map defined by $T(x)=Ax+b$ for $A\in\mathbb{R}^{m\times d}$ and $b\in\mathbb{R}^{m}$. Let $\phi(y):\textup{int}(K)\subset\mathbb{R}^{m}\to\mathbb{R}$ be a self-concordant barrier for $K$ and define $\psi(x):=\phi(T(x))=\phi(y)$ on $\bar{K}:=T^{-1}K\subset\mathbb{R}^{d}$.
(1) If $\phi$ is a $(\nu,\bar{\nu})$-self-concordant barrier for $K$, so is $\psi$ for $\bar{K}$.
(2) If $\mathrm{D}^{4}\phi(y)[v,v]\succeq0$ for $y\in\textup{int}(K)$ and $v\in\mathbb{R}^{m}$, then $\mathrm{D}^{4}\psi(x)[u,u]\succeq0$ for $x\in\textup{int}(\bar{K})$ and $u\in\mathbb{R}^{d}$.
(3) If $\phi$ is HSC, so is $\psi$.
Let $g:\textup{int}(K)\subset\mathbb{R}^{m}\to\mathbb{S}_{+}^{m}$ be a self-concordant matrix function and $T(x)=Ax+b$ with $A\in\mathbb{R}^{m\times d}$ and $b\in\mathbb{R}^{m}$ be an affine map. Let $\bar{g}(x):=A^{\mathsf{T}}g(Tx)A$ be a PSD matrix function from $\bar{K}:=T^{-1}K\subset\mathbb{R}^{d}$ to $\mathbb{S}_{+}^{d}$. 
(1) If $g$ is a $(\nu,\bar{\nu})$-self-concordant barrier, so is $\bar{g}$ for $\bar{K}$.
(2) If $g$ is SSC, then $\bar{g}$ is SSC along $W=\textup{row}(A)$.
(3) If $\mathrm{D}^{2}g(y)[h,h]\succeq0$ for $y\in\textup{int}(K)$ and $h\in\mathbb{R}^{m}$, then $\mathrm{D}^{2}\bar{g}(x)[\bar{h},\bar{h}]\succeq0$ for $x\in\textup{int}(\bar{K})$ and $\bar{h}\in\mathbb{R}^{d}$.
(4) If $A$ is invertible and $g$ is SLTSC, then $\bar{g}$ is SLTSC.
(5) If $A$ is invertible and $g$ is SASC, then $\bar{g}$ is SASC.
Intuitively, embedding should not affect the self-concordance and symmetry parameters, which is indeed the case. Assume $K\subset\mathbb{R}^{d}$ is embeddable into $K'\subset\mathbb{R}^{m}$. If $g:\textup{int}(K)\to\mathbb{S}_{+}^{d}$ is a $(\nu,\bar{\nu})$-self-concordant matrix function, then its embedding $\bar{g}:\textup{int}(K')\to\mathbb{S}_{+}^{m}$ is a $(\nu,\bar{\nu})$-self-concordant matrix function. Since $K$ can be embedded into $K'$, there exists a projection matrix $P\in\{0,1\}^{d\times m}$ such that $\bar{g}(y)=P^{\mathsf{T}}g(Py)P$ with $x=Py\in\textup{int}(K)$ and $y\in\textup{int}(K')$. As we can view $\bar{g}$ as a matrix function induced by the inverse of the linear map $x=Py$, Lemma \ref{['lem:linear-trans-matrix']} shows that $\bar{g}$ is a $(\nu,\bar{\nu})$-self-concordant matrix function for $K'=P^{-1}K$. In the reduction to the exponential sampling problem, passing essential properties (e.g., SSC, SLTSC, and SASC) of metrics from the original space to the extended space poses technical issues. We address these issues in the following two lemmas, whose proofs are deferred to §\ref{['proof:lifting-ssc']}. As mentioned earlier, SSC in the original space does not automatically imply SSC for its embedding $\bar{g}$, as SSC assumes invertibility. However, there is a useful method for extending SSC from the original space to the extended space. 
For convex $K\subset\mathbb{R}^{d}$, let $g:\textup{int}(K)\to\mathbb{S}_{+}^{d}$ be SSC along a subspace $W\subset\mathbb{R}^{d}$, and assume $K$ is embeddable into convex $K'\subset\mathbb{R}^{m}$ with $m\geq d$. For the embedding $\bar{g}:\textup{int}(K')\to\mathbb{S}_{+}^{m}$ of $g$ into $K'$, it holds that $\bar{g}+\varepsilon I_{m}$ is SSC on $K'$ for any $\varepsilon>0$. When extending SLTSC and SASC to the embedding space, we encounter a different subtlety. The conditions in SLTSC and SASC of $\bar{g}$ consider every PSD matrix function $g'$ such that $\bar{g}+g'$ is invertible in the extended space $\bar{K}$. However, the embedding $\bar{g}$ of $g$ is collapsed onto the subspace corresponding to the original space $K$. As SLTSC and SASC convolve $\bar{g}$ and $g'$ by considering $(\bar{g}+g')^{-1}$ in their formulations, it is not evident whether SLTSC and SASC can be transferred to the extended space $\bar{K}$ from the original space $K$. However, by employing Schur complements we can show that these properties do carry over into the extended space. For convex $K\subset\mathbb{R}^{d}$, let $g:\textup{int}(K)\to\mathbb{S}_{+}^{d}$ be SLTSC, and assume $K$ is embeddable into convex $K'\subset\mathbb{R}^{m}$ with $m\geq d$. Then its embedding $\bar{g}:\textup{int}(K')\to\mathbb{S}_{+}^{m}$ is also SLTSC. The same is true for SASC. With our understanding of how to combine properties of barriers for constraints and epigraphs, we are prepared to prove Theorem \ref{['thm:IPM-sampling']}. Let us revisit the reduced sampling problem in \ref{['eq:reduced-problem']}: $\text{sample }y\sim\tilde{\pi}\propto\exp{\Bigl(-\langle(\underbrace{0,\dots,0}_{d\text{ times}},\underbrace{1,\dots,1}_{I\text{ times}}),\cdot\rangle\Bigr)}\quad\text{s.t. }y\in\bigcap_{i=1}^{I}E_{i}\cap\underbrace{\bigcap_{j=1}^{J}K_{j}}_{\eqqcolon K}\eqqcolon K'\,,$ where $E_{i}:=\bigl\{y=(x,t_{1},\dots,t_{I})\in\mathbb{R}^{d+I}:f_{i}(x)\leq y_{d+i}\bigr\}$ for a proper closed convex function $f_{i}$ and $i\in[I]$, and $K_{j}:=\bigl\{y=(x,t_{1},\dots,t_{I})\in\mathbb{R}^{d+I}:h_{j}(x)\leq0\bigr\}$ for a closed convex function $h_{j}$ and $j\in[J]$, and $K$ has non-empty interior. We begin with a useful geometric property of $K'$. If the original sampling problem \ref{['eq:problem']} is well-defined, then the extended convex region $K'$ in the reduced sampling problem \ref{['eq:reduced-problem']} has non-empty interior and contains no straight line. Since $f_{i}$ and $h_{j}$ are closed and convex, $K'$ is convex and closed. Since $f_{i}$ is continuous on $\textup{int}(K)$ due to convexity (see rockafellar1997convex), its epigraph has non-empty interior. Thus, $K'$ has non-empty interior. Since $K'$ is closed and convex, it can be written as $K'=\bigcap_{i}H_{i}$ where $H_{i}=\{x:a_{i}^{\mathsf{T}}x\geq b_{i}\}$ ranges over the halfspaces containing $K'$. Suppose $K'$ contains a straight line $\ell:=\{p+th:t\in\mathbb{R}\}$ for some $p,h\in\mathbb{R}^{d+I}$. Then $\ell\subset H_{i}$ for any $i$, and thus $\ell$ must be parallel to every halfspace $H_{i}$ (i.e., $h\perp a_{i}$). Fix $y\in\textup{int}(K')$. The translated line $\ell_{y}$ of $\ell$ containing $y$ is still included in $H_{i}$ for all $i$. As $y\in\textup{int}(K')$, the distance from $y$ to $\partial H_{i}$ is bounded below by some $\delta>0$ for all $i$. Hence, $\ell_{y}+B_{\delta}$ is fully contained in $H_{i}$ and thus in $K'$. Clearly, the integral of the exponential distribution along the fiber $\ell_{y}$ is infinite. Since $K'$ contains the cylinder $\ell_{y}+B_{\delta}$, the integral of the exponential distribution over $K'$ must be infinite, leading to a contradiction. 
The following is the extension of nesterov2018lectures to self-concordant matrix functions, which implies invertibility of Dikin-amenable metrics in the reduced problem. For convex $K\subset\mathbb{R}^{d}$ containing no straight line, a self-concordant matrix function $g:\textup{int}(K)\to\mathbb{S}_{+}^{d}$ is non-degenerate on $K$. Suppose ${\|h\|}_{g(x)}=0$ for some $0\neq h\in\mathbb{R}^{d}$ and $x\in\textup{int}(K)$. Clearly, the line $x+th$ for $t\in\mathbb{R}$ is contained in $\mathcal{D}_{g}^{1}(x)$. As $\mathcal{D}_{g}^{1}(x)\subset K$ due to Lemma \ref{['lem:dikin-in-body']}, it implies that $K$ contains a straight line $x+th$, which leads to a contradiction. In the reduced problem of \ref{['eq:reduced-problem']}, let us assume the following:
(1) For $i\in[I]$, the epigraph $E_{i}$ admits a PSD matrix function $g_{i}^{e}(x,t_{i})$ (or $g_{i}^{e}(x,t_{i,1},\dots,t_{i,d})$) that is a $(\nu_{i},\bar{\nu}_{i})$-SC barrier, SSC along some subspace, SLTSC, and SASC.
(2) For $j\in[J]$, the constraint $K_{j}$ admits a PSD matrix function $g_{j}^{c}(x)$ that is an $(\eta_{j},\bar{\eta}_{j})$-SC barrier, SSC along some subspace, SLTSC, and SASC.
For appropriate projections $\pi_{i}^{e}$ and $\pi^{c}$, a matrix function $g$ on $y\in\textup{int}(K')$ defined by $\langle u,v\rangle_{g(y)}:=(I+J)\,{\Bigl(\sum_{i=1}^{I}\langle\pi_{i}^{e}u,\pi_{i}^{e}v\rangle_{g_{i}^{e}(\pi_{i}^{e}(y))}+\sum_{j=1}^{J}\langle\pi^{c}u,\pi^{c}v\rangle_{g_{j}^{c}(\pi^{c}(y))}\Bigr)}\quad\text{for }u,v\in\mathbb{R}^{d}$ is ${\bigl((I+J)(\sum_{i=1}^{I}\nu_{i}+\sum_{j=1}^{J}\eta_{j}),\,(I+J)(\sum_{i=1}^{I}\bar{\nu}_{i}+\sum_{j=1}^{J}\bar{\eta}_{j})\bigr)}$-Dikin-amenable on $K'$. First of all, $\bar{g}_{i}^{e}$ is $(\nu_{i},\bar{\nu}_{i})$-self-concordant (Corollary \ref{['cor:embedding-scness']}), and SLTSC and SASC on $K'$ (Lemma \ref{['lem:embedding-sltsc']}). For fixed $\varepsilon>0$, $\bar{g}_{i}^{e}+\varepsilon I$ is SSC by Lemma \ref{['lem:embedding-ssc']}. 
We can make similar arguments for $\bar{g}_{j}^{c}$ regarding self-concordance, symmetry, SLTSC, SASC, and SSC. Hence, $g+(I+J)\varepsilon I$ is SSC by Lemma \ref{['lem:ssc-sum']}. Since $g$ is self-concordant on $K'$ by Lemma \ref{['lem:sc-addition']} and $K'$ contains no straight line, $g$ is PD by Lemma \ref{['lem:nondegenerate-no-straightline']}. Sending $\varepsilon$ to $0$, we obtain SSC of $g$. LTSC and ASC of $g$ follow from Lemmas \ref{['lem:sltsc-additive']} and \ref{['lem:sasc-additive']}. The symmetry parameter of $g$ follows from Lemma \ref{['lem:symmetry-addition']}. For $i\in[m]$ and domain $E_{i}\subset\mathbb{R}^{d_{i}}$, let $g_{i}(x_{i}):\textup{int}(E_{i})\to\mathbb{S}_{++}^{d_{i}}$ be a self-concordant matrix. For $l:=\sum_{i}d_{i}$ and $E:=\prod_{i}E_{i}$, we define a self-concordant matrix $g$ on $E\subset\mathbb{R}^{l}$ with diagonal blocks $g_{i}$. To be precise, we can write $g(x)=g(x_{1},\dots,x_{m}):=\sum_{i}\bar{g}_{i}(x)\,,$ where $\bar{g}_{i}:\mathbb{R}^{l}\to\mathbb{S}_{+}^{l}$ is a matrix function whose entries are all zero except for the $i$-th diagonal block, which is $g_{i}$. When handling the direct product of domains, it is common for each domain to have an $\mathcal{O}(1)$-dimension. In such cases, scaling the barriers by dimension worsens the mixing time by at most a constant factor while making the barriers SSC and SLTSC. We defer the proofs to §\ref{['proof:direct-ssc-sltsc']}. For open $E_{i}\subset\mathbb{R}^{d_{i}}$, let $g_{i}:E_{i}\to\mathbb{S}_{++}^{d_{i}}$ be SC. Then $g:=\sum d_{i}\bar{g}_{i}$ defined on $\prod E_{i}$ is SSC. For open $E_{i}\subset\mathbb{R}^{d_{i}}$, let $g_{i}:E_{i}\to\mathbb{S}_{++}^{d_{i}}$ be HSC. Then $g:=\sum d_{i}\bar{g}_{i}$ defined on $\prod E_{i}$ is SLTSC. nesterov1994interior introduced the notion of compatibility with a convex domain while constructing a self-concordant barrier for a wider class of structured constraints. 
We generalize this notion to the fourth order, by which we can easily construct an SSC, SLTSC, and SASC barrier. For a convex cone $K$, we use $a\leq_{K}b$ to denote $b-a\in K$. Let $\beta,\gamma\geq0$. Let $K$ be a convex cone in $\mathbb{R}^{m}$ and $\Gamma$ be a closed convex domain in $\mathbb{R}^{d}$. A mapping $\mathcal{A}:\textup{int}(\Gamma)\to\mathbb{R}^{m}$ of class $C^{4}$ is called $(K,\beta,\gamma)$-compatible with the domain $\Gamma$ if the following two conditions hold.
(1) $\mathcal{A}$ is concave with respect to $K$. That is, $t\mathcal{A}(x)+(1-t)\,\mathcal{A}(y)\leq_{K}\mathcal{A}(tx+(1-t)\,y)$ for all $t\in[0,1]$ and $x,y\in\textup{int}(\Gamma)$. Equivalently, $-\mathrm{D}^{2}\mathcal{A}(x)[h,h]\in K$ for any $x\in\textup{int}(\Gamma)$ and $h\in\mathbb{R}^{d}$.
(2) For any $x\in\textup{int}(\Gamma)$, $y\in\Gamma\cap(2x-\Gamma)$, and $h=y-x$, it holds that $\beta\mathrm{D}^{2}\mathcal{A}(x)[h,h]\leq_{K}\mathrm{D}^{3}\mathcal{A}(x)[h,h,h]\leq_{K}-\beta\mathrm{D}^{2}\mathcal{A}(x)[h,h]$ and $\gamma\mathrm{D}^{2}\mathcal{A}(x)[h,h]\leq_{K}\mathrm{D}^{4}\mathcal{A}(x)[h,h,h,h]\leq_{K}-\gamma\mathrm{D}^{2}\mathcal{A}(x)[h,h]\,.$
An affine mapping is $(\{0\},0,0)$-compatible with any closed convex domain. We note that a function that is $(\mathbb{R}_{+},\beta,\gamma)$-compatible with $\mathbb{R}_{+}$ is a $C^{4}$-smooth concave real-valued function $f:(0,\infty)\to\mathbb{R}$ such that for any $t>0$, $|f'''(t)|\leq-\frac{\beta}{t}\,f''(t)\quad\text{and}\quad|f^{(4)}(t)|\leq-\frac{\gamma}{t^{2}}\,f''(t)\,.$ Let $0<p\leq1$. Then the function $f(t)=t^{p}$ is $(\mathbb{R}_{+},2-p,(2-p)\,(3-p))$-compatible with $\mathbb{R}_{+}$. Likewise, $f(t)=\log t$ is $(\mathbb{R}_{+},2,6)$-compatible with $\mathbb{R}_{+}$. The following lemma is an extension of nesterov1994interior to our fourth-order compatibility. Let $K,K_{1},K_{2}$ be convex cones in $\mathbb{R}^{m},\mathbb{R}^{m_{1}},\mathbb{R}^{m_{2}}$ respectively. 
(1) If $\mathcal{A}:\textup{int}(\Gamma)\to\mathbb{R}^{m}$ is $(K,\beta,\gamma)$-compatible with $\Gamma$ and $K\subset K'$ is a closed convex cone in $\mathbb{R}^{m}$, then $\mathcal{A}$ is $(K',\beta,\gamma)$-compatible with $\Gamma$.
(2) If $\mathcal{A}_{i}:\textup{int}(\Gamma_{i})\to\mathbb{R}^{m_{i}}$ is $(K_{i},\beta_{i},\gamma_{i})$-compatible with $\Gamma_{i}$ for $i=1,2$, then $\mathcal{A}:\textup{int}(\Gamma_{1}\times\Gamma_{2})\to\mathbb{R}^{m_{1}}\times\mathbb{R}^{m_{2}}$ mapping $(x,y)\to(\mathcal{A}_{1}(x),\mathcal{A}_{2}(y))$ is $(K_{1}\times K_{2},\max(\beta_{1},\beta_{2}),\max(\gamma_{1},\gamma_{2}))$-compatible with $\Gamma_{1}\times\Gamma_{2}$.
We now introduce a main result in this section (see §\ref{['proof:inverse-non-linear']}). To begin with, we recall that for a closed convex domain $G\subset\mathbb{R}^{d}$ the recessive cone $R(G)$ of $G$ is $\{h\in\mathbb{R}^{d}:x+th\in G\ \text{for all }x\in G\text{ and }t>0\}$. Let $G$ be a closed convex domain in $\mathbb{R}^{m}$, $F$ be a highly $\theta$-self-concordant barrier for $G$, $\Gamma$ be a closed convex domain in $\mathbb{R}^{d}$, and $\Pi$ be a highly $\nu$-self-concordant barrier for $\Gamma$. Let $\mathcal{A}$ be $(K,\beta,\gamma)$-compatible with $\Gamma$, where $K$ is a ray contained in the recessive cone $R(G)$. Assume that $\mathcal{A}(\textup{int}(\Gamma))\cap G\neq\emptyset$.
(1) The set $G^{+}=\overline{\textup{int}(\Gamma)\cap\mathcal{A}^{-1}{\bigl(\textup{int}(G)\bigr)}}$ is a closed convex domain in $\mathbb{R}^{d}$.
(2) For $\delta=\max\left(\beta,\gamma,2\right)$, the function $\Psi(x)=F(\mathcal{A}(x))+\delta^{2}\,\Pi(x)$ is a $(\theta+\delta^{2}\nu)$-self-concordant barrier for $G^{+}$.
(3) $\Psi$ is highly self-concordant.
Using this result, we can obtain a useful tool in establishing lower trace self-concordance of a barrier for the direct product of structured sets. 
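Before applying this result to one-dimensional functions, it may help to see the scalar compatibility criterion (concavity together with $|f'''(t)|\leq\frac{\beta}{t}|f''(t)|$ and $|f^{(4)}(t)|\leq\frac{\gamma}{t^{2}}|f''(t)|$) checked on the two examples mentioned earlier, $t^{p}$ and $\log t$. The Python sketch below uses closed-form derivatives. The helper names, the grid of test points, and the tolerance are assumptions of the sketch, and a grid check is of course not a proof.

```python
def derivs_power(p, t):
    """Second, third, and fourth derivatives of f(t) = t**p."""
    f2 = p * (p - 1) * t ** (p - 2)
    f3 = p * (p - 1) * (p - 2) * t ** (p - 3)
    f4 = p * (p - 1) * (p - 2) * (p - 3) * t ** (p - 4)
    return f2, f3, f4

def derivs_log(t):
    """Second, third, and fourth derivatives of f(t) = log t."""
    return -1 / t**2, 2 / t**3, -6 / t**4

def is_compatible(derivs, beta, gamma, ts, tol=1e-9):
    """Check concavity and the third/fourth-order bounds on a grid of t > 0."""
    for t in ts:
        f2, f3, f4 = derivs(t)
        if f2 > 0:                                    # concavity: f'' <= 0
            return False
        if abs(f3) > (beta / t) * abs(f2) * (1 + tol):
            return False
        if abs(f4) > (gamma / t**2) * abs(f2) * (1 + tol):
            return False
    return True

ts = [0.1, 0.5, 1.0, 2.0, 10.0]
p = 0.5
print(is_compatible(lambda t: derivs_power(p, t), 2 - p, (2 - p) * (3 - p), ts))
print(is_compatible(derivs_log, 2, 6, ts))
```

For both examples the bounds hold with equality, so any smaller $\beta$ or $\gamma$ fails the check; for instance, `is_compatible(derivs_log, 1.9, 6, ts)` returns `False`.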
Let $f$ be a $C^{4}$ concave function on $\{t>0\}$ such that $|f'''(t)|\leq\frac{\beta}{t}\,|f''(t)|$ and $|f^{(4)}(t)|\leq\frac{\gamma}{t^{2}}\,|f''(t)|$ for $t>0$. Then the function $F(t,x)=-\log{\bigl(f(t)-x\bigr)}-\max(4,\beta^{2},\gamma^{2})\,\log t$ is a highly $(1+\max(4,\beta^{2},\gamma^{2}))$-self-concordant barrier for the two dimensional convex domain $G_{f}=\overline{\{(t,x)\in\mathbb{R}^{2}:t>0,\,x\leq f(t)\}}\,.$ From the discussion in Example \ref{['exa:useful-criteria']}, the map $f(t):(0,\infty)\to\mathbb{R}$ is $(\mathbb{R}_{+},\beta,\gamma)$-compatible with $\mathbb{R}_{+}$. Clearly, the identity map from $\mathbb{R}$ to $\mathbb{R}$ is $(\{0\},0,0)$-compatible with $\mathbb{R}$. Hence, Lemma \ref{['lem:extension-compatibility']}-(2) implies that the map $\mathcal{A}:\mathbb{R}_{+}\times\mathbb{R}\to\mathbb{R}^{2}$ defined by $\mathcal{A}(t,x)=(f(t),x)$ is $(\{0\}\times\mathbb{R}_{+},\beta,\gamma)$-compatible with $\mathbb{R}_{+}\times\mathbb{R}$. Now observe that $G_{f}$ can be written as $\mathcal{A}^{-1}{\bigl(\{(t,x):x\leq t\}\bigr)}$ and that $K=\{0\}\times\mathbb{R}_{+}$ is a ray contained in the recessive cone $R(G)$ for $G:=\{(t,x):x\leq t\}$. By applying Lemma \ref{['lem:compatible']} to the highly $1$-self-concordant barriers $F(t,x)=-\log(t-x)$ for $G$ and $\Phi(t,x)=-\log t$ for $\mathbb{R}_{+}\times\mathbb{R}$, it follows that $F$ is a highly $(1+\max(4,\beta^{2},\gamma^{2}))$-self-concordant barrier for $G_{f}$. We can prove a similar result for a convex $f$ as follows: Let $f$ be a $C^{4}$ convex function on $\{x>0\}$ such that $|f'''(x)|\leq\frac{\beta}{x}\,f''(x)$ and $|f^{(4)}(x)|\leq\frac{\gamma}{x^{2}}\,f''(x)$ for $x>0$. 
Then the function $F(t,x)=-\log{\bigl(t-f(x)\bigr)}-\max(4,\beta^{2},\gamma^{2})\,\log x$ is a highly $(1+\max(4,\beta^{2},\gamma^{2}))$-self-concordant barrier for the two-dimensional convex domain $G_{f}=\overline{\{(t,x)\in\mathbb{R}^{2}:x>0,\,t\geq f(x)\}}\,.$ The proof follows by applying Lemma \ref{['lem:tool-concave']} to the image of $G_{f}$ under the map $(t,x)\to(-x,t)$. In order to obtain a mixing-time bound of the $\mathsf{Dikin\ walk}$ for the reduced problem, a concrete understanding of the properties and parameters of barriers for $K_{i}$ and $K_{j}$ is essential. To this end, we revisit self-concordant barriers for structured convex constraints and level sets, examining the scaling factors required to ensure those properties. Consider a set of linear constraints: $K=\{x\in\mathbb{R}^{d}:Ax\geq b\}$ for $A\in\mathbb{R}^{m\times d}$ and $b\in\mathbb{R}^{m}$, where $A$ has no all-zero rows. We use $s_{x}:=Ax-b$ to denote the slack at $x$, and $A_{x}:=S_{x}^{-1}A$ to denote the constraints normalized by the slack, where $S_{x}:=\textup{Diag}(s_{x})$ is the diagonal matrix of the slacks. We now introduce three barriers (and metrics) for handling the linear constraints. The logarithmic barrier $\phi_{\log}(x):=-\sum_{i=1}^{m}\log(a_{i}^{\mathsf{T}}x-b_{i})$ is the simplest self-concordant barrier for linear constraints. We refer readers to §\ref{['proof:linear-log-barrier']} for a gentle introduction to log-barriers. As shown below, the metric induced by the logarithmic barrier has $\nu,\bar{\nu}=m$ and requires no scaling to achieve SSC, SLTSC, and SASC. For a closed convex $K=\{x\in\mathbb{R}^{d}:Ax\geq b\}$ with $A\in\mathbb{R}^{m\times d}$ and $b\in\mathbb{R}^{m}$, let $\phi_{\log}(x)=-\sum_{i=1}^{m}\log(a_{i}^{\mathsf{T}}x-b_{i})$ and define $g(x):=\nabla^{2}\phi_{\log}(x)=A_{x}^{\mathsf{T}}A_{x}$. 
(1) $\nu=m$ (nesterov1994interior). (2) SSC along $\textup{row}(A)$ and $\bar{\nu}=m$ (Lemma \ref{['lem:paramsBarrier']}). (3) $\mathrm{D}^{2}g(x)[h,h]\succeq0$ for any $h\in\mathbb{R}^{d}$ (so SLTSC) (Claim \ref{['claim:diffLogBarrier']}). (4) SASC (Lemma \ref{['lem:logBarrier-SASC']}). In sampling over a polytope $K$, the number $m$ of constraints is assumed to be greater than the ambient dimension $d$. Given that the mixing time of the $\mathsf{Dikin\ walk}$ for uniform sampling is $\widetilde{\mathcal{O}}(d\bar{\nu})=\widetilde{\mathcal{O}}(dm)$, a larger $m$ leads to a worse mixing time. Is there a self-concordant barrier that has a better dependence on $m$ for its self-concordance and symmetry parameters, without compromising SSC, SLTSC, and SASC? Let us first recall the leverage scores and then move on to such improved self-concordant barriers. For a full-rank matrix $A\in\mathbb{R}^{m\times d}$ with $m\geq d$, we recall that $P(A)=A(A^{\mathsf{T}}A)^{-1}A^{\mathsf{T}}$ is the orthogonal projection matrix onto the column space of $A$, and the leverage scores of $A$ are $\sigma(A)=\textsf{diag}(P(A))\in\mathbb{R}^{m}$. We let $\Sigma(A):=\textup{Diag}(\sigma(A))=\textup{Diag}(P(A))$ and $P^{(2)}(A)=P(A)\circ P(A)$, where $P(A)\circ P(A)$ is the Hadamard product of size $m\times m$ defined by $(P(A)\circ P(A))_{ij}=[P(A)]_{ij}^{2}$. vaidya1996new introduced the volumetric barrier for $K$ defined by $\phi_{\textrm{vol}}=\frac{1}{2}\,\log\det(\nabla^{2}\phi_{\log})=\frac{1}{2}\,\log\det(A_{x}^{\mathsf{T}}A_{x})\,.$ Then the Hessian of $\phi_{\textrm{vol}}$ can be written as $\nabla^{2}\phi_{\textrm{vol}}=A_{x}^{\mathsf{T}}(3\Sigma_{x}-2P_{x}^{(2)})A_{x}\,,$ where $\Sigma_{x}=\textup{Diag}(\sigma(A_{x}))$ is the diagonal matrix of leverage scores and $P_{x}^{(2)}=P^{(2)}(A_{x})$, and this Hessian satisfies $A_{x}^{\mathsf{T}}\Sigma_{x}A_{x}\preceq\nabla^{2}\phi_{\textrm{vol}}(x)\preceq3A_{x}^{\mathsf{T}}\Sigma_{x}A_{x}\,.$ We refer readers to §\ref{['proof:linear-volumetric']} for details. 
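As a numerical illustration of the leverage scores and the sandwich $A_{x}^{\mathsf{T}}\Sigma_{x}A_{x}\preceq\nabla^{2}\phi_{\textrm{vol}}(x)\preceq3A_{x}^{\mathsf{T}}\Sigma_{x}A_{x}$, the following minimal sketch evaluates the quadratic forms $h^{\mathsf{T}}g(x)h$ for the logarithmic, approximate volumetric, and volumetric metrics on a small polytope. The polytope, the point $x$, the test directions, and all helper names are illustrative assumptions, not part of the paper; the code is restricted to $d=2$ so that it needs only the standard library.

```python
def matmul(X, Y):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def inv2(G):
    """Inverse of a 2x2 matrix (this sketch assumes d = 2)."""
    (a, b), (c, e) = G
    det = a * e - b * c
    return [[e / det, -b / det], [-c / det, a / det]]

def metric_quadratic_forms(A, b, x, h):
    """Return (sigma, h^T g h) for the log, approximate volumetric,
    and volumetric Hessians at a strictly feasible x."""
    s = [sum(aij * xj for aij, xj in zip(row, x)) - bi for row, bi in zip(A, b)]
    assert min(s) > 0, "x must be strictly feasible"
    Ax = [[aij / si for aij in row] for row, si in zip(A, s)]   # A_x = S_x^{-1} A
    AxT = transpose(Ax)
    P = matmul(matmul(Ax, inv2(matmul(AxT, Ax))), AxT)          # projection P(A_x)
    m = len(Ax)
    sigma = [P[i][i] for i in range(m)]                         # leverage scores
    u = [sum(aij * hj for aij, hj in zip(row, h)) for row in Ax]  # u = A_x h
    q_log = sum(ui * ui for ui in u)                            # h^T A_x^T A_x h
    q_approx = sum(sigma[i] * u[i] ** 2 for i in range(m))      # h^T A_x^T Sigma_x A_x h
    # h^T A_x^T (3 Sigma_x - 2 P^{(2)}) A_x h, with P^{(2)} the entrywise square of P
    q_vol = 3 * q_approx - 2 * sum(P[i][j] ** 2 * u[i] * u[j]
                                   for i in range(m) for j in range(m))
    return sigma, q_log, q_approx, q_vol

# Hypothetical example: a pentagon {x : Ax >= b} in the plane (m = 5, d = 2).
A = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0], [-1.0, -1.0]]
b = [0.0, 0.0, -1.0, -1.0, -1.5]
x = [0.3, 0.4]
directions = ([1.0, 0.0], [0.0, 1.0], [1.0, -2.0], [3.0, 1.0])
results = [metric_quadratic_forms(A, b, x, h) for h in directions]
sigma = results[0][0]
```

The leverage scores sum to $d$ because $P(A_{x})$ is a rank-$d$ projection, and each quadratic form of the volumetric Hessian lies between one and three times that of $A_{x}^{\mathsf{T}}\Sigma_{x}A_{x}$. Checking a handful of directions is of course only a spot check, not a proof of the matrix inequality.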
In other words, the approximate volumetric metric $A_{x}^{\mathsf{T}}\Sigma_{x}A_{x}$ serves as an $\mathcal{O}(1)$-approximation of the local metric $\nabla^{2}\phi_{\textrm{vol}}$ (i.e., $A_{x}^{\mathsf{T}}\Sigma_{x}A_{x}\asymp\nabla^{2}\phi_{\textrm{vol}}(x)$). We find in Lemma \ref{['lem:paramsBarrier']} that the local metric $40\sqrt{m}A_{x}^{\mathsf{T}}\Sigma_{x}A_{x}$ is SSC with $\nu,\,\bar{\nu}=\mathcal{O}(\sqrt{m}d)$, but in some regime of $d$ (e.g., when $d\gtrsim\sqrt{m}$, so that $\sqrt{m}d\gtrsim m$) this parameter leads to worse mixing of the $\mathsf{Dikin\ walk}$ than the logarithmic barrier. In the same paper, vaidya1996new introduced a regularized volumetric metric by adding $\mathcal{O}{\bigl(\nabla^{2}\phi_{\log}\bigr)}$, which we call the Vaidya metric: $g(x):=\sqrt{\frac{m}{d}}\,A_{x}^{\mathsf{T}}{\bigl(\Sigma_{x}+\frac{d}{m}I_{m}\bigr)}A_{x}\,.$ Note that $g(x)\asymp\nabla^{2}{\bigl(\sqrt{\frac{m}{d}}{\bigl(\phi_{\textrm{vol}}+\frac{d}{m}\phi_{\log}\bigr)}\bigr)}$. We show that the Vaidya metric is also SSC, SLTSC, and SASC without additional scaling, while it has a better $\nu$ and $\bar{\nu}$ than the logarithmic barrier. For a closed convex $K=\{x\in\mathbb{R}^{d}:Ax\geq b\}$ with $A\in\mathbb{R}^{m\times d}$ and $b\in\mathbb{R}^{m}$, let $g(x)=\sqrt{\frac{m}{d}}A_{x}^{\mathsf{T}}{\bigl(\Sigma_{x}+\frac{d}{m}I_{m}\bigr)}A_{x}$. (1) $\nu=\mathcal{O}(\sqrt{md})$ (anstreicher1997volumetric). (2) SSC and $\bar{\nu}=\mathcal{O}(\sqrt{md})$ (Lemma \ref{['lem:paramsBarrier']}). (3) SLTSC (Lemma \ref{['lem:vaidya-SLTSC']}) and SASC (Lemma \ref{['lem:vaidya-SASC']}). Self-concordance and symmetry parameters of $\mathcal{O}(\sqrt{md})$ are certainly better than $\mathcal{O}(m)$, but can we even achieve an $\mathcal{O}(d\log^{\mathcal{O}(1)}m)$ bound on those parameters? Let us recall the $\ell_{p}$-Lewis weights. The $\ell_{p}$-Lewis weight of $A$, denoted by $w(A)$, is the solution $w\in\mathbb{R}^{m}$ to the equation $w=\textsf{diag}{\bigl(W^{1/2-1/p}A(A^{\mathsf{T}}W^{1-2/p}A)^{-1}A^{\mathsf{T}}W^{1/2-1/p}\bigr)}$ for $W:=\textup{Diag}(w)$. 
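To make the fixed-point definition of the Lewis weights concrete, here is a minimal sketch that solves it by the simple iteration $w_{i}\leftarrow{\bigl(a_{i}^{\mathsf{T}}(A^{\mathsf{T}}W^{1-2/p}A)^{-1}a_{i}\bigr)}^{p/2}$, which is obtained by rearranging the defining equation row by row. This iteration is an assumption of the sketch rather than the procedure used in the paper: it is known to contract only for $p<4$, so it does not cover the regime $p=\mathcal{O}(\log m)$ considered later. The matrix and function names are hypothetical, and the code is restricted to $d=2$.

```python
def _row_scores(A, w, p):
    """tau_i = a_i^T (A^T W^{1-2/p} A)^{-1} a_i for an m x 2 matrix A."""
    c = [wi ** (1.0 - 2.0 / p) for wi in w]
    g11 = sum(ci * r[0] * r[0] for ci, r in zip(c, A))
    g12 = sum(ci * r[0] * r[1] for ci, r in zip(c, A))
    g22 = sum(ci * r[1] * r[1] for ci, r in zip(c, A))
    det = g11 * g22 - g12 * g12
    # G^{-1} = [[g22, -g12], [-g12, g11]] / det
    tau = [(g22 * r[0] ** 2 - 2 * g12 * r[0] * r[1] + g11 * r[1] ** 2) / det
           for r in A]
    return c, tau

def lewis_weights(A, p, iters=200):
    """Fixed-point iteration for the l_p Lewis weights (sketch, assumes p < 4)."""
    w = [1.0] * len(A)                      # any positive starting point
    for _ in range(iters):
        _, tau = _row_scores(A, w, p)
        w = [t ** (p / 2.0) for t in tau]
    return w

def residual(A, w, p):
    """Largest violation of w_i = w_i^{1-2/p} * a_i^T (A^T W^{1-2/p} A)^{-1} a_i."""
    c, tau = _row_scores(A, w, p)
    return max(abs(wi - ci * ti) for wi, ci, ti in zip(w, c, tau))

# Hypothetical full-rank example with m = 5 rows and d = 2 columns.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0], [2.0, 1.0]]
w2 = lewis_weights(A, 2.0, iters=1)   # p = 2: one step returns the leverage scores
w3 = lewis_weights(A, 3.0)            # p = 3: genuine fixed-point iteration
```

At a solution the weights sum to $d$, since they are the diagonal of a rank-$d$ projection; this gives a cheap sanity check alongside the residual of the defining equation.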
For $W_{x}=\textup{Diag}(w(A_{x}))$ and $p\geq2$, the Lewis weight barrier function is defined by $\phi_{\textup{Lw}}(x):=\log\det(A_{x}^{\mathsf{T}}W_{x}^{1-2/p}A_{x})\,.$ Note that the leverage score and volumetric barrier can be recovered as a special case of the Lewis weight and barrier by setting $p=2$. As done for the Vaidya metric, it is natural to consider the Lewis weight metric with $p=\Theta(\log^{\mathcal{O}(1)}m)$, defined as $g(x):=\mathcal{O}(\log^{\mathcal{O}(1)}m)\,A_{x}^{\mathsf{T}}W_{x}A_{x}\,.$ In fact, this metric serves as an $\mathcal{O}(\log^{\mathcal{O}(1)}m)$-approximation of $\nabla^{2}\phi_{\textup{Lw}}$, as demonstrated by the following relation proven in lee2019solving: $A_{x}^{\mathsf{T}}W_{x}A_{x}\preceq\nabla^{2}\phi_{\textup{Lw}}\preceq(1+p)\,A_{x}^{\mathsf{T}}W_{x}A_{x}\,.$ Ignoring the logarithmic factors, we have $\nabla^{2}\phi_{\textup{Lw}}\asymp g$. Notably, the Lewis-weight metric needs an additional $\sqrt{d}$-scaling for SLTSC and SASC, unlike the logarithmic barrier and Vaidya metric. Hence, when combining this with other metrics, one should use $\sqrt{d}g$, which leads to $\nu,\,\bar{\nu}=\mathcal{O}(d^{3/2}\,\log^{\mathcal{O}(1)}m)$. For a closed convex $K=\{x\in\mathbb{R}^{d}:Ax\geq b\}$ with $A\in\mathbb{R}^{m\times d}$ and $b\in\mathbb{R}^{m}$, let $g(x)=\mathcal{O}(\log^{\mathcal{O}(1)}m)\,A_{x}^{\mathsf{T}}W_{x}A_{x}$. (1) $\nu=\mathcal{O}(d\log^{5}m)$ (lee2019solving). (2) SSC and $\bar{\nu}=\mathcal{O}(d\log^{\mathcal{O}(1)}m)$ (Lemma \ref{['lem:paramsBarrier']}). (3) $\sqrt{d}g$ is SLTSC (Lemma \ref{['lem:Lw-SLTSC']}) and SASC (Lemma \ref{['lem:Lw-SASC']}). We defer the proofs of the two lemmas below to §\ref{['proof:linear-SSC-symm']}. 
We study SSC and symmetry of the metrics of the form $A_{x}^{\mathsf{T}}D_{x}A_{x}$ in Lemma \ref{['lem:helper4Diagonal']}, where $D_{x}\in\mathbb{R}^{m\times m}$ is a diagonal matrix used to address the constraints of the form $Ax\geq b$ for $A\in\mathbb{R}^{m\times d}$ and $b\in\mathbb{R}^{m}$. Specifically, we relate the notions of SSC and symmetry to well-studied terms in the field of optimization, namely $\max_{i}\,[\sigma(\sqrt{D_{x}}A_{x})]_{i}/[D_{x}]_{ii}$ and ${\|\mathrm{D} D_{x}[h]\|}_{D_{x}^{-1}}^{2}$. For $x\in\textup{int}(K)$, let $g(x)=A_{x}^{\mathsf{T}}D_{x}A_{x}\in\mathbb{R}^{d\times d}$ for a diagonal matrix $0\prec D_{x}\in\mathbb{R}^{m\times m}$. (1) For any PSD matrix function $g'$ such that $g'+g$ is invertible on the domain, ${\|(g'(x)+g(x))^{-1/2}\mathrm{D} g(x)[h]\,(g'(x)+g(x))^{-1/2}\|}_{F}^{2}\leq4\max_{i}\frac{[\sigma(\sqrt{D_{x}}A_{x})]_{i}}{[D_{x}]_{ii}}\cdot{\bigl({\|h\|}_{g(x)}^{2}+\sum_{i=1}^{m}\frac{(\mathrm{D} D_{x}[h])_{ii}^{2}}{[D_{x}]_{ii}}\bigr)}\,.$ (2) $\max_{h:{\|h\|}_{g(x)}=1}{\|A_{x}h\|}_{\infty}={\bigl(\max_{i\in[m]}\frac{[\sigma(\sqrt{D_{x}}A_{x})]_{i}}{[D_{x}]_{ii}}\bigr)}^{1/2}$. (3) $K\cap(2x-K)\subset\mathcal{D}_{g}^{\sqrt{\textup{Tr}(D_{x})}}(x)$. Then for each metric we refer to existing bounds on these terms, estimating the smallest possible scaling required for SSC and symmetry. Let $A\in\mathbb{R}^{m\times d}$, $\Sigma_{x}=\textup{Diag}(\sigma(A_{x}))\in\mathbb{R}^{m\times m}$, and $W_{x}=\textup{Diag}(w_{x})\in\mathbb{R}^{m\times m}$ for the $\ell_{p}$-Lewis weight $w_{x}$ with $p=\mathcal{O}(\log m)$. 
(1) Logarithmic metric: $g(x)=A_{x}^{\mathsf{T}}A_{x}$ with $D_{x}=I_{m}$ is SSC along $\textup{row}(A)$ with $\bar{\nu}=m$. (2) Approximate volumetric metric: $g(x)=40\sqrt{m}A_{x}^{\mathsf{T}}\Sigma_{x}A_{x}$ with $D_{x}=40\sqrt{m}\Sigma_{x}$ is SSC with $\bar{\nu}=\mathcal{O}(\sqrt{m}d)$. (3) Vaidya metric: $g(x)=22\sqrt{\frac{m}{d}}A_{x}^{\mathsf{T}}{\bigl(\Sigma_{x}+\frac{d}{m}I_{m}\bigr)}A_{x}$ with $D_{x}=22\sqrt{\frac{m}{d}}{\bigl(\Sigma_{x}+\frac{d}{m}I_{m}\bigr)}$ is SSC with $\bar{\nu}=\mathcal{O}(\sqrt{md})$. (4) Lewis-weight metric: there exist positive constants $c_{1}$ and $c_{2}$ such that $g(x)=c_{1}(\log m)^{c_{2}}A_{x}^{\mathsf{T}}W_{x}A_{x}$ is SSC and $\bar{\nu}$-symmetric with $\bar{\nu}=\mathcal{O}^{*}(d)$. We show SLTSC of the Vaidya and Lewis-weight metrics. Let $g_{2}$ be either the Vaidya or the Lewis-weight metric, and $g_{1}$ be an arbitrary PSD matrix function on $K$ such that $g=g_{1}+g_{2}$ is PD on $\textup{int}(K)$. Ensuring (S)LTSC of the Vaidya or Lewis-weight metrics is challenging, as $\mathrm{D}^{2}g_{2}[h,h]\succeq0$ is difficult to verify due to complicated expressions for $\mathrm{D}^{2}\Sigma_{x}[h,h]$ and $\mathrm{D}^{2}W_{x}[h,h]$. As for the Vaidya metric, we compute higher-order derivatives of leverage scores and other pertinent matrices in Lemma \ref{['lem:calculusLeverage']}, finding succinct formulas by using algebraic properties of the Hadamard product. We then show SLTSC of $g_{2}$ using these results (see §\ref{['proof:linear-vaidya-SLTSC']} for the proof): $\textup{Tr}{\bigl(g^{-1}\mathrm{D}^{2}g_{2}(x)[h,h]\bigr)}\geq-{\|h\|}_{g_{2}(x)}^{2}/2$ for the Vaidya metric $g_{2}$. For the Lewis-weight metric, the analysis is more involved due to numerous terms appearing in $\mathrm{D}^{2}W_{x}[h,h]$. In order to avoid dealing with each of the terms, we employ existing bounds on derivatives of $W_{x}$ and other relevant matrices in §\ref{['proof:linear-LW']}. 
This approach significantly simplifies the computation but comes at the cost of an additional scaling of $\sqrt{d}$, which as far as we can tell might be unavoidable. We refer readers to §\ref{['proof:linear-Lewis-SLTSC']} for the proof. $\textup{Tr}{\bigl(g(x)^{-1}\mathrm{D}^{2}g_{2}(x)[h,h]\bigr)}\geq-{\|h\|}_{g_{2}(x)}^{2}$, where $g_{2}(x)=cA_{x}^{\mathsf{T}}W_{x}A_{x}$ with $c=c_{1}(\log m)^{c_{2}}\sqrt{d}$ for some constants $c_{1},c_{2}>0$. Typically, (S)ASC is the most challenging property to verify, often requiring involved analysis in order to establish it without additional scalings. Since the three metrics are HSC (e.g., see Lemma \ref{['lem:Lw-hsc']} for Lewis-weight metrics), scaling by $d$ leads to SASC by Lemma \ref{['lem:hsc-to-sasc']}. However, for linear constraints one can still achieve SASC without scaling (or with a smaller scaling) through more sophisticated concentration techniques. To sketch this idea, we recall that SASC requires showing that for small enough $r$, ${\|z-x\|}_{g(z)}^{2}-{\|z-x\|}_{g(x)}^{2}\leq2\varepsilon\frac{r^{2}}{d}\,.$ A second-order Taylor expansion of ${\|z-x\|}_{g(z)}^{2}$ at $z=x$ requires bounds on $\mathrm{D} g(x)[(z-x)^{\otimes3}]=\frac{r^{3}}{d^{3/2}}\mathrm{D} g(x)[h^{\otimes3}]\qquad\text{and}\qquad\mathrm{D}^{2}g(x')[(z-x)^{\otimes4}]=\frac{r^{4}}{d^{2}}\mathrm{D}^{2}g(x')[h^{\otimes4}]\,,$ for some $x'\in[x,z]$ and $h\sim\mathcal{N}(0,I_{d})$. Observe that the first-order term $P(h):=\frac{r^{3}}{d^{3/2}}\mathrm{D} g(x)[h^{\otimes3}]$ is a Gaussian polynomial in $h$, and this is where we can invoke the following concentration phenomenon: For $d\geq1$, let $P:\mathbb{R}^{d}\to\mathbb{R}$ be a polynomial of degree $n$. 
For any $t\geq(2e)^{n/2}$, $\mathbb{P}_{h\sim\mathcal{N}(0,I_{d})}{\Bigl[|P(h)|\geq t\sqrt{\mathbb{E}[P(h)^{2}]}\Bigr]}\leq\exp{\bigl(-\frac{n}{2e}\,t^{2/n}\bigr)}\,.$ This concentration inequality requires a bound on $\mathbb{E}[P(h)^{2}]$, and this is where Stein's lemma comes into play: For $h=(h_{1},\dots,h_{d})\sim\mathcal{N}(0,I_{d})$, it holds that $\mathbb{E}[h_{i}f(h)]=\mathbb{E}[\partial_{i}f(h)]$. Unlike the first-order term, the second-order term is not a Gaussian polynomial, since $x'$ depends on $z$. To address this issue, we derive an upper bound (in absolute value) on the quadratic form. Using coordinate-wise closeness of slacks, leverage scores, and Lewis weights at two nearby points, we replace every quantity evaluated at $z$ by the corresponding quantity at $x$, removing the dependence on $z$ in the quadratic bound. The resulting quadratic bound is now a Gaussian polynomial, so we follow the same proof approach as with the first-order term. This approach was used by sachdeva2016mixing for ASC of log-barriers and by chen2018fast for that of Vaidya and Lewis-weight metrics. We further extend this approach to achieve SASC of those metrics, going beyond ASC. $g(x)=\nabla^{2}\phi_{\log}(x)=A_{x}^{\mathsf{T}}A_{x}$ is SASC. See §\ref{['proof:linear-SASC-log']} for the proof. $g(x)=\mathcal{O}{\bigl(\sqrt{\frac{m}{d}}\bigr)}\,A_{x}^{\mathsf{T}}(\Sigma_{x}+\frac{d}{m}I_{m})A_{x}$ is SASC. See §\ref{['proof:linear-SASC-vaidya']} for the proof. There exist constants $c_{1}$ and $c_{2}$ such that $g(x)=c_{1}\sqrt{d}\log^{c_{2}}m\,A_{x}^{\mathsf{T}}W_{x}A_{x}=\mathcal{O}^{*}(\sqrt{d})\,A_{x}^{\mathsf{T}}W_{x}A_{x}$ is SASC. See §\ref{['proof:linear-SASC-Lw']} for the proof. Suppose that in \ref{['eq:reduced-problem']} we have either $f_{i}(x),\,h_{j}(x)={\|x-\mu\|}_{\Sigma}^{2}$ or $\frac{1}{2} x^{\mathsf{T}}Qx+p^{\mathsf{T}}x+l$ for $\mu,p\in\mathbb{R}^{d}$, $\Sigma\in\mathbb{S}_{++}^{d}$, and $0\neq Q\in\mathbb{S}_{+}^{d}$. 
Consider a second-order region given by $K=\{x\in\mathbb{R}^{d}:\frac{1}{2} x^{\mathsf{T}}Qx+p^{\mathsf{T}}x+l\leq0\}$. nesterov1994interior shows that $\phi:=-\log f$ is a $1$-self-concordant barrier for $K$, when $f(x)=-\frac{1}{2}{\|x-\mu\|}_{\Sigma}^{2}$ or $-(\frac{1}{2} x^{\mathsf{T}}Qx+p^{\mathsf{T}}x+l)$. Since $\bar{\nu}=\mathcal{O}(\nu^{2})$ for a self-concordant barrier due to Lemma \ref{['lem:bound-symmetry']}, $\phi$ is $\mathcal{O}(1)$-symmetric. In case we consider ${\|x-\mu\|}_{\Sigma}^{2}$, the trivial scaling by the dimension $d$ implies that $d\phi$ is SSC and $\mathcal{O}(d)$-symmetric. Moreover, since $\phi$ is HSC, $d\phi$ is SASC by Lemma \ref{['lem:hsc-to-sasc']}. To establish HSC of $\phi$, we develop a handy tool for checking HSC. See §\ref{['proof:quadratic']} for the proof. For a real-valued function $f$ on $K\subset\mathbb{R}^{d}$, let $\psi=-\log f$ be a $\nu$-self-concordant barrier for $K$. Then, $|\mathrm{D}^{4}\psi(x)[h^{\otimes4}]|\lesssim\nu^{2}{\|h\|}_{\nabla^{2}\psi(x)}^{4}+|\frac{\mathrm{D}^{4}f(x)[h^{\otimes4}]}{f(x)}|\,.$ Using this tool, we can study properties of the barrier for the quadratic constraints. We provide the proof in §\ref{['proof:quadratic']}. For a closed convex $K=\{x\in\mathbb{R}^{d}:\frac{1}{2} x^{\mathsf{T}}Qx+p^{\mathsf{T}}x+l\leq0\}$ with $p\in\mathbb{R}^{d}$ and $0\neq Q\in\mathbb{S}_{+}^{d}$, let $\phi(x)=-\log(-l-p^{\mathsf{T}}x-\frac{1}{2} x^{\mathsf{T}}Qx)$ and $g=d\,\nabla^{2}\phi$. (1) $\nu,\,\bar{\nu}=\mathcal{O}(d)$. (2) SSC when $Q\succ0$, and SASC. (3) $\mathrm{D}^{2}g(x)[h,h]\succeq0$ for any $x\in\textup{int}(K)$ and $h\in\mathbb{R}^{d}$ (so SLTSC). Suppose the quadratic term $f(x)=\frac{1}{2}{\|x-\mu\|}_{\Sigma}^{2}$ appears in a potential of a target distribution. Then its epigraph is $\{(x,t)\in\mathbb{R}^{d+1}:\frac{1}{2}{\|x-\mu\|}_{\Sigma}^{2}-t\leq0\}\,,$ and clearly $q(x,t)=\frac{1}{2}{\|x-\mu\|}_{\Sigma}^{2}-t$ is a quadratic function in $(x,t)$. 
Hence, this level set admits a $1$-self-concordant barrier $\phi(x,t)=-\log(t-\frac{1}{2}{\|x-\mu\|}_{\Sigma}^{2})\,.$ Our earlier discussion immediately leads to the following result: Consider a closed convex $K=\{(x,t):\frac{1}{2}{\|x-\mu\|}_{\Sigma}^{2}\leq t\}$ with $\mu\in\mathbb{R}^{d}$ and $\Sigma\in\mathbb{S}_{++}^{d}$, and let $\phi(x,t)=-\log(t-\frac{1}{2}{\|x-\mu\|}_{\Sigma}^{2})$ and $g=d\,\nabla^{2}\phi$. (1) $\nu_{g},\,\bar{\nu}_{g}=\mathcal{O}(d)$. (2) SSC and SASC. (3) $\mathrm{D}^{2}g(x,t)[h,h]\succeq0$ for any $(x,t)\in\textup{int}(K)$ and $h\in\mathbb{R}^{d+1}$. In many applications a potential includes a non-smooth term like ${\|Ax-b\|}_{2}$, and we can handle such potentials via our framework. nesterov1994interior shows that $\phi(x,t)=-\log(t^{2}-{\|x\|}^{2})$ is a $2$-self-concordant barrier for the level set $K=\{(x,t)\in\mathbb{R}^{d}\times\mathbb{R}:{\|x\|}_{2}\leq t\}$ (here we may assume that $\mu=0$ and $\Sigma=I$ due to Lemma \ref{['lem:linear-trans']}). This level set is called a second-order cone or Lorentz cone. Applying Lemma \ref{['lem:4th-log']} to $f(x,t)=t^{2}-{\|x\|}^{2}$ with $\nu=2$, we immediately obtain HSC of $\phi$. Thus, $d\phi$ satisfies SLTSC and SASC by Lemma \ref{['lem:hsc-to-sltsc']} and Lemma \ref{['lem:hsc-to-sasc']}, respectively. Consider a closed convex $K=\{(x,t):{\|x-\mu\|}_{\Sigma}\leq t\}$ with $\mu\in\mathbb{R}^{d}$ and $\Sigma\in\mathbb{S}_{++}^{d}$, and let $\phi(x,t)=-\log(t^{2}-{\|x-\mu\|}_{\Sigma}^{2})$ and $g=d\,\nabla^{2}\phi$. (1) $\nu_{g},\,\bar{\nu}_{g}=\mathcal{O}(d)$. (2) SSC, SASC, and SLTSC. The function $\phi(X)=-\log\det X$ serves as a $d$-self-concordant barrier for the PSD cone $\mathbb{S}_{+}^{d}$. While achieving self-concordance does not require additional scaling, it turns out that SSC requires a scaling of $\Theta(d)$. Notably, this scaling is less than the trivial dimension-based scaling of $d_{s}:=d(d+1)/2$. Also, direct computation leads to $\mathrm{D}^{4}\phi(X)[H,H]\succeq0$ (so SLTSC). 
As $\phi$ is HSC, scaling by $d_{s}$ ensures SASC. However, we can achieve ASC with a smaller scaling of $\mathcal{O}(d)$ via random matrix theory. On a closed convex $K=\mathbb{S}_{+}^{d}$, let $\phi(X)=-\log\det X$ and define $g=d\,\nabla^{2}\phi$. (1) $\nu=d^{2}$ (nesterov1994interior) and $\bar{\nu}=d^{2}$ (Lemma \ref{['lem:logdet-symm']}). (2) SSC (Corollary \ref{['cor:logdet-ssc']}). (3) $\mathrm{D}^{2}g(X)[H,H]\succeq0$ for any $X\in\textup{int}(K)$ and $H\in\mathbb{S}^{d}$ (Lemma \ref{['lem:logdet-sltsc']}). (4) ASC (Lemma \ref{['lem:logdet-asc']}), and $d_{s}\,\nabla^{2}\phi$ is SASC. In analyzing $\phi$, we work in $\mathbb{R}^{d_{s}}=\mathbb{R}^{d(d+1)/2}$ and $\mathbb{S}^{d}$ simultaneously in the sequel, moving back and forth between them implicitly. We justify this identification as follows. We can define and work with the Lebesgue measure on $\mathbb{S}^{d}$ by identifying it with the Lebesgue measure on $\mathbb{R}^{d_{s}}$, where each component in the Lebesgue measure on $\mathbb{S}^{d}$ corresponds to an entry in the upper triangular part. Hence, with the Lebesgue measure $\mathrm{d} X$ on $\mathbb{S}^{d}$ it is straightforward to define a probability distribution on $\mathbb{S}^{d}$ whose probability density function with respect to $\mathrm{d} X$ is proportional to $\exp(-f)$ for a function $f:\mathbb{S}^{d}\to\mathbb{R}$. For instance, the uniform distribution over a region corresponds to $f$ being constant in the region and infinity outside of the region, and an exponential distribution to $f(X)=\langle C,X\rangle=\textup{Tr}(C^{\mathsf{T}}X)$ for $C\in\mathbb{S}^{d}$. A function $\phi:\mathbb{S}^{d}\to\mathbb{R}$ induces its counterpart $\psi:\mathbb{R}^{d_{s}}\to\mathbb{R}$ defined by $\psi(x)=\phi(X)$ for $x:=\textup{svec}(X)$. 
For symmetric matrices $\{H_{i}\}_{i\leq k}$, the $k$-th directional derivative of $\phi$ in directions $H_{1},\dots,H_{k}$ is $\mathrm{D}^{k}\phi(X)[H_{1},\cdots,H_{k}]\stackrel{\mathrm{def}}{=}\frac{\mathrm{d}^{k}}{\mathrm{d} t_{k}\cdots\mathrm{d} t_{1}}\phi{\Bigl(X+\sum_{i=1}^{k}t_{i}H_{i}\Bigr)}\vert_{t_{1},\dots,t_{k}=0}\,.$ For $h_{i}:=\textup{svec}(H_{i})$, it follows that $\phi(X+\sum_{i=1}^{k}t_{i}H_{i})=\psi(x+\sum_{i=1}^{k}t_{i}h_{i})$ and thus $\mathrm{D}^{k}\phi(X)[H_{1},\cdots,H_{k}]=\mathrm{D}^{k}\psi(x)[h_{1},\cdots,h_{k}]\,.$ With this identification in hand, since the notion of (symmetric or strong) self-concordance is formulated in terms of directional derivatives, we can deal with both representations without having to specify one of them. We introduce three linear operators that enable us to make smooth transitions between $\mathbb{S}^{d}$ and $\mathbb{R}^{d_{s}}$. Let $E_{ij}=e_{i}e_{j}^{\mathsf{T}}\in\mathbb{R}^{d\times d}$ be the matrix with a single $1$ in the $(i,j)$ position and zeros elsewhere. (1) $M:\mathbb{R}^{d_{s}}\to\mathbb{R}^{d^{2}}$ is the linear operator that maps $\textup{svec}(\cdot)$ to $\textup{vec}(\cdot)$ (i.e., $M\circ\textup{svec}=\textup{vec}$). It can be written as $M=\sum_{i\geq j}\textup{vec}(T_{ij})u_{ij}^{\mathsf{T}}$, where $T_{ij}\in\mathbb{R}^{d\times d}$ has all zero entries except for $1$ at the $(i,j)$ and $(j,i)$ positions (i.e., $T_{ij}=E_{ij}+E_{ji}$ if $i\neq j$ and $E_{ij}$ if $i=j$), and $u_{ij}=\textup{svec}(E_{ij})$. (2) $N:\mathbb{R}^{d^{2}}\to\mathbb{R}^{d^{2}}$ is the linear operator that maps $\textup{vec}(A)$ to $\textup{vec}{\bigl(\frac{1}{2}(A+A^{\mathsf{T}})\bigr)}$ for a matrix $A\in\mathbb{R}^{d\times d}$. (3) $L:\mathbb{R}^{d^{2}}\to\mathbb{R}^{d_{s}}$ is the linear operator that maps $\textup{vec}(A)$ to $\textup{svec}(A)$ for a matrix $A\in\mathbb{R}^{d\times d}$. It can be written as $L=\sum_{i\geq j}u_{ij}\textup{vec}(E_{ij})^{\mathsf{T}}$. 
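The three operators can be written down explicitly for small $d$. The sketch below builds $M$, $N$, and $L$ as dense matrices and evaluates the identities $M\circ\textup{svec}=\textup{vec}$, $N=N^{\mathsf{T}}=N^{2}$, and $MLN=N$ on an example. It assumes column-stacking for $\textup{vec}$ and a column-major ordering of the entries with $i\geq j$ for $\textup{svec}$; the ordering and all function names are choices made for this illustration only.

```python
def build_operators(d):
    """Dense M (d^2 x d_s), N (d^2 x d^2), L (d_s x d^2) for d x d matrices."""
    pairs = [(i, j) for j in range(d) for i in range(j, d)]   # positions with i >= j
    ds = len(pairs)                                           # d_s = d(d+1)/2
    vidx = lambda i, j: j * d + i                             # index of entry (i, j) in vec
    M = [[0.0] * ds for _ in range(d * d)]
    L = [[0.0] * (d * d) for _ in range(ds)]
    N = [[0.0] * (d * d) for _ in range(d * d)]
    for k, (i, j) in enumerate(pairs):
        M[vidx(i, j)][k] = 1.0        # column k of M is vec(T_ij)
        M[vidx(j, i)][k] = 1.0
        L[k][vidx(i, j)] = 1.0        # row k of L is vec(E_ij)^T
    for i in range(d):
        for j in range(d):
            N[vidx(i, j)][vidx(i, j)] += 0.5   # N vec(A) = vec((A + A^T) / 2)
            N[vidx(i, j)][vidx(j, i)] += 0.5
    return M, N, L, pairs

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def matvec(X, v):
    return [sum(xij * vj for xij, vj in zip(row, v)) for row in X]

def vec(X):
    d = len(X)
    return [X[i][j] for j in range(d) for i in range(d)]      # stack the columns

def svec(X, pairs):
    return [X[i][j] for (i, j) in pairs]

d = 3
M, N, L, pairs = build_operators(d)
X = [[2.0, 1.0, 0.5], [1.0, 3.0, -1.0], [0.5, -1.0, 4.0]]     # a symmetric example
NT = [list(col) for col in zip(*N)]
identity = [[1.0 if a == c else 0.0 for c in range(len(pairs))] for a in range(len(pairs))]
```

With these conventions $L$ is a left inverse of $M$ (that is, $LM=I_{d_{s}}$), which is why $M^{\dagger}$ and $LN$ can play interchangeable roles on symmetric inputs.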
Let $M,N,L$ be the matrices in Definition \ref{['def:linearOperators']}. (1) (Lemma 2.1) $N=N^{\mathsf{T}}=N^{2}$ and $N(A\otimes A)=(A\otimes A)N$ for any $d\times d$ matrix $A$. (2) (Lemma 3.5) $MLN=N$. We first examine properties of the metric defined by the Hessian of the self-concordant barrier $\phi(X)=-\log\det X$ (see nesterov2003introductory for self-concordance). In this case, its Hessian and inverse have clean formulas. Let $\nabla_{X}^{2}\phi(X)=-\nabla_{x}^{2}\log\det(\textup{svec}^{-1}(x))\in\mathbb{R}^{d_{s}\times d_{s}}$ for $X\in\mathbb{S}_{+}^{d}$. Then, $\nabla^{2}\phi(X)=M^{\mathsf{T}}(X^{-1}\otimes X^{-1})M=M^{\mathsf{T}}(X\otimes X)^{-1}M$ and ${\bigl(\nabla^{2}\phi(X)\bigr)}^{-1}=M^{\dagger}(X\otimes X){\bigl(M^{\dagger}\bigr)}^{\mathsf{T}}=LN(X\otimes X)NL^{\mathsf{T}}\,,$ where $M^{\dagger}=(M^{\mathsf{T}}M)^{-1}M^{\mathsf{T}}\in\mathbb{R}^{d_{s}\times d^{2}}$ is the Moore-Penrose inverse of $M\in\mathbb{R}^{d^{2}\times d_{s}}$. We defer the proof to Appendix \ref{['app:matrixCalculus']}. We remark that as an immediate corollary to this, the local norm of $h\in\mathbb{R}^{d_{s}}$ with metric $\nabla^{2}\phi(X)$ is ${\|h\|}_{X}^{2}=\textup{svec}(H)^{\mathsf{T}}M^{\mathsf{T}}(X^{-1}\otimes X^{-1})M\textup{svec}(H)\underset{\text{(i)}}{=}\textup{Tr}(HX^{-1}HX^{-1})\eqqcolon{\|H\|}_{X}^{2}\,,$ where (i) follows from $\textup{vec}=M\circ\textup{svec}$ (Definition \ref{['def:linearOperators']}) and $\textup{Tr}(DB^{\mathsf{T}}A^{\mathsf{T}}C)=\textup{vec}(A)^{\mathsf{T}}(B\otimes C)\textup{vec}(D)$ (Lemma \ref{['lem:Kronecker']}). For $X\in K=\mathbb{S}_{+}^{d}$, the barrier $\phi(X)=-\log\det X$ is $d$-symmetric. For $X\in K$, pick any $Y\in K\cap(2X-K)$, and define a symmetric matrix $H:=Y-X$. Since $Y\in K$ and $2X-Y\in K$, we have $X+H\in K$ and $X-H\in K$. Thus, $-I\preceq X^{-1/2}HX^{-1/2}\preceq I\,,$ and the magnitude of each eigenvalue $\{\lambda_{i}\}_{i=1}^{d}$ of $X^{-1/2}HX^{-1/2}$ is bounded by $1$. 
Hence, ${\|H\|}_{X}^{2}=\textup{Tr}(X^{-1}HX^{-1}H)={\|X^{-1/2}HX^{-1/2}\|}_{F}^{2}=\sum_{i=1}^{d}\lambda_{i}^{2}\leq d\,.$ Next, the convexity of the log-determinant of $\nabla^{2}\phi$ can be checked via properties of Kronecker products. See §\ref{['proof:psd-convex-ssc']} for the proof. $\log\det(\nabla^{2}\phi(\cdot))$ is convex. We move on to SSC of $d\phi(X)$. For $\psi_{X}:=\sup_{H\in\mathbb{S}^{d}}{\|(\nabla^{2}\phi(X))^{-1/2}\mathrm{D}^{3}\phi(X)[H]\,(\nabla^{2}\phi(X))^{-1/2}\|}_{F}/{\|H\|}_{X}$, we have $\sqrt{2(d+1)}\leq\psi_{X}\leq2\sqrt{d}\,.$ We present the proof in §\ref{['proof:psd-convex-ssc']}. This result informs us of the best possible scaling of $\phi$ that ensures SSC. Recall that if $g$ satisfies ${\|g^{-1/2}\mathrm{D} g[h]g^{-1/2}\|}_{F}\leq2\alpha{\|h\|}_{g}$ for $\alpha>0$, then $\alpha^{2}g$ is SSC. We remark that the scaling of $d$ is better than the trivial scaling of $d_{s}=\Theta(d^{2})$. The function $d\phi$ is a strongly self-concordant barrier for $\mathbb{S}_{+}^{d}$. Moreover, the scaling factor of $d$ cannot be further improved. SLTSC of $\phi$ can be easily checked by noting $g(X)[H,H]=\textup{Tr}(X^{-1}HX^{-1}H)$ and using the chain rule. See the details in §\ref{['proof:psd-sltsc']}. $\mathrm{D}^{2}g(X)[H,H]\succeq0$ for any $X\in\textup{int}(K)$ and $H\in\mathbb{S}^{d}$. In establishing ASC, we find an interesting connection to the Gaussian orthogonal ensemble (GOE), one of the main objects studied in random matrix theory. We prove the following lemmas and explain challenges when extending our arguments to SASC in §\ref{['proof:psd-asc']}. For $d_{s}=\frac{d(d+1)}{2}$ and $\textup{svec}(H)\sim\mathcal{N}{\bigl(0,\frac{r^{2}}{d_{s}}g(X)^{-1}\bigr)}$, $\frac{\sqrt{d_{s}d}}{r}X^{-1/2}HX^{-1/2}$ is a GOE. $-d\,\log\det X$ is ASC. Consider $Q_{1}=\{(x,t)\in\mathbb{R}^{2}:-\log x\leq t,x>0\}$. 
As $f(\cdot)=-\log(\cdot)$ is convex on $\mathbb{R}_{+}$ and satisfies the condition in Lemma \ref{['lem:tool-convex']} with $\beta=2$ and $\gamma=6$, $F(x,t)=-\log(t+\log x)-36\log x$ is a highly $37$-self-concordant barrier for $Q_{1}$. Therefore, $2F$ is SSC and SLTSC with $\bar{\nu}=\mathcal{O}(1)$. Consider the direct product of level sets $K=\prod_{i=1}^{d}\{(x_{i},t_{i})\in\mathbb{R}^{2}:-\log x_{i}\leq t_{i},\,x_{i}>0\}\,,$ and let $\phi(x,t)=-\sum_{i=1}^{d}{\bigl(\log(t_{i}+\log x_{i})+36\log x_{i}\bigr)}$ and $g=2\nabla^{2}\phi$. (1) $\nu,\,\bar{\nu}=\mathcal{O}(d)$. (2) SSC and SLTSC. (3) $d\,\nabla^{2}\phi$ is SASC. For $i\in[d]$, let $Q_{i}=\{(x_{i},t_{i})\in\mathbb{R}^{2}:-\log x_{i}\leq t_{i},\,x_{i}>0\}$ and $F_{i}(x_{i},t_{i})$ be the self-concordant barrier above. Note that $2F_{i}$ is SSC and SLTSC. By Lemma \ref{['lem:ssc-direct']} and \ref{['lem:sltsc-direct']}, the Hessian of $F(x,t):=2\sum_{i=1}^{d}F_{i}(x_{i},t_{i})$ is SSC and SLTSC. The last item on SASC follows from Lemma \ref{['lem:hsc-to-sasc']}. Consider $Q_{2}=\{(x,t)\in\mathbb{R}^{2}:e^{x}\leq t\}=\{(x,t)\in\mathbb{R}^{2}:t>0,\,x\leq\log t\}$. As $f(t)=\log t$ is concave and satisfies the condition in Lemma \ref{['lem:tool-concave']} with $\beta=2$ and $\gamma=6$, $F(x,t)=-\log(\log t-x)-36\log t$ is a highly $37$-self-concordant barrier for $Q_{2}$. Therefore, $2F$ is SSC and SLTSC with $\bar{\nu}=\mathcal{O}(1)$. Consider the direct product of level sets $K=\prod_{i=1}^{d}\{(x_{i},t_{i})\in\mathbb{R}^{2}:\exp(x_{i})\leq t_{i}\}\,,$ and let $\phi(x,t)=-\sum_{i=1}^{d}(\log(\log t_{i}-x_{i})+36\log t_{i})$ and $g=2\nabla^{2}\phi$. (1) $\nu,\,\bar{\nu}=\mathcal{O}(d)$. (2) SSC and SLTSC. (3) $d\,\nabla^{2}\phi$ is SASC. For $i\in[d]$, let $Q_{i}=\{(x_{i},t_{i})\in\mathbb{R}^{2}:e^{x_{i}}\leq t_{i}\}$ and $F_{i}(x_{i},t_{i})$ be the self-concordant barrier above. Note that $2F_{i}$ is SSC and SLTSC. 
By Lemma \ref{['lem:ssc-direct']} and \ref{['lem:sltsc-direct']}, the Hessian of $F(x,t):=2\sum_{i=1}^{d}F_{i}(x_{i},t_{i})$ is SSC and SLTSC. The last item on SASC follows from Lemma \ref{['lem:hsc-to-sasc']}. Consider $Q_{3}=\{(x,t)\in\mathbb{R}^{2}:x\geq0,\,t\geq x\log x\}$. Note that $f(x)=x\log x$ is convex on $\{x>0\}$ and satisfies the condition in Lemma \ref{['lem:tool-convex']} with $\beta=1$ and $\gamma=2$, so that $\max(4,\beta^{2},\gamma^{2})=4$. Hence, $F(x,t)=-\log(t-x\log x)-4\log x$ is a highly $5$-self-concordant barrier for $Q_{3}$. Therefore, $2F$ is SSC and SLTSC with $\bar{\nu}=\mathcal{O}(1)$. Consider the direct product of level sets $K=\prod_{i=1}^{d}\{(x_{i},t_{i})\in\mathbb{R}^{2}:x_{i}\geq0,\,t_{i}\geq x_{i}\log x_{i}\}\,,$ and let $\phi(x,t)=-\sum_{i=1}^{d}{\bigl(\log(t_{i}-x_{i}\log x_{i})+4\log x_{i}\bigr)}$ and $g=2\nabla^{2}\phi$. (1) $\nu,\,\bar{\nu}=\mathcal{O}(d)$. (2) SSC and SLTSC. (3) $d\,\nabla^{2}\phi$ is SASC. For $i\in[d]$, let $Q_{i}=\{(x_{i},t_{i})\in\mathbb{R}^{2}:x_{i}\geq0,\,t_{i}\geq x_{i}\log x_{i}\}$ and $F_{i}(x_{i},t_{i})$ be the self-concordant barrier above. Note that $2F_{i}$ is SSC and SLTSC. By Lemma \ref{['lem:ssc-direct']} and \ref{['lem:sltsc-direct']}, the Hessian of $F(x,t):=2\sum_{i=1}^{d}F_{i}(x_{i},t_{i})$ is SSC and SLTSC. The last item on SASC follows from Lemma \ref{['lem:hsc-to-sasc']}. We start with the power functions. For $p\geq1$, consider $Q_{4}=\{(x,t)\in\mathbb{R}^{2}:t\geq\max(0,x)^{p}\}=\{(x,t)\in\mathbb{R}^{2}:t\geq0,\,x\leq t^{1/p}\}$. Note that $f(t)=t^{1/p}$ is concave on $t>0$ and satisfies the condition in Lemma \ref{['lem:tool-concave']} with $\beta=2$ and $\gamma=6$. Hence, $F_{4}(x,t)=-\log(t^{1/p}-x)-36\log t$ is a highly $37$-self-concordant barrier for $Q_{4}$. Similarly, $F_{5}(x,t)=-\log(t^{1/p}+x)-36\log t$ is a highly $37$-self-concordant barrier for the convex set $Q_{5}=\{(x,t)\in\mathbb{R}^{2}:t\geq\max(0,-x)^{p}\}$. 
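The constants $\beta$ and $\gamma$ quoted for $-\log x$, $\log t$, $x\log x$, and $t^{1/p}$ can be sanity-checked numerically from the closed-form derivatives. The sketch below evaluates $t\,|f'''(t)|/|f''(t)|$ and $t^{2}\,|f^{(4)}(t)|/|f''(t)|$ on a grid and compares them with the stated values; the grid, the choice $p=3$ for the power function, and all names are assumptions made for illustration, and a grid check is of course not a proof.

```python
def empirical_ratios(d2, d3, d4, grid):
    """Largest observed t|f'''|/|f''| and t^2|f''''|/|f''| over the grid."""
    beta_hat = max(t * abs(d3(t)) / abs(d2(t)) for t in grid)
    gamma_hat = max(t * t * abs(d4(t)) / abs(d2(t)) for t in grid)
    return beta_hat, gamma_hat

p = 3.0          # hypothetical exponent for the power function (any p > 1 works here)
a = 1.0 / p

# name -> (f'', f''', f'''', claimed beta, claimed gamma)
cases = {
    "-log x": (lambda t: 1 / t**2, lambda t: -2 / t**3, lambda t: 6 / t**4, 2, 6),
    "log t": (lambda t: -1 / t**2, lambda t: 2 / t**3, lambda t: -6 / t**4, 2, 6),
    "x log x": (lambda t: 1 / t, lambda t: -1 / t**2, lambda t: 2 / t**3, 1, 2),
    "t^(1/p)": (lambda t: a * (a - 1) * t**(a - 2),
                lambda t: a * (a - 1) * (a - 2) * t**(a - 3),
                lambda t: a * (a - 1) * (a - 2) * (a - 3) * t**(a - 4), 2, 6),
}

grid = [10.0**k for k in range(-3, 4)] + [0.37, 2.5, 17.0]

report = {}
for name, (d2, d3, d4, beta, gamma) in cases.items():
    beta_hat, gamma_hat = empirical_ratios(d2, d3, d4, grid)
    # coefficient of the extra log term in the barrier: max(4, beta^2, gamma^2)
    report[name] = (beta_hat, gamma_hat, beta, gamma, max(4, beta**2, gamma**2))
```

For the first three functions the ratios are constant in $t$ and equal the claimed $\beta$ and $\gamma$; for $t^{1/p}$ they equal $2-1/p$ and $(2-1/p)(3-1/p)$, which stay below $2$ and $6$.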
Since the convex set $Q_{6}=\{(x,t)\in\mathbb{R}^{2}:t\geq|x|^{p}\}$ is equal to $Q_{4}\cap Q_{5}$, the sum $F_{4}+F_{5}$, which is $F_{6}(x,t)=-\log(t^{2/p}-x^{2})-72\log t$, is a highly $74$-self-concordant barrier for $Q_{6}$ (the parameters $37+37$ of the two summands add up). Hence, $2F_{6}$ is SSC and SLTSC with $\bar{\nu}=\mathcal{O}(1)$. Consider the direct product of level sets $K=\prod_{i=1}^{d}\{(x_{i},t_{i})\in\mathbb{R}^{2}:\left\lvert x_{i}\right\rvert ^{p}\leq t_{i}\}$, and let $\phi(x,t)=-\sum_{i=1}^{d}{\bigl(\log(t_{i}^{2/p}-x_{i}^{2})+72\log t_{i}\bigr)}$ and $g=2\nabla^{2}\phi$. (1) $\nu,\,\bar{\nu}=\mathcal{O}(d)$. (2) SSC and SLTSC. (3) $d\,\nabla^{2}\phi$ is SASC. Consider the highly $74$-self-concordant barrier $F_{i}$ above for $\{(x_{i},t_{i}):|x_{i}|^{p}\leq t_{i}\}$ for $i\in[d]$. Note that $2F_{i}$ is SSC and SLTSC. By Lemma \ref{['lem:ssc-direct']} and \ref{['lem:sltsc-direct']}, the Hessian of $F(x,t):=2\sum_{i=1}^{d}F_{i}(x_{i},t_{i})$ is SSC and SLTSC. The last item on SASC follows from Lemma \ref{['lem:hsc-to-sasc']}. For given constraints and epigraphs, combining metrics for them (according to the self-concordance theory for sampling developed in §\ref{['sec:sc-theory-rules']}) and employing $\mathsf{GCDW}$ with the combined metric lead to a poly-time mixing sampling algorithm. Compared to the state-of-the-art poly-time mixing algorithm, the $\mathsf{Ball\ walk}$, $\mathsf{GCDW}$ offers several advantages. First, it does not require any preprocessing (e.g., rounding) due to affine invariance. Also, it achieves faster mixing by leveraging inherent geometric information in sampling problems. The per-step complexity of $\mathsf{Dikin\ walks}$, however, is in general higher than that of the $\mathsf{Ball\ walk}$. The primary computational bottleneck lies in computing the inverse of a local metric. Nevertheless, efficient implementation of inverse maintenance can significantly reduce the per-step complexity, improving the total complexity (the number of iterations needed for mixing times the per-step complexity). 
In this section, we illustrate how our framework recovers theoretical guarantees of previous work on $\mathsf{Dikin\ walks}$ for uniform sampling and extends beyond uniform sampling. In particular, we show that $\mathsf{GCDW}$ is a poly-time mixing algorithm capable of sampling uniform, exponential, or Gaussian distributions on second-order cones or truncated PSD cones. Additionally, we illustrate an efficient per-step implementation that yields a faster total complexity when compared to general-purpose samplers such as the $\mathsf{Ball\ walk}$. Consider a set of linear constraints given by $K=\{x\in\mathbb{R}^{d}:Ax\geq b\}$ with $A\in\mathbb{R}^{m\times d}$ and $b\in\mathbb{R}^{m}$. kannan2012random first studied the $\mathsf{Dikin\ walk}$ for uniformly sampling a polytope, where the local metric is set to be the Hessian of the logarithmic barrier, $g=\nabla^{2}\phi_{\textsf{log}}=A_{(\cdot)}^{\mathsf{T}}A_{(\cdot)}$. They showed that the $\mathsf{Dikin\ walk}$ with the log-barrier mixes in $\mathcal{O}{\bigl(md\log\frac{M}{\varepsilon}\bigr)}$ iterations with a warmness parameter $M$. An immediate consequence of our work is that $\mathsf{GCDW}$ achieves the mixing time of $\widetilde{\mathcal{O}}(md)$ without a warmness assumption, as $\bar{\nu},\nu=m$ and $g$ is SSC, LTSC, and ASC by Lemma \ref{['lem:log-barrier']}. chen2018fast introduced the $\textsf{Vaidya walk}$ and the $\textsf{Approximate John walk}$, which are essentially $\mathsf{Dikin\ walks}$ with the Vaidya metric $\nabla^{2}\phi_{\textsf{Vaidya}}$ and a version of the Lewis-weight metric $\sqrt{d}\,\nabla^{2}\phi_{\textsf{Lw}}$. Their work showed that these walks achieve mixing times of $\mathcal{O}{\bigl(\sqrt{m}d^{3/2}\log\frac{M}{\varepsilon}\bigr)}$ and $\mathcal{O}{\bigl(d^{5/2}\log^{\mathcal{O}(1)}m\,\log\frac{M}{\varepsilon}\bigr)}$, respectively. 
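For concreteness, one step of a $\mathsf{Dikin\ walk}$ with the log-barrier metric for uniform sampling can be sketched as follows. The sketch assumes the common Gaussian-proposal form: propose $z\sim\mathcal{N}{\bigl(x,\frac{r^{2}}{d}g(x)^{-1}\bigr)}$, reject if $z\notin\textup{int}(K)$, and otherwise accept with probability $\min(1,p_{z}(x)/p_{x}(z))$, where $p_{x}(z)\propto\sqrt{\det g(x)}\exp{\bigl(-\frac{d}{2r^{2}}{\|z-x\|}_{g(x)}^{2}\bigr)}$. The step size $r$, the unit-square polytope, and all function names are illustrative assumptions; this is not the $\mathsf{GCDW}$ algorithm itself, and the code is restricted to $d=2$ to stay within the standard library.

```python
import math
import random

def slack(A, b, x):
    """s_x = Ax - b for a point x in the plane."""
    return [row[0] * x[0] + row[1] * x[1] - bi for row, bi in zip(A, b)]

def log_barrier_metric(A, b, x):
    """g(x) = A_x^T A_x = sum_i a_i a_i^T / s_i^2, returned as (g11, g12, g22)."""
    g11 = g12 = g22 = 0.0
    for row, si in zip(A, slack(A, b, x)):
        g11 += row[0] * row[0] / si**2
        g12 += row[0] * row[1] / si**2
        g22 += row[1] * row[1] / si**2
    return g11, g12, g22

def local_norm_sq(g, v):
    g11, g12, g22 = g
    return g11 * v[0]**2 + 2 * g12 * v[0] * v[1] + g22 * v[1]**2

def dikin_step(A, b, x, r, rng):
    """One Metropolis-filtered Dikin step; returns (next point, accepted?)."""
    d = 2
    gx = log_barrier_metric(A, b, x)
    det_x = gx[0] * gx[2] - gx[1]**2
    # Cholesky factor C of g(x)^{-1} = [[g22, -g12], [-g12, g11]] / det
    i11, i12, i22 = gx[2] / det_x, -gx[1] / det_x, gx[0] / det_x
    c11 = math.sqrt(i11)
    c21 = i12 / c11
    c22 = math.sqrt(i22 - c21**2)
    xi = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
    scale = r / math.sqrt(d)
    z = (x[0] + scale * c11 * xi[0], x[1] + scale * (c21 * xi[0] + c22 * xi[1]))
    if min(slack(A, b, z)) <= 0:          # proposal left the polytope: stay put
        return x, False
    gz = log_barrier_metric(A, b, z)
    det_z = gz[0] * gz[2] - gz[1]**2
    diff = (z[0] - x[0], z[1] - x[1])
    # log of p_z(x) / p_x(z)
    log_ratio = 0.5 * (math.log(det_z) - math.log(det_x)) \
        - d / (2 * r * r) * (local_norm_sq(gz, diff) - local_norm_sq(gx, diff))
    if rng.random() < math.exp(min(0.0, log_ratio)):
        return z, True
    return x, False

# Hypothetical example: the unit square {x : Ax >= b}.
A = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
b = [0.0, 0.0, -1.0, -1.0]
rng = random.Random(0)
x, accepted, path = (0.5, 0.5), 0, []
for _ in range(500):
    x, ok = dikin_step(A, b, x, 0.5, rng)
    accepted += ok
    path.append(x)
```

The acceptance test depends on the difference ${\|z-x\|}_{g(z)}^{2}-{\|z-x\|}_{g(x)}^{2}$ and on the ratio of determinants, which are the two quantities that the ASC and SSC conditions are designed to control. The per-step cost here is dominated by forming and factoring $g(x)$, which is the bottleneck that inverse maintenance targets in higher dimensions.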
Building upon our analysis of the Vaidya metric and Lewis-weight metric in Lemma \ref{['lem:vaidya']} and \ref{['lem:Lewis-weight']}, we find that $\mathsf{GCDW}$ with these metrics achieves the same mixing times but without any warmness assumption. We note that for the same task the $\mathsf{Ball\ walk}$ without a warm start requires $\widetilde{\mathcal{O}}(d^{3})$ membership queries due to kannan1997random, jia2021reducing. Given that a membership query involves $\mathcal{O}(md)$ arithmetic operations, the total complexity of the $\mathsf{Ball\ walk}$ is $\widetilde{\mathcal{O}}(md^{4})$. In contrast, each step of the $\mathsf{Dikin\ walk}$ with the log-barrier can be run in $\mathcal{O}(md^{\omega-1})$ operations using fast matrix multiplication, so the total number of arithmetic operations is $\widetilde{\mathcal{O}}(m^{2}d^{\omega})$. Thus, for $m$ close to $d$, $\mathsf{GCDW}$ is provably faster than the $\mathsf{Ball\ walk}$. When the efficient inverse maintenance proposed in laddha2020strong is employed, the per-step complexity can be improved to $\mathcal{O}(d^{2}+\textup{nnz}(A))=\mathcal{O}(md)$. In such cases $\mathsf{GCDW}$ is faster in a broader range of $m$. In particular, if $A$ is as sparse as $\textup{nnz}(A)=\mathcal{O}(d^{2})$, then $\mathsf{GCDW}$ is always faster than the $\mathsf{Ball\ walk}$. Moreover, $\mathsf{GCDW}$ with the Lewis-weight metric mixes in $\widetilde{\mathcal{O}}(d^{2.5})$ steps with a per-step complexity of $\widetilde{\mathcal{O}}(md^{\omega-1})$, so it is always faster than the $\mathsf{Ball\ walk}$ for any $m$. The current mixing bound of the $\mathsf{Ball\ walk}$ for general log-concave sampling is $\widetilde{\mathcal{O}}(d^{4})$ due to lovasz2007geometry. On the other hand, the $\mathsf{Dikin\ walk}$ employed with any metric above for exponential sampling converges in the same number of iterations as the $\mathsf{Dikin\ walk}$ for uniform sampling. 
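The total-complexity comparisons for uniform polytope sampling stated above come down to exponent arithmetic. The following worked check is a sketch that compares leading exponents only: it ignores logarithmic factors and hidden constants, and it uses only the iteration counts and per-step costs quoted in the text, together with the known bound $\omega<2.38$ on the matrix-multiplication exponent.

```latex
\begin{align*}
\text{Ball walk:}\quad
  & \widetilde{\mathcal{O}}(d^{3})\cdot\mathcal{O}(md)=\widetilde{\mathcal{O}}(md^{4}),\\
\text{log-barrier, fast matrix multiplication:}\quad
  & \widetilde{\mathcal{O}}(md)\cdot\mathcal{O}(md^{\omega-1})=\widetilde{\mathcal{O}}(m^{2}d^{\omega}),
  \qquad m^{2}d^{\omega}\le md^{4}\iff m\le d^{4-\omega},\\
\text{log-barrier, inverse maintenance:}\quad
  & \widetilde{\mathcal{O}}(md)\cdot\mathcal{O}(md)=\widetilde{\mathcal{O}}(m^{2}d^{2}),
  \qquad m^{2}d^{2}\le md^{4}\iff m\le d^{2},\\
\text{log-barrier, }\mathrm{nnz}(A)=\mathcal{O}(d^{2}):\quad
  & \widetilde{\mathcal{O}}(md)\cdot\mathcal{O}(d^{2})=\widetilde{\mathcal{O}}(md^{3})\le md^{4},\\
\text{Lewis-weight metric:}\quad
  & \widetilde{\mathcal{O}}(d^{2.5})\cdot\widetilde{\mathcal{O}}(md^{\omega-1})=\widetilde{\mathcal{O}}(md^{\omega+1.5}),
  \qquad md^{\omega+1.5}\le md^{4}\iff\omega\le 2.5 .
\end{align*}
```

The last two rows hold for every $m$, which is the sense in which those variants are always faster than the $\mathsf{Ball\ walk}$.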
Since the only difference between the two sampling problems is the additional term $\exp{\bigl(-(f(z)-f(x))\bigr)}$ in the Metropolis filter, the fast implementation techniques mentioned earlier also apply to exponential sampling. As a result, for exponential sampling each of the $\mathsf{Dikin\ walks}$ described above surpasses the $\mathsf{Ball\ walk}$ by a larger margin. For Gaussian sampling over a polytope, we first reduce it to exponential sampling as in \ref{['eq:reduced-problem']}: for $y=(x,t)\in\mathbb{R}^{d+1}$, $\text{sample }y\sim\tilde{\pi}\propto\exp(-t)\quad\text{s.t. }Ax\geq b,\ \frac{1}{2}{\|x-\mu\|}_{\Sigma}^{2}\leq t\,.$ According to our theory, it is natural to use the metric given by $g(x,t)=2\begin{bmatrix}\nabla^{2}_{x}\phi_{\text{log}}(x) & 0\\ 0 & 0\end{bmatrix}+2(d+1)\,\nabla^{2}_{(x,t)}\phi_{\text{Gauss}}(x,t)\,,$ which is ${\bigl(\mathcal{O}(m+d),\mathcal{O}(m+d)\bigr)}$-Dikin-amenable due to Lemma \ref{['lem:Gaussian-potential']}. Thus, $\mathsf{GCDW}$ needs $\widetilde{\mathcal{O}}(d(m+d))$ iterations of the $\mathsf{Dikin\ walk}$. We note that the log-barrier can be replaced by the Vaidya or Lewis-weight metrics, and in such cases one can obtain provable guarantees on the mixing time by computing $\nu$ and $\bar{\nu}$, referring to §\ref{['sec:handbook-barrier']} or Table \ref{['tab:scaling-table']}. We consider a region given by ${\|x-\mu\|}_{\Sigma}\leq t$ and $A\left[x\ \ t\right]^{\mathsf{T}}\leq b$ for $A\in\mathbb{R}^{m\times(d+1)},b\in\mathbb{R}^{m},$ $\mu\in\mathbb{R}^{d}$, and $\Sigma\in\mathbb{S}_{++}^{d}$. In this case, our self-concordance theory suggests using $\nabla^{2}(2\sqrt{d+1}\,\phi_{\text{Lw}}+2(d+1)\,\phi_{\text{SOC}})\quad\text{or}\quad\nabla^{2}(2\phi_{*}+2(d+1)\,\phi_{\text{SOC}})\ \text{for }*=\text{log, Vaidya}\,,$ to deal with the truncated SOC constraint. 
For the log-barrier case, this yields an ${\bigl(\mathcal{O}(m+d),\mathcal{O}(m+d)\bigr)}$-Dikin-amenable metric due to Lemma \ref{['lem:soc']}, with which $\mathsf{GCDW}$ requires $\widetilde{\mathcal{O}}(d(m+d))$ iterations of the $\mathsf{Dikin\ walk}$. Following the reduction as in the polytope sampling, we should use $g(x,t,t')=3\left[\nabla^{2}_{(x,t)}\phi_{\textup{log}}(x,t)+(d+1)\,\nabla^{2}_{(x,t)}\phi_{\textup{SOC}}(x,t)0\right]+3(d+2)\,\nabla^{2}_{(x,t,t')}\phi_{\textup{Gauss}}(x,t,t'),$ which is ${\bigl(\mathcal{O}(m+d),\mathcal{O}(m+d)\bigr)}$-Dikin-amenable, and thus $\mathsf{GCDW}$ needs $\widetilde{\mathcal{O}}(d(m+d))$ iterations of the $\mathsf{Dikin\ walk}$. For a matrix $X\in\mathbb{R}^{d\times d}$, recall that $\textup{vec}(X)\in\mathbb{R}^{d^{2}}$ denotes the vector obtained by stacking columns of $X$ vertically. Additionally, we define $A\in\mathbb{R}^{m\times d^{2}}$, $S_{X}\in\mathbb{R}^{m\times m}$, and $A_{X}\in\mathbb{R}^{m\times d^{2}}$ by $A:=\left[\textup{vec}(A_{1})\cdots\textup{vec}(A_{m})\right]^{\mathsf{T}},\quad S_{X}:=\textup{Diag}(\langle A_{i},X\rangle-b_{i}),\quad A_{X}:=S_{X}^{-1}A\,,$ where we assume $A$ has no all-zero rows and $(S_{X})_{ii}>0$ for $i\in[m]$. The metric below comes from the Hessian of $-2d^{2}\,\log\det X-2\sum_{i=1}^{m}\log(\langle A_{i},X\rangle-b_{i})\,.$ Here the first term, the log-determinant, serves as a barrier for the PSD cone while the second term is the standard logarithmic barrier for linear constraints. We note that the $-\log\det X$ is strictly convex on $x\in\textup{int}(K)$ for $K$ the truncated PSD cone, so all metrics $g$ introduced in our main results are positive definite. Thus, the $\mathsf{Dikin\ walk}$ with those $g$ is well-defined. 
Let $K$ be the truncated PSD cone and $g$ be the local metric such that at each $X\in\textup{int}(K)$, for symmetric matrices $H_{1},H_{2}$, $g_{X}(H_{1},H_{2})=2d^{2}\textup{Tr}(X^{-1}H_{1}X^{-1}H_{2})+2\,\textup{vec}(H_{1})^{\mathsf{T}}A_{X}^{\mathsf{T}}A_{X}\textup{vec}(H_{2})\,.$ Then $\mathsf{GCDW}$ needs $\widetilde{\mathcal{O}}((d^{3}+m)d^{2})$ steps of the $\mathsf{Dikin\ walk}$ with the local metric $g$, where each step runs in $\mathcal{O}{\bigl((md^{\omega}+m^{2}d^{2})\wedge(d^{2\omega}+md^{2(\omega-1)})\bigr)}$ time. Since $g_{X}$ is ${\bigl(\mathcal{O}(m+d^{3}),\mathcal{O}(m+d^{3})\bigr)}$-Dikin-amenable by Lemma \ref{['lem:psd']}, $\mathsf{GCDW}$ requires ${\widetilde{\mathcal{O}}(d^{2}\,(d^{3}+m))}$ iterations of the $\mathsf{Dikin\ walk}$. As mentioned earlier, efficient maintenance of the inverse of a metric function could lead to a faster per-step complexity. As an example, we provide such an implementation of Proposition \ref{['thm:basicPSD']} in §\ref{['subsec:oracle-implementation']}. Putting these together, for an interesting regime of $m=\mathcal{O}(1)$, $\mathsf{GCDW}$ is faster than the $\mathsf{Ball\ walk}$ by a factor of $d$ in terms of the total complexity. If we replace the log-barrier by the Vaidya metric, then the dependence on $m$ is improved to $\sqrt{m}$ as in the polytope sampling. See §\ref{['proof:Algorithms-for-PSD']} for the proofs of the two claims below. Let $K$ be the truncated PSD cone and $g$ be the local metric such that at each $X\in\textup{int}(K)$, for symmetric matrices $H_{1},H_{2}$, $g_{X}(H_{1},H_{2})=2d^{2}\textup{Tr}(X^{-1}H_{1}X^{-1}H_{2})+44\sqrt{\frac{m}{d}}\,\textup{vec}(H_{1})^{\mathsf{T}}A_{X}^{\mathsf{T}}{\bigl(\Sigma_{X}+\frac{d}{m}I_{m}\bigr)}A_{X}\textup{vec}(H_{2})\,.$ Then $\mathsf{GCDW}$ needs $\widetilde{\mathcal{O}}((d^{2}+\sqrt{m})d^{3})$ steps of the $\mathsf{Dikin\ walk}$ with the local metric $g$, with each step running in $\widetilde{\mathcal{O}}(md^{2(\omega-1)})$ amortized time. 
Lastly, the dependence on $m$ can be made poly-logarithmic by working with the Lewis-weight metric. We remark that for uniform sampling the total complexity of $\mathsf{GCDW}$ is less than that of the $\mathsf{Ball\ walk}$ by the order of $d^{5-2\omega}$. Let $K$ be the truncated PSD cone and $g$ be the local metric such that at each $X\in\textup{int}(K)$, for symmetric matrices $H_{1},H_{2}$, $g_{X}(H_{1},H_{2})=2d^{2}\textup{Tr}(X^{-1}H_{1}X^{-1}H_{2})+dc_{1}(\log m)^{c_{2}}\,\textup{vec}(H_{1})^{\mathsf{T}}A_{X}^{\mathsf{T}}W_{X}A_{X}\textup{vec}(H_{2})\,,$ where $W_{X}$ is the diagonalized $\ell_{p}$-Lewis weight of $A_{X}$ with $p=\mathcal{O}(\log m)$, and $c_{1},c_{2}>0$ are universal constants. Then $\mathsf{GCDW}$ requires $\widetilde{\mathcal{O}}(d^{5})$ steps of the $\mathsf{Dikin\ walk}$, with each step running in $\widetilde{\mathcal{O}}(md^{2(\omega-1)})$ amortized time. Just as in polytope or second-order cone sampling, we introduce a new variable $t$ by replacing a quadratic term in the potential. This reduces the Gaussian sampling problem to an exponential sampling problem. We then work with a local metric $g(X,t)=3\left(d\left[\nabla^{2}_{X}\phi_{\textup{Lw}}(X)0\right]+d^{2}\left[\nabla^{2}_{X}\phi_{\textup{PSD}}(X)0\right]+d^{2}\nabla^{2}_{(X,t)}\phi_{\textup{Gauss}}(X,t)\right)\,,$ which is ${\bigl(\mathcal{O}^{*}(d^{3}),\mathcal{O}^{*}(d^{3})\bigr)}$-Dikin-amenable. Thus, $\mathsf{GCDW}$ needs $\widetilde{\mathcal{O}}(d^{5})$ iterations of the $\mathsf{Dikin\ walk}$ with the local metric $g$, and the per-step complexity remains $\widetilde{\mathcal{O}}(md^{2(\omega-1)})$ in amortized time. Now we design an oracle that implements each iteration of the $\mathsf{Dikin\ walk}$ (Algorithm \ref{['alg:DikinWalk']}). 
Each iteration can be implemented as follows. When the current point is $x$: (1) sample $z\sim\mathcal{N}{\bigl(0,\frac{r^{2}}{d}I_{d}\bigr)}$; (2) compute $y=x+g(x)^{-1/2}z$ and propose it, so that $y\sim\mathcal{N}{\bigl(x,\frac{r^{2}}{d}g(x)^{-1}\bigr)}$; (3) accept $y$ with probability $1\wedge{\bigl(\sqrt{\frac{\det g(y)}{\det g(x)}}\,\frac{\exp f(x)}{\exp f(y)}\bigr)}$. We provide two algorithms with complexity $\mathcal{O}(md^{\omega}+m^{2}d^{2})$ and $\mathcal{O}(d^{2\omega}+md^{2(\omega-1)})$, respectively. We can implement each iteration in $\mathcal{O}{\bigl((md^{\omega}+m^{2}d^{2})\wedge(d^{2\omega}+md^{2(\omega-1)})\bigr)}$ time by using the former for small $m$ and the latter for large $m$. This completes the second half of Theorem \ref{['thm:basicPSD']}. For simplicity, we ignore the constant factors in $g=g_{1}+g_{2}$, where $g_{1}(X)=M^{\mathsf{T}}(X\otimes X)^{-1}M=:BB^{\mathsf{T}}\qquad\text{and}\qquad g_{2}(X)=M^{\mathsf{T}}A^{\mathsf{T}}S_{X}^{-2}AM=:UU^{\mathsf{T}}\,,$ with $B:=M^{\mathsf{T}}(X\otimes X)^{-1/2}\in\mathbb{R}^{d_{s}\times d^{2}}$ and $U:=M^{\mathsf{T}}A^{\mathsf{T}}S_{X}^{-1}\in\mathbb{R}^{d_{s}\times m}$. Letting $u_{i}$ be the $i$-th column of $U$ for $i\in[m]$, we note that $g_{2}=\sum_{i=1}^{m}u_{i}u_{i}^{\mathsf{T}}$. We start with a subroutine for computing $g(X)^{-1}v$ for given $v\in\mathbb{R}^{d_{s}}$ in $\mathcal{O}(md^{\omega}+m^{2}d^{2})$ time.
Algorithm: Computation of $g(X)^{-1}v$.
Input: $X\in\mathbb{S}_{+}^{d}$, vector $v\in\mathbb{R}^{d_{s}}$, local metric $g$.
Output: $g(X)^{-1}v$.
Prepare the column vectors $u_{i}$ of $U=M^{\mathsf{T}}A^{\mathsf{T}}S_{X}^{-1}$.
For $\bar{g}_{0}:=g_{1}(X)$, compute $\bar{g}_{0}^{-1}v$ and $\bar{g}_{0}^{-1}u_{i}$ for $i\in[m]$.
For $i=1,\dots,m$: compute $\bar{g}_{i}^{-1}v$ and $\bar{g}_{i}^{-1}u_{j}$ for $j\in[m]$, according to $\bar{g}_{i}^{-1}w=\bar{g}_{i-1}^{-1}w-\frac{\bar{g}_{i-1}^{-1}u_{i}\cdot u_{i}^{\mathsf{T}}\bar{g}_{i-1}^{-1}w}{1+u_{i}^{\mathsf{T}}\bar{g}_{i-1}^{-1}u_{i}}\,.$
Output $\bar{g}_{m}^{-1}v$. 
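The recursion in the subroutine above is a sequence of Sherman-Morrison rank-one updates. A minimal pure-Python sketch follows; it assumes the inverse of the base matrix $\bar{g}_{0}$ is already available as an explicit dense matrix, and the names `apply_inverse`, `g0_inv`, and `us` are illustrative rather than taken from the paper.

```python
def dot(a, b):
    """Euclidean inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def mat_vec(M, w):
    """Dense matrix-vector product; M is a list of rows."""
    return [dot(row, w) for row in M]


def apply_inverse(g0_inv, us, v):
    """Return (g0 + sum_i u_i u_i^T)^{-1} v using m rank-one updates.

    g0_inv : explicit inverse of the base matrix g0 (assumed given).
    us     : list of the m vectors u_i.
    v      : query vector.
    """
    # Initialise with g0^{-1} v and g0^{-1} u_j for every j.
    inv_v = mat_vec(g0_inv, v)
    inv_us = [mat_vec(g0_inv, u) for u in us]

    for i, u in enumerate(us):
        gu = list(inv_us[i])          # previous inverse applied to u_i
        denom = 1.0 + dot(u, gu)      # 1 + u_i^T (previous inverse) u_i

        def step(inv_w):
            # Sherman-Morrison: subtract gu * (u_i^T inv_w) / denom.
            coeff = dot(u, inv_w) / denom
            return [a - coeff * b for a, b in zip(inv_w, gu)]

        inv_v = step(inv_v)
        inv_us = [step(w) for w in inv_us]

    return inv_v
```

Each of the $m$ rounds touches $m+1$ vectors of length $d_{s}$, which is where the $m^{2}d^{2}$ term in the stated running time comes from; the cost of forming the initial products with $\bar{g}_{0}^{-1}$ is not modeled in this sketch.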
Algorithm \ref{['alg:subroutine']} computes $g(X)^{-1}v$ in $\mathcal{O}(md^{\omega}+m^{2}d^{2})$ time for a query $v\in\mathbb{R}^{d_{s}}$. See §\ref{['proof:eff_implement']} for the proof. With this subroutine in hand, we proceed to an efficient implementation of two tasks: computation of (1) $g(x)^{-\frac{1}{2}}z$ for a given vector $z\in\mathbb{R}^{d_{s}}$ and (2) $\sqrt{\frac{\det g(y)}{\det g(x)}}\,\frac{\exp f(x)}{\exp f(y)}$.
Algorithm: Implementation of the $\mathsf{Dikin\ walk}$.
Input: current point $X\in\mathbb{S}_{+}^{d}$, local metric $g$.
Step 1: Sampling from $\mathcal{N}{\bigl(0,\frac{r^{2}}{d}g(X)^{-1}\bigr)}$. Draw $w\sim\mathcal{N}(0,I_{d^{2}+m})$ and set $v\gets g(X)^{-1}\begin{bmatrix}B & U\end{bmatrix}w$ by Algorithm \ref{['alg:subroutine']}. Propose $y\gets\textup{svec}(X)+\frac{r}{\sqrt{d}}v$.
Step 2: Computation of the acceptance probability. Use Algorithm \ref{['alg:subroutine']} to prepare $\{\bar{g}_{i}^{-1}u_{1},\dots,\bar{g}_{i}^{-1}u_{m}\}_{i=0}^{m}$ at $X$ and $Y:=\textup{svec}^{-1}(y)$. Set $\det\bar{g}_{0}(\cdot)\gets2^{d(d-1)/2}(\det(\cdot))^{-(d+1)}$ ($\because$ Lemma \ref{['lem:Kronecker']}-7). For $i=0,\dots,m-1$: $\det(\bar{g}_{i+1})\gets\det\bar{g}_{i}\cdot(1+u_{i+1}^{\mathsf{T}}\bar{g}_{i}^{-1}u_{i+1})$. Accept $Y$ with probability $1\wedge{\bigl(\sqrt{\frac{\det\bar{g}_{m}(Y)}{\det\bar{g}_{m}(X)}}\,\frac{\exp f(X)}{\exp f(Y)}\bigr)}$.
Algorithm \ref{['alg:perStep-small-m']} implements the $\mathsf{Dikin\ walk}$ with per-step complexity of $\mathcal{O}(md^{\omega}+m^{2}d^{2})$. The algorithm above has quadratic dependence on the number $m$ of constraints, which can become expensive for large $m$. In this regime, we instead form the full $d_{s}\times d_{s}$ matrix $g(X)$, which takes $\mathcal{O}(d^{2\omega}+md^{2(\omega-1)})$ time; computing its inverse, square root, and determinant then takes $\mathcal{O}(d^{2\omega})$ time. 
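The determinant recursion in Step 2 is the matrix determinant lemma, $\det(\bar{g}+uu^{\mathsf{T}})=\det\bar{g}\cdot(1+u^{\mathsf{T}}\bar{g}^{-1}u)$, applied once per constraint. The sketch below tracks $\log\det$ instead of $\det$ to avoid overflow, which is a numerical choice of this example rather than something stated in the text. The function names and the assumption that $\bar{g}_{0}^{-1}$ and $\log\det\bar{g}_{0}$ are supplied by the caller are likewise illustrative.

```python
import math


def dot(a, b):
    """Euclidean inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def mat_vec(M, w):
    """Dense matrix-vector product; M is a list of rows."""
    return [dot(row, w) for row in M]


def logdet_with_updates(g0_inv, logdet_g0, us):
    """Return log det(g0 + sum_i u_i u_i^T) via the matrix determinant lemma.

    g0_inv    : explicit inverse of the base matrix g0 (assumed given).
    logdet_g0 : log det g0 (assumed given).
    us        : list of the m vectors u_i.
    """
    inv_us = [mat_vec(g0_inv, u) for u in us]   # g0^{-1} u_j for all j
    logdet = logdet_g0
    for i, u in enumerate(us):
        gu = list(inv_us[i])                    # previous inverse applied to u_i
        gain = 1.0 + dot(u, gu)                 # determinant ratio for this update
        logdet += math.log(gain)
        # Sherman-Morrison update of every stored product.
        inv_us = [
            [a - dot(u, w) / gain * b for a, b in zip(w, gu)]
            for w in inv_us
        ]
    return logdet


def accept_prob(logdet_x, logdet_y, f_x, f_y):
    """min(1, sqrt(det g(Y) / det g(X)) * exp(f(X) - f(Y))), computed in logs."""
    log_ratio = 0.5 * (logdet_y - logdet_x) + (f_x - f_y)
    return 1.0 if log_ratio >= 0.0 else math.exp(log_ratio)
```

A caller would evaluate `logdet_with_updates` once at the current point and once at the proposal, then pass both values and the two potential values to `accept_prob`.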
When implementing the $\mathsf{Dikin\ walk}$ with the Lewis-weights metric, we use an approximation algorithm presented in lee2019solving for computing and updating the Lewis weight, which ensures $(1-\delta)\widetilde{W}_{X}\preceq W_{X}\preceq(1+\delta)\widetilde{W}_{X}$ for the approximate Lewis weights $\widetilde{W}_{X}$ and a target accuracy parameter $\delta$ (note that the initialization and update times of the Lewis weight above hide poly-logarithmic dependence on $\log(1/\delta)$). Strictly speaking, we should check that these approximate Lewis weights do not affect the theoretical guarantees above. To see this, let us define $\widetilde{g}=2(dg_{1}+\widetilde{g}_{2})$, where for some constants $c_{1},c_{2}>0$ $g_{1}(X)=d^{2}M^{\mathsf{T}}(X\otimes X)^{-1}M\qquad\text{and}\qquad\widetilde{g}_{2}=dc_{1}\left(\log m\right)^{c_{2}}M^{\mathsf{T}}A_{X}^{\mathsf{T}}\widetilde{W}_{X}A_{X}M\,.$ First of all, the $\mathsf{Dikin\ walk}$ with $\widetilde{g}$ still converges to a target distribution, since the approximation algorithm in lee2019solving is deterministic and thus the condition of detailed balance still holds under the acceptance probability of $1\wedge{\bigl(\sqrt{\frac{\det\tilde{g}(Y)}{\det\tilde{g}(X)}}\,\frac{\exp f(X)}{\exp f(Y)}\bigr)}$. For $\widetilde{P}_{X}$ the one-step distribution of the $\mathsf{Dikin\ walk}$ started at $X$ with $\widetilde{g}$, we can show one-step coupling similar to Lemma \ref{['lem:one-step']}, following the overall proof therein and taking $\delta=1/\text{poly}(d)$ small enough. See §\ref{['proof:Handling-approximate-Lewis']} for the proof. For convex $K\subset\mathbb{R}^{d}$, let $g:\textup{int}(K)\to\mathbb{S}_{++}^{d}$ be SSC, ASC, LTSC, and $\phi:\textup{int}(K)\to\mathbb{R}$ be its function counterpart. Suppose that the potential $f$ of the target distribution $\pi$ is $\beta$-relatively smooth in $\phi$. 
Then there exist constants $s_{1},s_{2}>0$ such that if ${\|x-y\|}_{g(x)}\leq s_{1}r/\sqrt{d}$ with $r=s_{2}\,(1\wedge1/\sqrt{\beta})$ for $x,y\in\textup{int}(K)$, then $d_{\textrm{TV}}(\widetilde{P}_{x},\widetilde{P}_{y})\leq\frac{3}{4}+0.01$. We collect deferred proofs in this section. We start with the one-step coupling of the $\mathsf{Dikin\ walk}$ under the setting $\alpha\nabla^{2}\phi\preceq\nabla^{2} f\preceq\beta\nabla^{2}\phi$ on $\textup{int}(K)$. Roughly speaking, if ${\|x-y\|}_{x}\leq r/\sqrt{d}$ with $r\lesssim1\wedge1/\sqrt{\beta}$, then $d_{\textrm{TV}}(P_{x},P_{y})\leq0.99$. For $\pi\propto\exp(-f)\cdot\mathbf{1}_{K}$ and $z\sim\mathcal{N}(x,\frac{r^{2}}{d}g(x)^{-1})$, let us denote $p_{x}=\mathcal{N}{\Bigl(x,\frac{r^{2}}{d}g(x)^{-1}\Bigr)},\qquad R(x,z)=\frac{p_{z}(x)}{p_{x}(z)}\frac{\pi(z)}{\pi(x)},\qquad A(x,z)=\min{\bigl(1,R(x,z)\,\mathbf{1}_{K}(z)\bigr)}\,.$ The transition kernel $P(x,\cdot)$ of the $\mathsf{Dikin\ walk}$ started at $x$ can be written as $P(x,dz)=\underbrace{{\bigl(1-\mathbb{E}_{p_{x}}[A(x,\cdot)]\bigr)}}_{\eqqcolon r_{x}}\,\delta_{x}(\mathrm{d} z)+A(x,z)\,p_{x}(\mathrm{d} z)\,.$ Thus, for $x,y\in\textup{int}(K)$, d_{\textrm{TV}}(P_{x},P_{y})=\underbrace{\frac{1}{2}(r_{x}+r_{y})}_{\mathsf{I}}+\underbrace{\frac{1}{2}\int|A(x,z)\,p_{x}(z)-A(y,z)\,p_{y}(z)|\,\mathrm{d} z}_{\mathsf{II}}\,. Let $h\sim\mathcal{N}(0,I_{d})$ and denote a bad event $B_{0}=\{z\in\mathbb{R}^{d}:{\|z-x\|}_{x}\ge cr\}$ with $c$ determined later. Due to ${\|z-x\|}_{x}=\frac{r}{\sqrt{d}}{\|h\|}$ (in law) and concentration of the standard Gaussian in a thin shell of radius $\sqrt{d}$ with annulus $\mathcal{O}(1)$, we have $\mathbb{P}_{z}(B_{0})=\mathbb{P}_{h}({\|h\|}\geq c\sqrt{d})\leq\exp{\bigl(-(c-1)\sqrt{d}/2\bigr)}$. Hence, $\mathbb{P}(B_{0})\leq\varepsilon$ for $c\geq1+\sqrt{\frac{2}{d}\,\log\frac{1}{\varepsilon}}$. 
Note that $r_{x}=1-\mathbb{E}_{p_{x}}[A(x,z)]=1-\int\min{\Bigl(1,\,\underbrace{\mathbf{1}_{K}(z)\frac{\exp f(x)}{\exp f(z)}}_{\eqqcolon\textsf{A}}\underbrace{\frac{p_{z}(x)}{p_{x}(z)}}_{\eqqcolon\textsf{B}}\Bigr)}p_{x}(\mathrm{d} z)\,.$ As for $\textsf{A}$, we let $\nabla^{2}\phi\preceq c_{\phi}g$ for some $c_{\phi}>0$ and use Taylor's expansion at $x\in K\cap B_{0}^{c}$ to show that for some $x^{*}\in[x,z]$, f(x)-f(z)+\nabla f(x)^{\mathsf{T}}(z-x)=-{\|z-x\|}_{\nabla^{2} f(x^{*})}^{2}\geq-c_{\phi}\beta\,{\|z-x\|}_{g(x^{*})}^{2}\underset{\text{(i)}}{\geq}-c_{\phi}\beta\,{\|z-x\|}_{x}^{2}\cdot(1+2{\|x-z\|}_{x})^{2}\geq-c_{\phi}\beta c^{2}r^{2}(1+2cr)^{2}\underset{\text{(ii)}}{\geq}-\varepsilon\,, where we used Lemma \ref{['lem:scCloseness']} in (i) and took $r\leq r_{1}(\varepsilon)$ in (ii), which is defined so that $\beta c_{\phi}c^{2}r^{2}(1+cr)^{2}\leq\varepsilon$ for any $r\leq r_{1}(\varepsilon)$. It follows from $\mathcal{D}_{g}^{1}(x)\subset K$ and symmetry of $\mathcal{N}_{g}^{r}(x)$ that there exists a half-ellipsoid $G\subset\mathcal{D}_{g}^{1}(x)$ in which $\langle\nabla f(x),z-x\rangle\leq0$. Thus, $f(x)-f(z)\geq-\varepsilon$ holds on $z\in G$. For a bad event $B_{1}:=G^{c}$, it holds that $\mathbb{P}_{z}(B_{1})\leq\frac{1}{2}+\mathbb{P}_{z}{\bigl(\mathcal{D}_{g}^{1}(x)^{c}\bigr)}=\frac{1}{2}+\mathbb{P}_{z}({\|z-x\|}_{x}\geq1)=\frac{1}{2}+\mathbb{P}_{h}{\Bigl({\|h\|}\geq\frac{\sqrt{d}}{r}\Bigr)}\leq\frac{1}{2}+\varepsilon\,,$ where the last inequality follows from concentration of $h$ for any $r\leq r_{2}(\varepsilon):={\bigl(1+\frac{2}{\sqrt{d}}\,\log\frac{1}{\varepsilon}\bigr)}^{-1}$. 
As for $\textsf{B}$, for $\varphi(x):=\frac{1}{2}\log\det g(x)$ we have $\log\text{B}=-\frac{d}{2r^{2}}{\bigl({\|z-x\|}_{z}^{2}-{\|z-x\|}_{x}^{2}\bigr)}+{\bigl(\varphi(z)-\varphi(x)\bigr)}\,.$ Invoking ASC of $\phi$, we can take $r_{3}(\varepsilon)$ so that $\mathbb{P}_{z}{\bigl({\|z-x\|}_{z}^{2}-{\|z-x\|}_{x}^{2}\leq2\varepsilon r^{2}/d\bigr)}\geq1-\varepsilon$ for any $r\leq r_{3}(\varepsilon)$ and control the first term. Let the complement of this event be our second bad event $B_{2}$. For $\varphi(z)-\varphi(x)$, Taylor's expansion of $\varphi$ at $x$ leads to $\varphi(z)-\varphi(x)=\underbrace{\langle\nabla\varphi(x),z-x\rangle}_{\eqqcolon\textsf{A'}}+\underbrace{\frac{1}{2}\langle\nabla^{2}\varphi(x^{*})(z-x),z-x)\rangle}_{\eqqcolon\textsf{B'}}\text{ for some }x^{*}\in[x,z]\,.$ As for $\textsf{A}'$, we have $\langle\nabla\varphi(x),z-x\rangle=\frac{r}{\sqrt{d}}\langle g(x)^{-1/2}\nabla\varphi(x),h\rangle$, and a standard tail bound for $h$ leads to $\mathbb{P}_{z}{\Bigl(\langle\nabla\varphi(x),z-x\rangle\leq-\frac{r}{\sqrt{d}}\,{\|g(x)^{-1/2}\nabla\varphi(x)\|}_{2}\cdot2\log\frac{1}{\varepsilon}\Bigr)}\leq\varepsilon\,.$ We call this event $B_{3}$ and bound ${\|g(x)^{-1/2}\nabla\varphi(x)\|}_{2}$ via SSC of $g$ as follows: omitting $x$ for simplicity, {\|g^{-\frac{1}{2}}\nabla\varphi\|}_{2}=\sup_{v:{\|v\|}_{2}=1}\langle\nabla\varphi,g^{-\frac{1}{2}}v\rangle\underset{\text{(i)}}{=}\sup_{v}\textup{Tr}(g^{-1}\mathrm{D} g[g^{-\frac{1}{2}}v])=\sup_{v}\textup{Tr}(g^{-\frac{1}{2}}\mathrm{D} g[g^{-\frac{1}{2}}v]\,g^{-\frac{1}{2}})\underset{\text{(ii)}}{\leq}\sup_{v}\sqrt{d}\,{\|g^{-\frac{1}{2}}\mathrm{D} g[g^{-\frac{1}{2}}v]\,g^{-\frac{1}{2}}\|}_{F}\underset{\text{(iii)}}{\leq}\sup_{v}2\sqrt{d}{\|g^{-\frac{1}{2}}v\|}_{x}=2\sqrt{d}\,, where (i) follows from \ref{['eq:gradLogDet']}, (ii) is due to $\textup{Tr}(A)\leq\sqrt{d}{\|A\|}_{F}$ for $A\in\mathbb{R}^{d\times d}$, and (iii) is due to SSC. 
Conditioned on $B_{3}^{c}$, taking $r\leq r_{4}(\varepsilon):=\varepsilon(4\log\frac{1}{\varepsilon})^{-1}$, we have $\mathsf{A'}=\langle\nabla\varphi(x),z-x\rangle\geq-4r\,\log\frac{1}{\varepsilon}\geq-\varepsilon\,.$ As for $\textsf{B}'$, denoting $u=z-x$ for $z\in B_{0}^{c}$ \mathrm{D}^{2}\varphi(x^{*})[u,u]\underset{\ref{['eq:hessLogDet']}}{=}\textup{Tr}{\bigl(g(x^{*})^{-1}\mathrm{D}^{2}g(x^{*})[u,u]\bigr)}-{\|g(x^{*})^{-\frac{1}{2}}\mathrm{D} g(x^{*})[u]\,g(x^{*})^{-\frac{1}{2}}\|}_{F}^{2}\underset{\text{(i)}}{\geq}-{\|u\|}_{x^{*}}^{2}-{\|g(x^{*})^{-\frac{1}{2}}\mathrm{D} g(x^{*})[u]\,g(x^{*})^{-\frac{1}{2}}\|}_{F}^{2}\ge-{\|u\|}_{x^{*}}^{2}-4{\|u\|}_{x^{*}}^{2}\underset{\text{(ii)}}{\geq}-5(1-{\|x-x^{*}\|}_{x})^{-2}{\|u\|}_{x}^{2}\geq-5(1+2cr)^{2}c^{2}r^{2}\,, where (i) follows from LTSC and (ii) follows from Lemma \ref{['lem:scCloseness']}. Hence, $\mathsf{B'}\geq-\varepsilon/2$ by taking $r\leq r_{5}(\varepsilon)$ so that $5(1+2cr_{5})^{2}c^{2}r_{5}^{2}=\varepsilon$. In summary, conditioned on $G:=\bigcap_{i=0}^{3}B_{i}^{c}$ with $\mathbb{P}_{z}(G)\geq\frac{1}{2}-4\varepsilon$ due to the union bound, we have \textsf{A}:\,\frac{\exp f(x)}{\exp f(z)}\geq\exp(-\varepsilon)\,,\textsf{B}:\,\frac{p_{z}(x)}{p_{x}(z)}\geq\exp(-3\varepsilon)\,,\,\varphi(z)-\varphi(x)\geq-2\varepsilon\,. Combining these together, r_{x}=1-\int\min{\Bigl(1,\mathbf{1}_{K}(z)\frac{\exp f(x)}{\exp f(z)}\,\frac{p_{z}(x)}{p_{x}(z)}\Bigr)}p_{x}(\mathrm{d} z)\leq1-\int_{G}(1\wedge e^{-\varepsilon}e^{-3\varepsilon})\,\mathbb{P}_{z}(G)\leq\frac{1}{2}+5\varepsilon\,. Bounding $r_{y}$ in the same way, we conclude that $\textsf{I}\leq\frac{1}{2}+5\varepsilon$ in \ref{['eq:tv-formula']}. WLOG, assume $f(y)\geq f(x)$. 
We denote good events by $G_{x}=\cap_{i=0,2,3}B_{x,i}^{c}$ and $G_{y}=\cap_{i=0,2,3}B_{y,i}^{c}$ such that $\mathbb{P}_{p_{x}}(G_{x}^{c})\leq3\varepsilon$ and $\mathbb{P}_{p_{y}}(G_{y}^{c})\leq3\varepsilon$, where B_{x,0}=\{{\|z-x\|}_{x}\geq cr\}\ \text{with }c\geq1+\frac{2}{\sqrt{d}}\,\log\frac{1}{\varepsilon},\ \text{and}\ B_{x,2}=\Bigl\{{\|z-x\|}_{z}^{2}-{\|z-x\|}_{x}^{2}>\frac{2\varepsilon r^{2}}{d}\Bigr\}B_{x,3}=\Bigl\{\nabla\varphi(x)^{\mathsf{T}}(z-x)\leq-\frac{2r\log\frac{1}{\varepsilon}}{\sqrt{d}}\,{\|g(x)^{-\frac{1}{2}}\nabla\varphi(x)\|}_{2}\Bigr\}\,. Let $G:=G_{x}\cup G_{y}$, and define a partition of $G$ by $G_{x\backslash y}:=G_{x}\backslash G_{y},\qquad G_{x,y}:=G_{x}\cap G_{y},\qquad G_{y\backslash x}:=G_{y}\backslash G_{x}\,.$ Now we decompose the term $\textsf{II}$ as follows: for $Q:=|A(x,z)p_{x}(z)-A(y,z)p_{y}(z)|$, \textsf{II}=\frac{1}{2}\int_{K\backslash G}Q\,\mathrm{d} z+\underbrace{\frac{1}{2}\int_{G_{x\backslash y}}Q\,\mathrm{d} z}_{\eqqcolon\mathcal{A}}+\underbrace{\frac{1}{2}\int_{G_{y\backslash x}}Q\,\mathrm{d} z}_{\eqqcolon\mathcal{B}}+\underbrace{\frac{1}{2}\int_{G_{x,y}}Q\,\mathrm{d} z}_{\eqqcolon\mathcal{C}}\leq\frac{1}{2}{\bigl(\mathbb{P}_{p_{x}}(K\backslash G)+\mathbb{P}_{p_{y}}(K\backslash G)\bigr)}+\mathcal{A}+\mathcal{B}+\mathcal{C}\leq\frac{1}{2}{\bigl(\mathbb{P}_{p_{x}}(G_{x}^{c})+\mathbb{P}_{p_{y}}(G_{y}^{c})\bigr)}+\mathcal{A}+\mathcal{B}+\mathcal{C}\leq3\varepsilon+\mathcal{A}+\mathcal{B}+\mathcal{C}\,. 
The term $\mathcal{A}$ can be further decomposed by 2\mathcal{A}\leq\int_{G_{x\backslash y}}A(x,z)\,|p_{x}(z)-p_{y}(z)|\,\mathrm{d} z+\int_{G_{x\backslash y}}|A(x,z)-A(y,z)|\,p_{y}(\mathrm{d} z)\leq\int_{G_{x\backslash y}}|p_{x}(z)-p_{y}(z)|\,\mathrm{d} z+\mathbb{P}_{p_{y}}(G_{x\backslash y})\leq\int_{G_{x\backslash y}}|p_{x}(z)-p_{y}(z)|\,\mathrm{d} z+\underbrace{\mathbb{P}_{p_{y}}(G_{y}^{c})}_{\leq3\varepsilon}\,, and in a similar way $\mathcal{B}\leq\frac{1}{2}\int_{G_{y\backslash x}}|p_{x}(z)-p_{y}(z)|\,\mathrm{d} z+3\varepsilon/2$. Combining these together, \mathcal{A}+\mathcal{B}\leq3\varepsilon+\frac{1}{2}\int_{G_{x\backslash y}\cup G_{y\backslash x}}|p_{x}(z)-p_{y}(z)|\,\mathrm{d} z\leq3\varepsilon+d_{\textrm{TV}}(p_{x},p_{y})\leq4\varepsilon\,, where we used $d_{\textrm{TV}}(p_{x},p_{y})\leq\varepsilon$; to see this, recall Pinsker's inequality and a formula for the $\mathsf{KL}$ divergences between two Gaussians: $2[d_{\textrm{TV}}(p_{x},p_{y})]^{2}\leq\mathsf{KL}(p_{y}\mathbin{\|} p_{x})=\frac{1}{2}\,{\Bigl(\textup{Tr}{\bigl(g(y)^{-1}g(x)\bigr)}-d+\log\det{\bigl(g(y)g(x)^{-1}\bigr)}+\frac{d}{r^{2}}\,{\|y-x\|}_{x}^{2}\Bigr)}\,.$ Let $\{\lambda_{i}\}_{i\in[d]}$ be the eigenvalues of $g(x)^{-\frac{1}{2}}g(y)g(x)^{-\frac{1}{2}}$ and ${\|x-y\|}_{x}\leq\frac{sr}{\sqrt{d}}$ with $s>0$ to be determined. Then, $\frac{1}{2}\leq\lambda_{i}\leq1+8{\|x-y\|}_{x}$ by Lemma \ref{['lem:scCloseness']}. Using this and $\log x\leq x-1$ for $x>0$, 2\,\mathsf{KL}(p_{y}\mathbin{\|} p_{x})=\sum_{i=1}^{d}{\Bigl(\lambda_{i}-1+\log\frac{1}{\lambda_{i}}\Bigr)}+\frac{d}{r^{2}}\,{\|y-x\|}_{x}^{2}\le\sum_{i=1}^{d}\frac{(\lambda_{i}-1)^{2}}{\lambda_{i}}+s^{2}\leq s^{2}\,(128r^{2}+1)\,, Taking $s\leq s_{1}(\varepsilon):=\varepsilon$ and $r\leq r_{6}(\varepsilon)$ so that $\sqrt{128r_{6}^{2}+1}\leq2$, we obtain d_{\textrm{TV}}(p_{x},p_{y})\leq\sqrt{\frac{1}{2}\,\mathsf{KL}(p_{y}\mathbin{\|} p_{x})}\leq\frac{s}{2}\sqrt{128r^{2}+1}\leq\varepsilon\,, We now bound $\mathcal{C}$. 
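The passage from the eigenvalue sum to $s^{2}(128r^{2}+1)$ in the KL bound above can be spelled out as follows. This is a sketch under one labeled assumption: the closeness lemma is used in the two-sided form $|\lambda_{i}-1|\leq8{\|x-y\|}_{x}$, whereas the text only states the upper bound together with $\lambda_{i}\geq\frac{1}{2}$.

```latex
\begin{align*}
\log\frac{1}{\lambda}\le\frac{1}{\lambda}-1
  \quad&\Longrightarrow\quad
  \lambda-1+\log\frac{1}{\lambda}\le\lambda-2+\frac{1}{\lambda}=\frac{(\lambda-1)^{2}}{\lambda}
  \qquad(\lambda>0),\\
|\lambda_{i}-1|\le 8{\|x-y\|}_{x}\le\frac{8sr}{\sqrt{d}},\ \ \lambda_{i}\ge\frac{1}{2}
  \quad&\Longrightarrow\quad
  \frac{(\lambda_{i}-1)^{2}}{\lambda_{i}}\le 2\cdot\frac{64s^{2}r^{2}}{d}=\frac{128s^{2}r^{2}}{d},\\
\sum_{i=1}^{d}\frac{(\lambda_{i}-1)^{2}}{\lambda_{i}}+\frac{d}{r^{2}}{\|y-x\|}_{x}^{2}
  \ &\le\ 128s^{2}r^{2}+s^{2}=s^{2}\,(128r^{2}+1).
\end{align*}
```

The last line uses ${\|y-x\|}_{x}\leq sr/\sqrt{d}$ for the quadratic term.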
Recall $B_{x,1}=\{\langle\nabla f(x),z-x\rangle\ge0\}$ and $\mathbb{P}_{p_{x}}(B_{x,1})\leq\frac{1}{2}+\mathcal{O}(\varepsilon)$. Then, $2\mathcal{C}=\int_{(G_{x}\cap G_{y})\backslash B_{x,1}^{c}}Q\,\mathrm{d} z+\int_{G_{x}\cap G_{y}\cap B_{x,1}^{c}}Q\,\mathrm{d} z\leq\int_{B_{x,1}}\underbrace{Q}_{\text{triangle inequality}}\,\mathrm{d} z+\int_{G_{x}\cap G_{y}\cap B_{x,1}^{c}}Q\,\mathrm{d} z\leq\int_{B_{x,1}}|A(x,z)-A(y,z)|\,p_{x}(\mathrm{d} z)+\int_{B_{x,1}}A(y,z)\:|p_{x}(z)-p_{y}(z)|\,\mathrm{d} z+\int_{G_{x}\cap G_{y}\cap B_{x,1}^{c}}Q\,\mathrm{d} z\le\underbrace{\mathbb{P}_{p_{x}}(B_{x,1})}_{\leq\frac{1}{2}+\varepsilon}+2\underbrace{d_{\textrm{TV}}(p_{x},p_{y})}_{\leq\varepsilon\ (\ref{['eq:TV-by-KL']})}+\int_{G_{x}\cap G_{y}\cap B_{x,1}^{c}}|A(x,z)\,p_{x}(z)-A(y,z)\,p_{y}(z)|\,\mathrm{d} z\leq\frac{1}{2}+3\varepsilon+\int_{G_{x}\cap G_{y}\cap B_{x,1}^{c}}|A(x,z)\,p_{x}(z)-A(y,z)\,p_{y}(z)|\,\mathrm{d} z\,.$ One can check that $|A(x,z)\,p_{x}(z)-A(y,z)\,p_{y}(z)|\,\mathrm{d} z=|\min{\Bigl(1,\underbrace{\frac{\exp f(x)}{\exp f(z)}\,\frac{p_{z}(x)}{p_{x}(z)}}_{\eqqcolon\mathsf{U}}\Bigr)}-\min{\Bigl(\underbrace{\frac{p_{y}(z)}{p_{x}(z)}}_{\eqqcolon\mathsf{V}},\underbrace{\frac{\exp f(y)}{\exp f(z)}\,\frac{p_{z}(y)}{p_{x}(z)}}_{\eqqcolon\mathsf{W}}\Bigr)}|\,p_{x}(\mathrm{d} z)\,.$ Here we note that $\mathsf{U}\geq e^{-4\varepsilon}$ due to $\frac{\exp f(x)}{\exp f(z)}\geq e^{-\varepsilon}$ and $\frac{p_{z}(x)}{p_{x}(z)}\geq e^{-3\varepsilon}$ from \ref{['eq:fx-ratio']} and \ref{['eq:prop-ratio']}. We now show that under additional conditioning, $|\log\textsf{V}|\lesssim\varepsilon$ and $\log\textsf{W}\gtrsim-\varepsilon$ on $z\in G_{x}\cap G_{y}\cap B_{x,1}^{c}$. 
For $\varphi(\cdot)=\frac{1}{2}\log\det g(\cdot)$ and $\mathsf{L}:=-\frac{d}{2r^{2}}({\|z-y\|}_{y}^{2}-{\|z-x\|}_{x}^{2})$, \log\mathsf{V}=-\frac{d}{2r^{2}}({\|z-y\|}_{y}^{2}-{\|z-x\|}_{x}^{2})+\varphi(y)-\varphi(x)=\textsf{L}+\langle\nabla\varphi(x),y-x\rangle+\underbrace{\frac{1}{2}\langle\nabla^{2}\varphi(x^{*})(y-x),y-x\rangle}_{\text{Use }\ref{['eq:so-taylor-logdet']}}\quad\text{for some }x^{*}\in[x,y]\geq\textsf{L}-{\|g(x)^{-1/2}\nabla\varphi(x)\|}_{2}{\|y-x\|}_{x}-5\underbrace{(1+2{\|x-y\|}_{x})^{2}}_{\leq2}{\|y-x\|}_{x}^{2}\geq\textsf{L}-2\sqrt{d}\cdot s\frac{r}{\sqrt{d}}-10s^{2}\frac{r^{2}}{d}\geq\textsf{L}-\varepsilon\,, where the inequality follows from $s\leq\frac{\varepsilon}{10}$ and $r\leq r_{7}(\varepsilon):=1$. As for $\textsf{W}$, due to $f(y)\geq f(x)$ and $\exp(f(x)-f(z))\geq\exp(-\varepsilon)$, \log\mathsf{W}\geq\log{\Bigl(\frac{\exp f(x)}{\exp f(z)}\frac{p_{z}(y)}{p_{x}(z)}\Bigr)}\geq-\varepsilon-\frac{d}{2r^{2}}({\|z-y\|}_{z}^{2}-{\|z-x\|}_{x}^{2})+\varphi(z)-\varphi(x)\underset{\text{(i)}}{\geq}-\varepsilon-\frac{d}{2r^{2}}{\Bigl({\|z-y\|}_{y}^{2}+2\varepsilon\frac{r^{2}}{d}-{\|z-x\|}_{x}^{2}\Bigr)}-2\varepsilon=\textsf{L}-4\varepsilon\,, where (i) follows from ${\|z-y\|}_{z}^{2}-{\|z-y\|}_{y}^{2}\leq2\varepsilon r^{2}/d$ on $z\in B_{y,2}^{c}$, and $\varphi(z)-\varphi(x)\geq-2\varepsilon$ on $z\in B_{x,3}^{c}$ from \ref{['eq:vphi-z-x']}. Lastly, we show that $|\textsf{L}|$ is bounded by $\mathcal{O}(\varepsilon)$ with high probability (w.r.t. $p_{x}$). Due to affine invariance of the algorithm, we may assume that $x=0$ and $g(x)=I_{d}$ (so $p_{x}=\mathcal{N}(0,I_{d})$). Therefore, {\|z-y\|}_{y}^{2}-{\|z-x\|}_{x}^{2}={\|z-y\|}_{y}^{2}-{\|z\|}^{2}={\|z\|}_{g(y)-I_{d}}^{2}-2\langle z,y\rangle_{y}+{\|y\|}_{y}^{2}\,. The last term is bounded by $2{\|y\|}^{2}$ due to SC of $g$. 
Using a tail bound for Gaussians, we have $\mathbb{P}_{p_{x}}{\bigl(|\langle z,y\rangle_{y}|\geq\frac{r}{\sqrt{d}}\,{\|g(y)y\|}_{2}\cdot2\log\frac{1}{\varepsilon}\bigr)}\leq\varepsilon$ and call this event $C_{1}$. In addition, SC of $g$ leads to $g(y)\preceq2I_{d}$, so ${\|g(y)y\|}\leq2{\|y\|}$. To bound ${\|z\|}_{g(y)-I_{d}}^{2}$, we note that ${\|y\|}={\|y-x\|}_{x}\leq1/\sqrt{2}$ and so {\|g(y)-I_{d}\|}_{F}\leq(1+2{\|y\|})^{2}{\|y\|}\leq2s\frac{r}{\sqrt{d}}\,,\quad\text{(Lemma }\ref{['lem:strongSC-closeness']}\text{)}\mathbb{E}[{\|z\|}_{g(y)-I_{d}}^{2}]=\frac{r^{2}}{d}\textup{Tr}(g(y)-I_{d})\leq\frac{r^{2}}{d}\sqrt{d}\,{\|g(y)-I_{d}\|}_{F}\leq\frac{r^{2}}{d}\cdot2rs\,. By the Hanson-Wright inequality, for universal constants $K_{1},K_{2}>0$ and $t\geq0$ it holds that $\mathbb{P}_{z\sim\mathcal{N}(0,I_{d})}{\bigl(|{\|z\|}_{g(y)-I_{d}}^{2}-\mathbb{E}[{\|z\|}_{g(y)-I_{d}}^{2}]|\geq t\bigr)}\leq2\exp{\Bigl(-K_{1}{\Bigl(\frac{t^{2}}{K_{2}^{4}\frac{r^{4}}{d^{2}}{\|g(y)-I_{d}\|}_{F}^{2}}\wedge\frac{t}{K_{2}^{2}\frac{r^{2}}{d}{\|g(y)\|}_{2}}\Bigr)}\Bigr)}\,.$ By taking $r\leq r_{8}(\varepsilon):=\frac{\sqrt{K_{1}}}{2K_{2}^{2}}$ and $s\leq s_{2}(\varepsilon):=\varepsilon(1+\sqrt{\log\frac{2}{\varepsilon}})^{-1}$, it follows that ${\|z\|}_{g(y)-I_{d}}^{2}\leq\frac{2\varepsilon r^{2}}{d}$ with probability at least $1-\varepsilon$. Denote the complement of this event by $C_{2}$. Conditioned on $z\in C_{1}^{c}\cap C_{2}^{c}$, we conclude that |{\|z-y\|}_{y}^{2}-{\|z-x\|}_{x}^{2}|\leq{\|z\|}_{g(y)-I_{d}}^{2}+2|\langle z,y\rangle_{y}|+2{\|y\|}^{2}\leq\frac{2r^{2}\varepsilon}{d}+\frac{8r{\|y\|}}{\sqrt{d}}\,\log\frac{1}{\varepsilon}+2{\|y\|}^{2}\leq\frac{2r^{2}}{d}\cdot3\varepsilon\,, where the last inequality follows from ${\|y\|}\leq\frac{sr}{\sqrt{d}}$ when $s\leq s_{3}(\varepsilon):=\varepsilon\,(4\log\frac{1}{\varepsilon})^{-1}$. Hence, $|\textsf{L}|\leq3\varepsilon$ on $C_{1}^{c}\cap C_{2}^{c}$. 
Putting this into \ref{['eq:logV-lower']} and \ref{['eq:logW-lower']}, $\mathsf{V}\geq\exp(-4\varepsilon)\qquad\text{and}\qquad\mathsf{W}\geq\exp(-7\varepsilon)\,.$ We can also show $\log\textsf{V}\leq5\varepsilon$. Conditioned on $z\in C_{1}^{c}\cap C_{2}^{c}$, $-\log\mathsf{V}=-\textsf{L}+\varphi(x)-\varphi(y)\geq-3\varepsilon+\varphi(x)-\varphi(y)\geq-5\varepsilon\,,$ since $\varphi(x)-\varphi(y)$ can be lower bounded by $-2\varepsilon$ as in \ref{['eq:bound-vphi']}. Hence, $\log\textsf{V}\leq5\varepsilon$. For $F:=G_{x}\cap G_{y}\cap B_{x,1}^{c}$ and $C:=(C_{1}\cup C_{2})^{c}$, since $e^{-4\varepsilon}\leq\mathsf{V}\leq e^{5\varepsilon}$, $e^{-7\varepsilon}\leq\mathsf{W}$, and $\mathsf{U}\geq e^{-4\varepsilon}$, $\int_{F}|A(x,z)\,p_{x}(z)-A(y,z)\,p_{y}(z)|\,\mathrm{d} z\leq\int_{C^{c}}(\cdot)\,\mathrm{d} z+\int_{F\cap C}(\cdot)\,\mathrm{d} z\leq\underbrace{\mathbb{P}_{p_{x}}(C^{c})}_{\leq2\varepsilon}+2\underbrace{d_{\textrm{TV}}(p_{x},p_{y})}_{\leq\varepsilon}+\int_{F\cap C}(\cdot)\,\mathrm{d} z\leq4\varepsilon+\int_{F\cap C}|1\wedge\mathsf{U}-\mathsf{V}\wedge\mathsf{W}|\,p_{x}(\mathrm{d} z)\leq4\varepsilon+(e^{5\varepsilon}-e^{-4\varepsilon})\leq18\varepsilon\,.$ Using this, we can bound $\mathcal{C}$ by $\mathcal{C}\leq\frac{1}{4}+\varepsilon+\frac{1}{2}\int_{F}|A(x,z)\,p_{x}(z)-A(y,z)\,p_{y}(z)|\,\mathrm{d} z\leq\frac{1}{4}+10\varepsilon\,.$ Therefore, $\textsf{II}\leq3\varepsilon+\mathcal{A}+\mathcal{B}+\mathcal{C}\leq3\varepsilon+4\varepsilon+\frac{1}{4}+10\varepsilon\leq\frac{1}{4}+17\varepsilon.$ Along with $\textsf{I}\leq\frac{1}{2}+5\varepsilon$, we can conclude that if $r\leq\min_{i}r_{i}(\varepsilon)$ and $s\leq\min_{i}s_{i}(\varepsilon)$, then $d_{\textrm{TV}}(P_{x},P_{y})\leq\frac{3}{4}+23\varepsilon$. We now prove an isoperimetric inequality arising from a SC barrier. 
Recall the cross-ratio distance $d_{K}$ defined on a convex body $K$: for $x,y\in\textup{int}(K)$, suppose that the chord passing through $x,y$ has endpoints $p$ and $q$ in the boundary $\partial K$ (so the order of points is $p,x,y,q$), then the cross-ratio distance between $x$ and $y$ is defined by $d_{K}(x,y)\stackrel{\mathrm{{ def}}}{=}\frac{{\|x-y\|}_{2}{\|p-q\|}_{2}}{{\|p-x\|}_{2}{\|y-q\|}_{2}}\,.$ The first type of isoperimetric inequalities says $\psi_{\pi}\gtrsim1/\sqrt{\bar{\nu}}$. For a ball $B_{r}(0)$ of radius $r>0$ centered at the origin, we define a convex body $K_{r}:=K\cap B_{r}(0)$ and use $\pi_{r}$ to denote the truncated distribution of $\pi$ over $K_{r}$. Let $\{S_{1},S_{2},S_{3}\}$ be a partition of $K$ and define $S_{i}^{r}:=S_{i}\cap K_{r}$ for $i\in[3]$. By lovasz2007geometry, we have $\pi_{r}(S_{3}^{r})\geq d_{K_{r}}(S_{1}^{r},S_{2}^{r})\,\pi_{r}(S_{1}^{r})\,\pi_{r}(S_{2}^{r})\,,$ where $d_{K_{r}}(S_{1}^{r},S_{2}^{r})=\inf_{x\in S_{1}^{r},y\in S_{2}^{r}}d_{K_{r}}(x,y)$. Due to $d_{K_{r}}(x,y)\geq{\|x-y\|}_{x}/\sqrt{\bar{\nu}}$ for any $x,y\in K_{r}$ (see laddha2020strong), $\pi_{r}(S_{3}^{r})\geq\inf_{x\in S_{1}^{r},\,y\in S_{2}^{r}}\frac{{\|x-y\|}_{x}}{\sqrt{\bar{\nu}}}\,\pi_{r}(S_{1}^{r})\,\pi_{r}(S_{2}^{r})\geq\frac{1}{\sqrt{\bar{\nu}}}\inf_{x\in S_{1},\,y\in S_{2}}{\|x-y\|}_{x}\,\pi_{r}(S_{1}^{r})\,\pi_{r}(S_{2}^{r})\,.$ As $r\to\infty$, the bounded convergence theorem implies $\pi_{r}(S_{i}^{r})\to\pi(S_{i})$ for $i\in[3]$, completing the proof. We provide the deferred proof for another isoperimetric inequality, $\psi_{\pi}\gtrsim\sqrt{\alpha}$, originating from $\alpha$-relatively strong-convexity of the potential with respect to $\nabla^{2}\phi$. The proof essentially follows gopi2023algorithmic. 
Their first proof ingredient is a modified localization lemma gopi2023algorithmic; let $f_{1},f_{2},f_{3},f_{4}$ be non-negative functions on $\mathbb{R}^{d}$ such that $f_{1}$ and $f_{2}$ are upper semicontinuous, and $f_{3}$ and $f_{4}$ are lower semicontinuous, and $\phi:\mathbb{R}^{d}\to\mathbb{R}$ be convex. Then the following are equivalent: For any density $\pi:\mathbb{R}^{d}\to\mathbb{R}$ which is $1$-relatively strongly logconcave in $\phi$, $\int f_{1}\,\mathrm{d}\pi\cdot\int f_{2}\,\mathrm{d}\pi\leq\int f_{3}\,\mathrm{d}\pi\cdot\int f_{4}\,\mathrm{d}\pi\,.$Let $\int_{E}h:=\int_{0}^{1}h((1-t)\,a+tb)e^{-\gamma t}\,\mathrm{d} t$. Then $\int_{E}f_{1}e^{-\phi}\cdot\int_{E}f_{2}e^{-\phi}\leq\int_{E}f_{3}e^{-\phi}\cdot\int_{E}f_{4}e^{-\phi}$ for any $a,b\in\mathbb{R}^{d}$ and $\gamma\in\mathbb{R}$. First of all, this can be generalized to an extended convex function $f$ and $\phi$, whose values outside of $\textup{int}(K)$ are set to $\infty$. Since the density $\pi$ and a needle $\exp\left(\gamma t-\phi((1-t)a+tb)\right)$ for $\gamma\in\mathbb{R}$ and $a,b\in\mathbb{R}^{d}$ (induced by the extended $f$ and $\phi$) vanish outside of $\textup{int}(K)$, integrands above become zero on $\textup{int}(K)^{c}$, and thus the integrals above remain the same. As in gopi2023algorithmic, the proof boils down to the case of $\alpha=1$, and it suffices to show that there exists a constant $C>0$ such that $C\cdot d_{\phi}(S_{1},S_{2})\int_{S_{1}}e^{-f}\cdot\int_{S_{2}}e^{-f}\leq\int e^{-f}\int_{S_{3}}e^{-f}\,.$ We can replace $S_{i}\gets$ its closure $\bar{S_{i}}$ for $i\in[2]$, which only increases the LHS. Also, we can replace $S_{3}\gets$ an open set $\textup{int}(K)\backslash\bar{S_{1}}\backslash\bar{S_{2}}$, which does not change the RHS since the boundary of a convex set is a null set lang1986note. 
By taking $f_{i}=\mathbf{1}_{S_{i}}$ for $i\in[3]$ and $f_{4}=(C\,d_{\phi}(S_{1},S_{2}))^{-1}$, we only need to show that for some $0\leq c<d\leq1$, $C\cdot d_{\phi}(S_{1},S_{2})\int_{c}^{d}e^{\gamma t-\phi((1-t)\,a+tb)}\mathbf{1}_{S_{1}}((1-t)\,a+tb)\,\mathrm{d} t\cdot\int_{c}^{d}e^{\gamma t-\phi((1-t)\,a+tb)}\mathbf{1}_{S_{2}}((1-t)\,a+tb)\,\mathrm{d} t\leq\int_{c}^{d}e^{\gamma t-\phi((1-t)\,a+tb)}\,\mathrm{d} t\cdot\int_{c}^{d}e^{\gamma t-\phi((1-t)\,a+tb)}\mathbf{1}_{S_{3}}((1-t)\,a+tb)\,\mathrm{d} t\,,$ where $\phi((1-t)\,a+tb)<\infty$ for $t\in(c,d)$. The rest of the proof is similar to gopi2023algorithmic. Let $p:\mathbb{R}^{d}\to\mathbb{R}$ be a log-concave density with finite second moment. Then $p$ is bounded on $\mathbb{R}^{d}$. Let $X\sim p$ and denote the mean and covariance of the distribution $p$ by $\mu:=\mathbb{E}[X]$ and $\Sigma:=\mathbb{E}[(X-\mu)(X-\mu)^{\mathsf{T}}]$. Then the pushforward $T_{\#}p$ of $p$ via the map $T:x\mapsto y:=\Sigma^{-1/2}(x-\mu)$ is an isotropic log-concave density and satisfies $(T_{\#}p)(y)=\frac{p(x)}{|\det T|}$. Since $T_{\#}p$ is bounded on $\mathbb{R}^{d}$ lovasz2007geometry, $p$ is bounded as well. Next, we show that every measure appearing within the sampling IPM is integrable. Recall that we may assume $\phi\geq0$. Hence, all $\mu_{i}$'s in Phase 3 and 4 are well-defined, since $\int_{K}\exp{\Bigl(-{\bigl(f(x)+\frac{\phi(x)}{\sigma_{i}^{2}}\bigr)}\Bigr)}\,\mathrm{d} x\leq\int_{K}\exp(-f(x))\,\mathrm{d} x<\infty\,.$ In particular, $\exp{\bigl(-(f+\frac{\phi}{\nu/d})\bigr)}$ is integrable with finite second moment. By Proposition \ref{['prop:density-bounded']}, $f(x)+\frac{\phi(x)}{\nu/d}$ achieves a global minimum $m$ in $K$. 
As $\sigma_{i}^{2}\leq\sigma_{i_{0}}^{2}=\nu/d$ in Phase 2, we have \int_{K}\exp{\Bigl(-\frac{\sigma_{i_{0}}^{2}f+\phi}{\sigma_{i_{0}}^{2}}\Bigr)}=\int_{K}\exp{\Bigl(-\frac{\sigma_{i_{0}}^{2}f+\phi-\min(\sigma_{i_{0}}^{2}f+\phi)}{\sigma_{i_{0}}^{2}}-\frac{\min(\sigma_{i_{0}}^{2}f+\phi)}{\sigma_{i_{0}}^{2}}\Bigr)}\geq\int_{K}\exp{\Bigl(-\frac{\bar{f}+\phi-\sigma_{i_{0}}^{2}m}{\sigma_{i}^{2}}-m\Bigr)}=\exp{\Bigl(m{\bigl(\frac{\sigma_{i_{0}}^{2}}{\sigma_{i}^{2}}-1\bigr)}\Bigr)}\int_{K}\exp{\Bigl(-\frac{\bar{f}+\phi}{\sigma_{i}^{2}}\Bigr)}\,, where the inequality holds due to $\min(\sigma_{i_{0}}^{2}f+\phi)=\sigma_{i_{0}}^{2}m$ and $\bar{f}=\sigma_{i_{0}}^{2}f$. Therefore, $\mu_{i}$'s in Phase 2 are also well-defined. We begin with closeness between $\mathcal{N}{\bigl(x^{*},\frac{\sigma_{0}^{2}}{1+\nu\beta d^{-1}}g(x^{*})^{-1}\bigr)}\cdot\mathbf{1}_{\mathcal{D}_{g}^{3\sigma_{0}\sqrt{d}}(x^{*})}$ and $\exp{\bigl(-\frac{\bar{f}+\phi}{\sigma_{0}^{2}}\bigr)}$ in Phase 1. Let $\gamma=9$, $r=(\gamma\sigma_{0}^{2}d)^{1/2}<0.01$, $\psi:=\bar{f}+\phi$, and $S=\{x\in K:\psi(x)\leq\psi(x^{*})+r^{2}/4\}$. For $\widetilde{\mu}_{0}=\exp(-\psi/\sigma_{0}^{2})\cdot\mathbf{1}_{K}\propto\mu_{0}$ and $x\in S$, we have $\mu_{0}(x)\geq e^{-\gamma d}\mu_{0}(x^{*}).$ Due to $\mu_{0}(S^{c})\leq\exp(-\gamma d/3)$ (Lemma \ref{['lem:mostMass-logconcave']}), it follows that $1=\mu_{0}(S)+\mu_{0}(S^{c})\leq\mu_{0}(S)+\exp(-\gamma d/3)$ and $1\leq{\bigl(1+2\exp(-\gamma d/3)\bigr)}\,\mu_{0}(S)={\bigl(1+2\exp(-\gamma d/3)\bigr)}\,\widetilde{\mu}_{0}(S)/\widetilde{\mu}_{0}(\mathbb{R}^{d})\,.$ We show $S\subset D=\mathcal{D}_{g}^{3\sigma_{0}\sqrt{d}}(x^{*})$. For $x\in S$, use Taylor's expansion of $\psi$ at $x^{*}$: for some $\bar{x}\in[x^{*},x]$ \psi(x)-\psi(x^{*})=\frac{1}{2}(x-x^{*})^{\mathsf{T}}\nabla^{2}\psi(\bar{x})(x-x^{*})\geq\frac{1}{2}(x-x^{*})^{\mathsf{T}}\nabla^{2}\phi(\bar{x})(x-x^{*})\,. 
As $\psi(x)-\psi(x^{*})\leq r^{2}/4$ on $x\in S$, we have ${\|\bar{x}-x^{*}\|}_{\bar{x}}^{2}\leq{\|x-x^{*}\|}_{\bar{x}}^{2}\leq2(\psi(x)-\psi(x^{*}))\leq r^{2}/2$. Thus, by self-concordance of $\phi$ $\exp(-3r)\,{\|x-x^{*}\|}_{x^{*}}^{2}\leq{\|x-x^{*}\|}_{\bar{x}}^{2}\leq\exp(3r)\,{\|x-x^{*}\|}_{x^{*}}^{2}\,,$ and it follows that ${\|x-x^{*}\|}_{x^{*}}^{2}\leq r^{2}$, showing $S\subset D$. Combining \ref{['eq:psi-taylor']}, \ref{['eq:closenss-initial']}, and $(1+\nu\alpha d^{-1})\,\nabla^{2}\phi\preceq\nabla^{2}\psi\preceq(1+\nu\beta d^{-1})\,\nabla^{2}\phi$, we have $\frac{\exp(-3r)}{2}{\Bigl(1+\frac{\nu\alpha}{d}\Bigr)}\,{\|x-x^{*}\|}_{x^{*}}^{2}\underset{(*)}{\leq}\psi(x)-\psi(x^{*})\underset{(\#)}{\leq}\frac{\exp(3r)}{2}{\Bigl(1+\frac{\nu\beta}{d}\Bigr)}\,{\|x-x^{*}\|}_{x^{*}}^{2}\,,$ and thus for a constant $c:=1+\nu\beta d^{-1}$ and function $h(x):=-(2\sigma_{0}^{2})^{-1}{\|x-x^{*}\|}_{x^{*}}^{2}$, {\|\mu/\mu_{0}\|}=\mathbb{E}_{\mu}{\bigl[\frac{\mathrm{d}\mu}{\mathrm{d}\mu_{0}}\bigr]}=\frac{\int_{D}\exp{\Bigl(-\frac{c}{\sigma_{0}^{2}}{\|x-x^{*}\|}_{x^{*}}^{2}+\frac{\psi}{\sigma_{0}^{2}}\Bigr)}\cdot\widetilde{\mu}_{0}(\mathbb{R}^{d})}{{\Bigl[\int_{D}\exp{\Bigl(-\frac{c}{2\sigma_{0}^{2}}\,{\|x-x^{*}\|}_{x^{*}}^{2}\Bigr)}\Bigr]}^{2}}\underset{\text{(}\ref{['eq:intp-intSp']}\text{)}}{\leq}\frac{1}{{\Bigl[\int_{D}\exp(c\cdot h)\Bigr]}^{2}}\int_{D}\exp{\Bigl(-\frac{c}{\sigma_{0}^{2}}{\|x-x^{*}\|}_{x^{*}}^{2}+\underbrace{\frac{\psi}{\sigma_{0}^{2}}}_{\text{Use }(\#)\text{ in (}\ref{['eq:approx-psigap']}\text{)}}\Bigr)}{\bigl(1+2\exp(-\gamma n/3)\bigr)}\underbrace{\widetilde{\mu}_{0}(S)}_{\text{Use }(*)}\lesssim\frac{\int_{D}\exp{\Bigl(-\frac{1}{2\sigma_{0}^{2}}{\bigl(2c-e^{3r}(1+\nu\beta d^{-1})\bigr)}\,{\|x-x^{*}\|}_{x^{*}}^{2}\Bigr)}\int_{D}\exp{\bigl(-\frac{1}{2\sigma_{0}^{2}}e^{-3r}(1+\nu\alpha d^{-1})\,{\|x-x^{*}\|}_{x^{*}}^{2}\bigr)}}{{\Bigl[\int_{D}\exp(c\cdot 
h)\Bigr]}^{2}}=\underbrace{\frac{\int_{D}\exp{\Bigl({\bigl(2c-c\,e^{3r}\bigr)}\,h(x)\Bigr)}\cdot\int_{D}\exp{\bigl(c\,e^{3r}h(x)\bigr)}}{{\Bigl[\int_{D}\exp(c\cdot h)\Bigr]}^{2}}}_{=:\text{A}}\,\underbrace{\frac{\int_{D}\exp{\bigl(e^{-3r}(1+\nu\alpha d^{-1})\,h(x)\bigr)}}{\int_{D}\exp{\bigl(c\,e^{3r}h(x)\bigr)}}}_{=:\text{B}}\,. As for $\textsf{A}$, Lemma \ref{['lem:adam-logconcave']} leads to \textsf{A}\leq{\Bigl(\frac{c^{2}}{(2c-c\,e^{3r})\,ce^{3r}}\Bigr)}^{d}={\Bigl(\frac{1}{(2-e^{3r})e^{3r}}\Bigr)}^{d}=(1+\mathcal{O}(r^{2}))^{d}=\mathcal{O}(1)\,. As for $\textsf{B}$, let $c_{1}=e^{-3r}\,(1+\nu\alpha d^{-1})$ and $c_{2}=e^{3r}\,(1+\nu\beta d^{-1})$. With the change of variable $y=\sigma_{0}^{-1}\sqrt{c_{i}}g(x^{*})^{1/2}(x-x^{*})$ for $i\in[2]$, it follows that for $r_{i}:=r\sigma_{0}^{-1}\sqrt{c_{i}}(\geq3\sqrt{d})$ \textsf{B}={\Bigl(\frac{c_{2}}{c_{1}}\Bigr)}^{d/2}\frac{\int_{B_{r_{1}}}\exp{\bigl(-\frac{1}{2}{\|y\|}^{2}\bigr)}\,\mathrm{d} y}{\int_{B_{r_{2}}}\exp{\bigl(-\frac{1}{2}{\|y\|}^{2}\bigr)}\,\mathrm{d} y}\leq{\Bigl(\frac{c_{2}}{c_{1}}\Bigr)}^{d/2}\lesssim{\Bigl(\frac{\nu\beta+d}{\nu\alpha+d}\Bigr)}^{d}\,e^{3rd}\lesssim{\Bigl(\frac{\nu\beta+d}{\nu\alpha+d}\Bigr)}^{d}\,.\qedhere Now we show closeness of two consecutive distributions in Phase 2, i.e., $\sigma_{i+1}^{2}=\sigma_{i}^{2}{\bigl(1+\frac{1}{\sqrt{d}}\bigr)}$. Observe that for $\psi=\bar{f}+\phi=\frac{\nu}{d}f+\phi$ on $K$ and $F(\sigma^{2})=\int_{K}\exp(-\psi/\sigma^{2})$, {\|\mu_{i}/\mu_{i+1}\|}=\mathbb{E}_{\mu_{i}}{\bigl[\frac{\mathrm{d}\mu_{i}}{\mathrm{d}\mu_{i+1}}\bigr]}=\frac{\int_{K}\exp{\bigl(-2\frac{\psi}{\sigma_{i}^{2}}+\frac{\psi}{\sigma_{i+1}^{2}}\bigr)}\cdot\int_{K}\exp{\bigl(-\frac{\psi}{\sigma_{i+1}^{2}}\bigr)}}{\left(\int_{K}\exp{\bigl(-\frac{\psi}{\sigma_{i}^{2}}\bigr)}\right)^{2}}=\frac{F{\bigl({\bigl(\frac{2}{\sigma_{i}^{2}}-\frac{1}{\sigma_{i+1}^{2}}\bigr)}^{-1}\bigr)}\,F(\sigma_{i+1}^{2})}{F(\sigma_{i}^{2})^{2}}\,. 
By Lemma \ref{['lem:adam-logconcave']}, the function $a^{d}F{\bigl(\frac{\sigma^{2}}{a}\bigr)}$ is log-concave in $a$. Using the definition with endpoints $\frac{2}{\sigma_{i}^{2}}-\frac{1}{\sigma_{i+1}^{2}}$ and $\frac{1}{\sigma_{i+1}^{2}}$, and the middle point $\frac{1}{\sigma_{i}^{2}}$, we obtain $\frac{F{\bigl({\bigl(\frac{2}{\sigma_{i}^{2}}-\frac{1}{\sigma_{i+1}^{2}}\bigr)}^{-1}\bigr)}\,F(\sigma_{i+1}^{2})}{F(\sigma_{i}^{2})^{2}}\le\Biggl(\frac{{\bigl(\frac{1}{\sigma_{i}^{2}}\bigr)}^{2}}{{\bigl(\frac{2}{\sigma_{i}^{2}}-\frac{1}{\sigma_{i+1}^{2}}\bigr)}\,\frac{1}{\sigma_{i+1}^{2}}}\Biggr)^{d}=\Biggl(\frac{{\bigl(1+\frac{1}{\sqrt{d}}\bigr)}^{2}}{1+\frac{2}{\sqrt{d}}}\Biggr)^{d}\leq{\Bigl(1+\frac{1}{d}\Bigr)}^{d}\leq e\,.\qedhere$ We now establish closeness in Phase 3, during which we use the update of $\sigma_{i+1}^{2}=\sigma_{i}^{2}{\bigl(1+\frac{\sigma_{i}}{\sqrt{\nu}}\bigr)}$. The update is $\sigma_{i+1}^{2}=\sigma_{i}^{2}\left(1+r\right)$ for $r=\frac{\sigma_{i}}{\sqrt{\nu}}$. For $s:=\frac{r}{1+r}$, $\sigma:=\sigma_{i}$, and $F(\sigma^{2})=\int\exp(-f-\phi/\sigma^{2})$, we have {\|\mu_{i}/\mu_{i+1}\|}=\frac{F{\bigl({\bigl(\frac{2}{\sigma_{i}^{2}}-\frac{1}{\sigma_{i+1}^{2}}\bigr)}^{-1}\bigr)}\,F(\sigma_{i+1}^{2})}{F(\sigma_{i}^{2})^{2}}=\frac{F{\bigl(\frac{\sigma^{2}}{1+s}\bigr)}\,F{\bigl(\frac{\sigma^{2}}{1-s}\bigr)}}{F(\sigma^{2})^{2}}\,. Let $g(t):=\log F{\bigl(\frac{\sigma^{2}}{t}\bigr)}$ for $t>0$. 
Then, $\log{\|\mu_{i}/\mu_{i+1}\|}=g(1+s)+g(1-s)-2g(1)=\int_{0}^{s}{\bigl(g'(1+t)-g'(1-t)\bigr)}\,\mathrm{d} t=\int_{0}^{s}\int_{1-t}^{1+t}g''(q)\,\mathrm{d} q\,\mathrm{d} t\,,$ and for a probability measure $\nu_{q}\propto\exp{\bigl(-f-\frac{q\phi}{\sigma^{2}}\bigr)}$, $g''(q)=\frac{\mathrm{d}^{2}}{\mathrm{d} q^{2}}{\Bigl[\log\int_{K}\exp{\Bigl(-f-\frac{q\phi}{\sigma^{2}}\Bigr)}\Bigr]}=-\frac{1}{\sigma^{2}}\,\frac{\mathrm{d}}{\mathrm{d} q}\Biggl[\frac{\int_{K}\phi\cdot\exp{\Bigl(-f-\frac{q\phi}{\sigma^{2}}\Bigr)}}{\int_{K}\exp{\Bigl(-f-\frac{q\phi}{\sigma^{2}}\Bigr)}}\Biggr]=-\frac{1}{\sigma^{2}}\,\Biggl(-\frac{1}{\sigma^{2}}\,\frac{\int_{K}\phi^{2}\cdot\exp{\Bigl(-f-\frac{q\phi}{\sigma^{2}}\Bigr)}}{\int_{K}\exp{\Bigl(-f-\frac{q\phi}{\sigma^{2}}\Bigr)}}+\frac{1}{\sigma^{2}}\,\frac{{\Bigl[\int_{K}\phi\cdot\exp{\Bigl(-f-\frac{q\phi}{\sigma^{2}}\Bigr)}\Bigr]}^{2}}{{\Bigl[\int_{K}\exp{\Bigl(-f-\frac{q\phi}{\sigma^{2}}\Bigr)}\Bigr]}^{2}}\Biggr)=\frac{1}{\sigma^{4}}\,{\Bigl(\mathbb{E}_{\nu_{q}}[\phi^{2}]-(\mathbb{E}_{\nu_{q}}\phi)^{2}\Bigr)}=\frac{1}{\sigma^{4}}\,\mathrm{Var}_{\nu_{q}}\phi\,.$ By the Brascamp-Lieb inequality with $V(\cdot):=f(\cdot)+\frac{q\phi(\cdot)}{\sigma^{2}}$, $\mathrm{Var}_{\nu_{q}}\phi\leq\mathbb{E}_{\nu_{q}}{\bigl[(\nabla\phi)^{\mathsf{T}}{\bigl(\nabla^{2} V\bigr)}^{-1}\nabla\phi\bigr]}\leq\frac{\sigma^{2}}{q}\,\mathbb{E}_{\nu_{q}}{\|\nabla\phi\|}_{(\nabla^{2}\phi)^{-1}}^{2}\leq\frac{\sigma^{2}\nu}{q}\,,$ and thus $g''(q)\leq\frac{\nu}{q\sigma^{2}}.$ Putting this back to \ref{['eq:L2-bound-phase3']}, we acquire $\log{\|\mu_{i}/\mu_{i+1}\|}\leq\frac{\nu}{\sigma^{2}}\int_{0}^{s}\int_{1-t}^{1+t}\frac{1}{q}\,\mathrm{d} q\,\mathrm{d} t=\frac{\nu}{\sigma^{2}}\int_{0}^{s}{\bigl(\log(1+t)-\log(1-t)\bigr)}\,\mathrm{d} t=\frac{\nu}{\sigma^{2}}{\bigl((1+s)\,\log(1+s)+(1-s)\,\log(1-s)\bigr)}\lesssim\frac{\nu s^{2}}{\sigma^{2}}\,.$ It follows from $s=\frac{r}{1+r}$ and $r=\frac{\sigma}{\sqrt{\nu}}$ that $\mu_{i}$ is an $\mathcal{O}(1)$-warm start for $\mu_{i+1}$. 
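The explicit warmness bounds for Phases 2 and 3 can be checked numerically. The following is a minimal sketch, not part of the paper: it evaluates the Phase 2 bound ${\bigl((1+d^{-1/2})^{2}/(1+2d^{-1/2})\bigr)}^{d}$ and the Phase 3 bound $\frac{\nu}{\sigma^{2}}{\bigl((1+s)\log(1+s)+(1-s)\log(1-s)\bigr)}$ along the update $\sigma^{2}\gets\sigma^{2}(1+\sigma/\sqrt{\nu})$. The function names and the sample values of $\nu$ and $d$ are illustrative assumptions. The threshold $2\log2$ is an elementary observation of ours: $(1+s)\log(1+s)+(1-s)\log(1-s)\leq2\log2\cdot s^{2}$ on $[0,1)$, and $\nu s^{2}/\sigma^{2}=(1+r)^{-2}\leq1$.

```python
import math


def phase2_ratio_bound(d: int) -> float:
    """Phase 2 bound ((1 + 1/sqrt(d))^2 / (1 + 2/sqrt(d)))^d on ||mu_i/mu_{i+1}||."""
    a = 1.0 / math.sqrt(d)
    return ((1.0 + a) ** 2 / (1.0 + 2.0 * a)) ** d


def h(s: float) -> float:
    """(1+s) log(1+s) + (1-s) log(1-s) for 0 <= s < 1."""
    return (1.0 + s) * math.log1p(s) + (1.0 - s) * math.log1p(-s)


def phase3_log_bound(sigma2: float, nu: float) -> float:
    """Phase 3 bound (nu / sigma^2) * h(s) with r = sigma/sqrt(nu), s = r/(1+r)."""
    r = math.sqrt(sigma2 / nu)
    s = r / (1.0 + r)
    return (nu / sigma2) * h(s)


def phase3_schedule(nu: float, d: int):
    """Run sigma^2 <- sigma^2 (1 + sigma/sqrt(nu)) from nu/d up to nu.

    Returns the number of steps and the largest log-warmness bound seen.
    The starting point nu/d and stopping point nu follow the phases in the text;
    the concrete nu and d passed in are hypothetical.
    """
    sigma2 = nu / d
    steps, worst = 0, 0.0
    while sigma2 < nu:
        worst = max(worst, phase3_log_bound(sigma2, nu))
        sigma2 *= 1.0 + math.sqrt(sigma2 / nu)
        steps += 1
    return steps, worst


if __name__ == "__main__":
    for d in (1, 10, 1000):
        print("phase 2, d =", d, "bound =", phase2_ratio_bound(d))
    steps, worst = phase3_schedule(nu=100.0, d=50)
    print("phase 3 steps:", steps, "largest log-bound:", worst)
```

In this sketch the Phase 2 quantity stays below $e$ and the Phase 3 log-bound stays below $2\log2$, which is consistent with the $\mathcal{O}(1)$-warmness claims above.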
For Phase 4, observe that for $\mu\propto\exp(-f-\phi/\sigma^{2})$ with $\sigma^{2}=\nu$, ${\|\mu/\pi\|}=\frac{\int_{K}\exp{\bigl(-f-\frac{\phi}{\sigma^{2}/2}\bigr)}\cdot\int_{K}\exp(-f)}{{\Bigl[\int_{K}\exp{\Bigl(-f-\frac{\phi}{\sigma^{2}}\Bigr)}\Bigr]}^{2}}\underset{\text{(i)}}{=}\lim_{r\to1}\frac{F{\bigl(\frac{\sigma^{2}}{1+r}\bigr)}\cdot F{\bigl(\frac{\sigma^{2}}{1-r}\bigr)}}{F(\sigma^{2})^{2}}\underset{\text{(ii)}}{\leq}\lim_{r\to1}\exp{\Bigl(\mathcal{O}(1)\frac{\nu}{\sigma^{2}}\,{\bigl((1+r)\,\log(1+r)+(1-r)\,\log(1-r)\bigr)}\Bigr)}=\exp{\Bigl(\mathcal{O}(1)\frac{\nu}{\sigma^{2}}\Bigr)}=\exp(\mathcal{O}(1))\,,$ where (i) holds due to the monotone convergence theorem, and (ii) follows from \ref{['eq:bound-ph3']}. Therefore, $\mu$ serves as an $\mathcal{O}(1)$-warm start for $\pi$. The total number of measures involved in Algorithm \ref{['alg:IPM-sampling']} is $m:=\mathcal{O}(\sqrt{d})$. Let $(X_{1},\dots,X_{m})$ be a sequence of samples provided by Algorithm \ref{['alg:IPM-sampling']}, and $(\bar{X}_{1},\dots,\bar{X}_{m})$ be a sequence of samples where each sample is drawn from the actual target distributions $\{\mu_{\sigma^{2}}\}$. Conditioned on the event $X_{i}=\bar{X}_{i}$, Algorithm \ref{['alg:IPM-sampling']} ensures that there is a coupling such that $\mathbb{P}(X_{i+1}=\bar{X}_{i+1}\mid X_{i}=\bar{X}_{i})\geq1-\frac{\varepsilon}{\sqrt{d}}$ due to the $\varepsilon/\sqrt{d}$ TV-distance guarantee. Combining these couplings, $\mathbb{P}\left(X_{i}=\bar{X}_{i}\ \forall i\in[m]\right)=\mathbb{P}(X_{1}=\bar{X}_{1})\cdot\prod_{i=2}^{m}\mathbb{P}(X_{i}=\bar{X}_{i}\mid X_{i-1}=\bar{X}_{i-1})\geq1-\varepsilon\,.$ Thus, it leads to a coupling between $X_{m}$ and $\bar{X}_{m}$ such that $\mathbb{P}(X_{m}=\bar{X}_{m})\geq1-\varepsilon$, so $\textup{law}(X_{m})$ is within $\varepsilon$-TV distance of $\pi=\textup{law}(\bar{X}_{m})$. We show that $2(g_{1}+g_{2})$ is SSC if $g_{1}$ and $g_{2}$ are SSC. 
For fixed $x\in K_{1}\cap K_{2}$ and $h\in\mathbb{R}^{d}$, let $\mathrm{D} g_{i}:=\mathrm{D} g_{i}(x)[h]$ for $i=1,2$. Note that ${\|(g_{1}+g_{2})^{-\frac{1}{2}}\mathrm{D}(g_{1}+g_{2})\,(g_{1}+g_{2})^{-\frac{1}{2}}\|}_{F}\leq\sum_{i=1}^{2}{\|(g_{1}+g_{2})^{-\frac{1}{2}}\mathrm{D} g_{i}\,(g_{1}+g_{2})^{-\frac{1}{2}}\|}_{F}=\sum_{i=1}^{2}\sqrt{\textup{Tr}{\bigl((g_{1}+g_{2})^{-1}\mathrm{D} g_{i}\,(g_{1}+g_{2})^{-1}\mathrm{D} g_{i}\bigr)}}={\Bigl[\textup{Tr}{\Bigl({\bigl(\underbrace{I+g_{1}^{-\frac{1}{2}}g_{2}g_{1}^{-\frac{1}{2}}}_{=:E_{1}}\bigr)}^{-1}\underbrace{g_{1}^{-\frac{1}{2}}\mathrm{D} g_{1}\,g_{1}^{-\frac{1}{2}}}_{=:T_{1}}{\bigl(I+g_{1}^{-\frac{1}{2}}g_{2}g_{1}^{-\frac{1}{2}}\bigr)}^{-1}g_{1}^{-\frac{1}{2}}\mathrm{D} g_{1}\,g_{1}^{-\frac{1}{2}}\Bigr)}\Bigr]}^{1/2}+{\Bigl[\textup{Tr}{\Bigl({\bigl(\underbrace{I+g_{2}^{-\frac{1}{2}}g_{1}g_{2}^{-\frac{1}{2}}}_{=:E_{2}}\bigr)}^{-1}\underbrace{g_{2}^{-\frac{1}{2}}\mathrm{D} g_{2}\,g_{2}^{-\frac{1}{2}}}_{=:T_{2}}{\bigl(I+g_{2}^{-\frac{1}{2}}g_{1}g_{2}^{-\frac{1}{2}}\bigr)}^{-1}g_{2}^{-\frac{1}{2}}\mathrm{D} g_{2}\,g_{2}^{-\frac{1}{2}}\Bigr)}\Bigr]}^{1/2}=\sum_{i=1}^{2}\sqrt{\textup{Tr}(E_{i}^{-1}T_{i}E_{i}^{-1}T_{i})}\leq\sum_{i=1}^{2}\sqrt{\textup{Tr}(T_{i}E_{i}^{-2}T_{i})}\,,$ where we used the Cauchy-Schwarz inequality $\textup{Tr}(A^{2})\leq\textup{Tr}(A^{\mathsf{T}}A)$ in the last line. It follows from $I\preceq E_{i}$ that $I\preceq E_{i}^{2}$ and $I\succeq E_{i}^{-2}\succ0$. Therefore, $\sum_{i=1}^{2}\sqrt{\textup{Tr}(T_{i}E_{i}^{-2}T_{i})}\leq\sum_{i=1}^{2}{\|T_{i}\|}_{F}\leq2\sum_{i=1}^{2}{\|h\|}_{g_{i}(x)}\leq2\sqrt{2}{\|h\|}_{(g_{1}+g_{2})(x)}\,.$ Putting these together completes the proof. We now show that if $g$ is HSC, then $dg$ is SLTSC. We first consider when $\bar{g}$ is positive definite on $K$. 
By HSC of $\bar{g}$, it holds that $-{\|h\|}_{\bar{g}}^{2}\,\bar{g}\lesssim\mathrm{D}^{2}\bar{g}[h,h]$, and thus $-\frac{1}{d}\,{\|h\|}_{g}^{2}\,(g'+g)^{-\frac{1}{2}}g\,(g'+g)^{-\frac{1}{2}}\lesssim(g'+g)^{-\frac{1}{2}}\mathrm{D}^{2}g[h,h]\,(g'+g)^{-\frac{1}{2}}\,.$ Hence, \textup{Tr}{\bigl((g'+g)^{-1}\mathrm{D}^{2}g[h,h]\bigr)}\gtrsim-\frac{1}{d}\,{\|h\|}_{g}^{2}\,\textup{Tr}{\Bigl((g'+g)^{-\frac{1}{2}}g\,(g'+g)^{-\frac{1}{2}}\Bigr)}=-\frac{1}{d}\,{\|h\|}_{g}^{2}\,\textup{Tr}{\bigl(g^{\frac{1}{2}}(g'+g)^{-1}g^{\frac{1}{2}}\bigr)}\geq-\frac{1}{d}\,{\|h\|}_{g}^{2}\,\textup{Tr}(g^{\frac{1}{2}}g^{-1}g^{\frac{1}{2}})=-{\|h\|}_{g}^{2}\,. When $g$ is singular, we consider $\bar{g}_{\varepsilon}=\bar{g}+\frac{\varepsilon}{d}I\in\mathbb{S}_{++}^{d}$ for $\varepsilon>0$. Then $\bar{g}_{\varepsilon}$ is HSC, so for $g_{\varepsilon}=d\bar{g}_{\varepsilon}$ $\textup{Tr}{\bigl((g'+g_{\varepsilon})^{-1}\mathrm{D}^{2}g[h,h]\bigr)}\gtrsim-{\|h\|}_{g_{\varepsilon}}^{2}\,.$ From $(g'+g_{\varepsilon})^{-1}=\frac{1}{\det(g'+g_{\varepsilon})}\,\text{adj}(g'+g_{\varepsilon})$, the LHS is continuous in $\varepsilon$, and the RHS is too clearly. Sending $\varepsilon\to0$ completes the proof. To prove Lemma \ref{['lem:hsc-to-sasc']}, we first recall a concentration bound. Let $h$ be drawn from $\mathbb{S}^{d-1}$ uniformly at random. For any odd $k$, $C^{k}$-smooth $F:\mathbb{R}^{d}\to\mathbb{R}$, and $\varepsilon>0$, $\mathbb{P}_{h}{\Bigl(|\mathrm{D}^{k}F(x)[h^{\otimes k}]|>k\varepsilon\cdot\sup_{{\|v\|}\leq1}\mathrm{D}^{k}F(x)[v^{\otimes k}]\Bigr)}\leq\exp{\Bigl(-\frac{d\varepsilon^{2}}{2}\Bigr)}\,.$ We show that if $g$ is HSC, then $dg$ is SASC, using this lemma and following narayanan2016randomized. Let $g=d\,\nabla^{2}\phi$ and consider $g':\textup{int}(K)\to\mathbb{S}_{+}^{d}$ such that $\bar{g}=g+g'$ is PD. 
For fixed $w\in\mathbb{R}^{d}$, apply Taylor's expansion to $\varphi(z):={\|w\|}_{g(z)}^{2}$ at $z=x$, so there exists $p_{w}\in[x,z]$ such that $w^{\mathsf{T}}g(z)w=w^{\mathsf{T}}g(x)w+\mathrm{D} g(x)[z,w,w]+\frac{1}{2}\,\mathrm{D}^{2}g(p_{w})[z,z,w,w].$ Putting $z=w$ here, $|{\|z\|}_{g(z)}^{2}-{\|z\|}_{g(x)}^{2}|\leq|\mathrm{D}^{3}g(x)[z^{\otimes3}]|+\frac{1}{2}|\mathrm{D}^{2}g(p_{z})[z^{\otimes4}]|\,.$ Going forward, we can assume that $x=0$ and $\bar{g}(x)=I$ due to affine invariance, and then $z$ equals $rh/\sqrt{d}$ for $h\sim\mathcal{N}(0,I_{d})$ in law. Using a standard tail bound on the standard Gaussian, we have $\mathbb{P}_{h}({\|h\|}\geq-\sqrt{d}\cdot2\log\varepsilon)\leq\varepsilon.$ Call this event $B_{1}$. In addition, Lemma \ref{['lem:odd-order-concen']} implies that $\mathbb{P}{\Bigl(|\mathrm{D}^{3}\phi(x){\Bigl[\frac{h^{\otimes3}}{{\|h\|}^{3}}\Bigr]}|\geq3\frac{\varepsilon}{\sqrt{d}}\cdot\sup_{{\|v\|}\leq1}\mathrm{D}^{3}\phi(x)[v^{\otimes3}]\Bigr)}\leq\varepsilon\,,$ and call this event $B_{2}$. Conditioned on $B_{2}^{c}$, |\mathrm{D}^{3}\phi(x){\Bigl[\frac{h^{\otimes3}}{{\|h\|}^{3}}\Bigr]}|\leq\frac{3\varepsilon}{\sqrt{d}}\,\sup_{{\|v\|}\leq1}\mathrm{D}^{3}\phi(x)[v^{\otimes3}]\leq\frac{6\varepsilon}{\sqrt{d}}\,\sup_{{\|v\|}\leq1}{\|v\|}_{g(x)/d}^{3}\leq\frac{6\varepsilon}{d^{2}}\,\sup_{{\|v\|}\leq1}{\|v\|}_{g(x)}^{3}\underbrace{\leq}_{g(x)\preceq I_{d}}\frac{6\varepsilon}{d^{2}}\,. Hence, conditioned on $z\in B_{1}^{c}\cap B_{2}^{c}$ |\mathrm{D}^{3}g(x)[z^{\otimes3}]|=\frac{r^{3}}{\sqrt{d}}\,\mathrm{D}^{3}\phi(x)[h^{\otimes3}]\leq\frac{r^{3}}{\sqrt{d}}\,\frac{6\varepsilon}{d^{2}}\,{\|h\|}^{3}\leq\frac{r^{2}}{d}\cdot48r\varepsilon{\Bigl(\log\frac{1}{\varepsilon}\Bigr)}^{3}\,. By taking $r_{1}(\varepsilon)$ so that $-48r_{1}\varepsilon\,(\log\varepsilon)^{3}\leq\varepsilon$, we can ensure $|\mathrm{D}^{3}g(x)[z^{\otimes3}]|\leq\varepsilon r^{2}/d$ for any $r\leq r_{1}(\varepsilon)$. 
As for $|\mathrm{D}^{2}g(p_{z})[z^{\otimes4}]|$, HSC of $\phi$ and Lemma \ref{['lem:scCloseness']} lead to \frac{1}{2}\,|\mathrm{D}^{2}g(p_{z})[z^{\otimes4}]|\leq3d\,{\|z\|}_{\nabla^{2}\phi(p_{z})}^{4}\le\frac{3}{d}{\|z\|}_{\nabla^{2}\phi(x)}^{4}\,(1+2\,{\|z\|}_{\nabla^{2}\phi(x)}^{2})^{2}=\frac{3}{d}\,{\|z\|}_{g(x)}^{4}\,{\bigl(1+\frac{2}{d}\,{\|z\|}_{g(x)}^{2}\bigr)}^{2}\underset{g\preceq I_{d}}{\leq}\frac{3}{d}{\|z\|}^{4}{\bigl(1+\frac{2}{d}\,{\|z\|}^{2}\bigr)}^{2}=\frac{3}{d}\,\frac{r^{4}}{d^{2}}\,{\|h\|}^{4}{\Bigl(1+\frac{2r^{2}}{d^{2}}{\|h\|}^{2}\Bigr)}^{2}\leq\frac{r^{2}}{d}\cdot3r^{2}\,{\bigl(2\log\frac{1}{\varepsilon}\bigr)}^{4}{\Bigl(1+2r^{2}{\bigl(2\log\frac{1}{\varepsilon}\bigr)}^{4}\Bigr)}^{2}\,. By taking $r_{2}(\varepsilon)$ and $r_{3}(\varepsilon)$ so that ${\Bigl(1+2r_{2}^{2}{\bigl(2\log\frac{1}{\varepsilon}\bigr)}^{4}\Bigr)}^{2}\leq2$ and $2^{2}\cdot3r_{3}^{2}{\bigl(2\log\frac{1}{\varepsilon}\bigr)}^{4}\leq\varepsilon$ respectively, it holds that on $B_{1}^{c}\cap B_{2}^{c}$ $\frac{1}{2}\,|\mathrm{D}^{2}g(p_{z})[z^{\otimes4}]|\leq\varepsilon\frac{r^{2}}{d}\ \text{for any }r\leq\min r_{i}(\varepsilon).$ Putting all these together, it follows that $|{\|z\|}_{g(z)}^{2}-{\|z\|}_{g(x)}^{2}|\leq2\varepsilon r^{2}/d$ with probability at least $1-2\varepsilon$. By replacing $2\varepsilon\gets\varepsilon$, the claim follows. We start with well-definedness of the notions of collapse and embedding (Definition \ref{['def:sc-along-subspace']}). Let $k:=\dim(W)$, and $U$ and $V$ be matrices in $\mathbb{R}^{d\times k}$, where the columns of each matrix form an orthonormal basis of $W$. Let us denote by $g_{1}:=U^{\mathsf{T}}gU$ and $g_{2}:=V^{\mathsf{T}}gV$ matrices represented with respect to $U$ and $V$, and define the invertible matrix $M=V^{-1}U\in\mathbb{R}^{k\times k}$. Since $U$ and $V$ are full-column rank, if $g_{1}$ is PD, so is $g_{2}$. Suppose $g$ is SSC along $W$. 
Then, 4{\|h\|}_{g}^{2}\geq\textup{Tr}(g_{1}^{-1}\mathrm{D} g_{1}[h]\,g_{1}^{-1}\mathrm{D} g_{1})=\textup{Tr}{\bigl((U^{\mathsf{T}}gU)^{-1}\cdot U^{\mathsf{T}}\mathrm{D} g[h]\,U\cdot(U^{\mathsf{T}}gU)^{-1}\cdot U^{\mathsf{T}}\mathrm{D} g[h]\,U\bigr)}=\textup{Tr}{\Bigl((M^{\mathsf{T}}V^{\mathsf{T}}gVM)^{-1}\cdot M^{\mathsf{T}}V^{\mathsf{T}}\mathrm{D} g[h]\,VM\cdot(M^{\mathsf{T}}V^{\mathsf{T}}gVM)^{-1}\cdot M^{\mathsf{T}}V^{\mathsf{T}}\mathrm{D} g[h]\,VM\Bigr)}=\textup{Tr}{\Bigl((V^{\mathsf{T}}gV)^{-1}V^{\mathsf{T}}\mathrm{D} g[h]\,V\,(V^{\mathsf{T}}gV)^{-1}V^{\mathsf{T}}\mathrm{D} g[h]\,V\Bigr)}={\|g_{2}^{-\frac{1}{2}}\mathrm{D} g_{2}[h]\,g_{2}^{-\frac{1}{2}}\|}_{F}^{2}\,, and thus $g_{2}$ also satisfies the definition. We begin with a barrier version. For the first part, $\psi$ is a $\nu$-self-concordant barrier for $\bar{K}$ by nesterov2003introductory, so $\mathcal{D}_{\bar{g}}^{1}(x)\subset\bar{K}\cap(2x-\bar{K})$ for $\bar{g}(\cdot):=\nabla^{2}\psi(\cdot)$ by Lemma \ref{['lem:symmetricLeftpart']}. Now let $z\in\bar{K}\cap(2x-\bar{K})$. Then $Tz\in K$ and $T(2x-z)\in K$, and the latter implies $2y-Tz\in K$. Thus $Tz\in K\cap(2y-K)$ and $Tz\in\mathcal{D}_{g}^{\sqrt{\bar{\nu}}}(y)$. Due to \mathrm{D}^{2}\psi(x)[(z-x)^{\otimes2}]=\mathrm{D}^{2}\phi(y)[{\bigl(A(z-x)\bigr)}^{\otimes2}]=\mathrm{D}^{2}\phi(y)[(Tz-y)^{\otimes2}]\leq\bar{\nu}\,, it follows that $\psi$ is also $\bar{\nu}$-symmetric. For the second part, observe that $\mathrm{D}^{4}\psi(x)[v,v,h,h]=\mathrm{D}^{4}\phi(y)[Av,Av,Ah,Ah]\geq0$ for any $v,h\in\mathbb{R}^{d}$. The third part can be proven similarly. Next is a matrix version. Let $\phi$ be a $\nu$-self-concordant function counterpart of $g$. Then $\psi(x):=\phi(Tx)$ defined on $\textup{int}(\bar{K})$ is $\nu$-self-concordant by Lemma \ref{['lem:linear-trans']}. 
For any $h\in\mathbb{R}^{d}$ and $y:=Tx$, we have $\mathrm{D}\bar{g}(x)[h]=A^{\mathsf{T}}\mathrm{D} g(y)[Ah]\,A\preceq2{\|Ah\|}_{g(y)}\,A^{\mathsf{T}}g(y)A=2{\|h\|}_{\bar{g}(x)}\,\bar{g}(x)\,.$ Consider a sequence $\{x_{n}\}\subset\bar{K}$ converging to a boundary point $x\in\partial\bar{K}$. If $Tx\notin\partial K$, then $Tx\in\textup{int}(K)$, and the continuity of $T$ implies $x$ is also in $\textup{int}(\bar{K})$. Thus, $Tx\in\partial K$ and $\psi(x_{n})=\phi(Tx_{n})\to\phi(Tx)=\infty$. Lastly, $\nabla^{2}\phi\asymp g$ leads to $\nabla^{2}\psi=A^{\mathsf{T}}\nabla^{2}\phi\,A\asymp A^{\mathsf{T}}gA=\bar{g}$, and $\bar{g}$ is $\nu$-self-concordant for $\bar{K}$. As for symmetry, since $\bar{g}$ is self-concordant, $\mathcal{D}_{\bar{g}}^{1}(x)\subset\bar{K}\cap(2x-\bar{K})$ for $x\in\textup{int}(\bar{K})$ by Lemma \ref{['lem:dikin-in-body']}. For $z\in\bar{K}\cap(2x-\bar{K})$, as $Tz\in K\cap(2Tx-K)$ holds, it follows that $\bar{\nu}\geq{\|Tz-Tx\|}_{g(y)}^{2}={\|z-y\|}_{A^{\mathsf{T}}g(y)A}^{2}={\|z-y\|}_{\bar{g}(x)}^{2}\,,$ and thus $\bar{g}$ is $\bar{\nu}$-symmetric. As for the second item, we first show that $\bar{g}$ is collapsed onto $W=\textup{row}(A)$ (i.e., $\bar{g}=P_{W}\bar{g}P_{W}$ for the orthogonal projection $P_{W}$ onto $W$). To see this, observe that P_{W}\bar{g}P_{W}=P_{W}A^{\mathsf{T}}gAP_{W}=A^{\mathsf{T}}(AA^{\mathsf{T}})^{\dagger}A\cdot A^{\mathsf{T}}gA\cdot A^{\mathsf{T}}(AA^{\mathsf{T}})^{\dagger}A\,, and due to $AA^{\mathsf{T}}(AA^{\mathsf{T}})^{\dagger}A=AA^{\mathsf{T}}(A^{\mathsf{T}})^{\dagger}A^{\dagger}A=AA^{\dagger}A=A$, we have $P_{W}\bar{g}P_{W}=A^{\mathsf{T}}gA=\bar{g}$. We now show that $\bar{g}$ is SSC along $W$. For $k:=\dim(W)$, take $U\in\mathbb{R}^{d\times k}$ whose columns form an orthonormal basis of $W$. It suffices to show that $g_{W}:=U^{\mathsf{T}}\bar{g}U=U^{\mathsf{T}}A^{\mathsf{T}}gAU=M^{\mathsf{T}}gM$ for $M:=AU\in\mathbb{R}^{m\times k}$ is SSC. 
First of all, we can check PDness of $g_{W}$ as follows: Suppose $g_{W}v=0$ for some $v\in\mathbb{R}^{k}$. Then $0={\|v\|}_{g_{W}}={\|g^{1/2}Mv\|}_{2}$ and $AUv=Mv=0$. Since $Uv\in\textup{row}(A)\cap\textsf{ker}(A)$ and $U$ is full-rank, we have $v=0$. Next, for $h\in\mathbb{R}^{k}$ and $x\in\textup{int}(\bar{K})$ \textup{Tr}{\bigl(g_{W}(x)^{-1}\mathrm{D} g_{W}(x)[h]\,g_{W}(x)^{-1}\mathrm{D} g_{W}(x)[h]\bigr)}=\textup{Tr}{\Bigl({\bigl(g^{\frac{1}{2}}M(M^{\mathsf{T}}gM)^{-1}M^{\mathsf{T}}g^{\frac{1}{2}}\cdot g^{-\frac{1}{2}}\mathrm{D} g(Tx)[Ah]\,g^{-\frac{1}{2}}\bigr)}^{2}\Bigr)}\underset{\text{(i)}}{\leq}\textup{Tr}{\Bigl({\bigl(g^{-\frac{1}{2}}\mathrm{D} g(Tx)[Ah]\,g^{-\frac{1}{2}}\bigr)}^{2}\Bigr)}\leq{\|g^{-\frac{1}{2}}\mathrm{D} g(Tx)[Ah]\,g^{-\frac{1}{2}}\|}_{F}^{2}\leq4{\|Ah\|}_{g(Tx)}^{2}=4{\|h\|}_{\bar{g}(x)}^{2}\,, where in (i) we used $P(g^{\frac{1}{2}}M)=g^{\frac{1}{2}}M(M^{\mathsf{T}}gM)^{-1}M^{\mathsf{T}}g^{\frac{1}{2}}\preceq I$. Thus, $\bar{g}$ is SSC along $W=\textup{row}(A)$. The third item immediately follows from $\mathrm{D}^{2}\bar{g}(x)[h,h]=A^{\mathsf{T}}\mathrm{D}^{2}g(y)[Ah,Ah]\,A\succeq0$ for any $h\in\mathbb{R}^{d}$. As for the fourth item, for any PSD matrix function $g'$ on $\bar{K}$ we have \textup{Tr}{\bigl((g'+\bar{g})^{-1}\mathrm{D}^{2}\bar{g}[h,h]\bigr)}=\textup{Tr}{\Bigl((g'+A^{\mathsf{T}}gA)^{-1}A^{\mathsf{T}}\mathrm{D}^{2}g[Ah,Ah]\,A\Bigr)}=\textup{Tr}{\Bigl((A^{-\mathsf{T}}g'A^{-1}+g)^{-1}\mathrm{D}^{2}g[Ah,Ah]\Bigr)}\geq-{\|Ah\|}_{g}^{2}=-{\|h\|}_{\bar{g}}^{2}\,. The last item is straightforward to check by the change of variable. In passing SSC to an augmented space, the Woodbury matrix identity is a main technical tool used: for matrices with compatible sizes $(I+UV)^{-1}=I-U\,(I+VU)^{-1}V\,.$ Using this, we show that if $g\in\mathbb{S}_{++}^{d}$ is SSC, then $\bar{g}+\varepsilon I_{m}$ is SSC. Fix $\varepsilon>0,y\in\textup{int}(K')$, and $h\in\mathbb{R}^{m}$. 
Take a projection matrix $P\in\{0,1\}^{d\times m}$ such that $PP^{\mathsf{T}}=I_{d}$ and $\bar{g}(y)=P^{\mathsf{T}}g(Py)P$ for $x=Py\in\textup{int}(K)$. Also for $k:=\dim(W)$, take a matrix $U\in\mathbb{R}^{d\times k}$ whose columns form an orthonormal basis of $W$. Then $g(x)=Ug_{W}(x)U^{\mathsf{T}}$, so for $M:=U^{\mathsf{T}}P\in\mathbb{R}^{k\times m}$, $\bar{g}(y)=P^{\mathsf{T}}Ug_{W}(Py)U^{\mathsf{T}}P=M^{\mathsf{T}}g_{W}(Py)M\,.$ Note that $MM^{\mathsf{T}}=U^{\mathsf{T}}PP^{\mathsf{T}}U=I_{k}$. Thus, ${\|(\bar{g}(y)+\varepsilon I)^{-\frac{1}{2}}\mathrm{D}(\bar{g}+\varepsilon I)(y)[h]\,(\bar{g}(y)+\varepsilon I)^{-\frac{1}{2}}\|}_{F}^{2}=\textup{Tr}{\Bigl({\bigl((\bar{g}(y)+\varepsilon I)^{-1}\mathrm{D}\bar{g}(y)[h]\bigr)}^{2}\Bigr)}=\textup{Tr}{\Bigl({\bigl(M(M^{\mathsf{T}}g_{W}(x)\,M+\varepsilon I)^{-1}M^{\mathsf{T}}\cdot\mathrm{D} g_{W}(x)[Ph]\bigr)}^{2}\Bigr)}\underset{\text{(i)}}{=}\textup{Tr}{\Bigl({\bigl((g_{W}(x)+\varepsilon I_{k})^{-1}\mathrm{D} g_{W}(x)[Ph]\bigr)}^{2}\Bigr)}\leq{\|g_{W}(x)^{-\frac{1}{2}}\mathrm{D} g_{W}(x)[Ph]\,g_{W}(x)^{-\frac{1}{2}}\|}_{F}^{2}\leq4{\|Ph\|}_{g(x)}^{2}=4{\|h\|}_{\bar{g}(y)}^{2}\,,$ where in (i) we used the identity $M{\bigl(M^{\mathsf{T}}g_{W}(x)\,M+\varepsilon I\bigr)}^{-1}M^{\mathsf{T}}=(g_{W}(x)+\varepsilon I_{k})^{-1}$. To see this, we use the Woodbury matrix identity to get $(\varepsilon I_{m}+M^{\mathsf{T}}g_{W}M)^{-1}=\frac{1}{\varepsilon}I_{m}-\frac{1}{\varepsilon^{2}}M^{\mathsf{T}}g_{W}^{\frac{1}{2}}{\bigl(I_{k}+\frac{1}{\varepsilon}g_{W}\bigr)}^{-1}g_{W}^{\frac{1}{2}}M\,,$ and thus conjugating both sides by $M$ results in $M{\bigl(M^{\mathsf{T}}g_{W}M+\varepsilon I_{m}\bigr)}^{-1}M^{\mathsf{T}}=\frac{1}{\varepsilon}I_{k}-\frac{1}{\varepsilon}g_{W}^{\frac{1}{2}}(g_{W}+\varepsilon I_{k})^{-1}g_{W}^{\frac{1}{2}}=\frac{1}{\varepsilon}I_{k}-\frac{1}{\varepsilon}(g_{W}+\varepsilon I_{k})^{-1}g_{W}\,.$ 
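As a numerical sanity check of the identity used in (i), the sketch below compares $M(M^{\mathsf{T}}g_{W}M+\varepsilon I_{m})^{-1}M^{\mathsf{T}}$ with $(g_{W}+\varepsilon I_{k})^{-1}$ on one small instance. It is illustrative only: the helper names, the particular $2\times2$ positive definite $g_{W}$, and the $2\times3$ matrix $M$ with orthonormal rows are our own hypothetical choices (in the proof, $M=U^{\mathsf{T}}P$), and the matrix routines are plain-Python stand-ins for a linear algebra library.

```python
import math


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def add_eps_identity(a, eps):
    n = len(a)
    return [[a[i][j] + (eps if i == j else 0.0) for j in range(n)] for i in range(n)]


def inverse(a):
    """Gauss-Jordan inverse with partial pivoting (adequate for small matrices)."""
    n = len(a)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def identity_gap(g_w, m_mat, eps):
    """Max entrywise gap between M (M^T g_W M + eps I_m)^{-1} M^T and (g_W + eps I_k)^{-1}."""
    mt = transpose(m_mat)
    inner = add_eps_identity(matmul(matmul(mt, g_w), m_mat), eps)
    lhs = matmul(matmul(m_mat, inverse(inner)), mt)
    rhs = inverse(add_eps_identity(g_w, eps))
    return max(abs(l - r) for lrow, rrow in zip(lhs, rhs) for l, r in zip(lrow, rrow))


def example_inputs(theta=0.7):
    """Hypothetical test instance: PD g_W (k = 2) and M in R^{2x3} with M M^T = I_2."""
    g_w = [[2.0, 0.5], [0.5, 1.0]]
    m_mat = [[1.0, 0.0, 0.0], [0.0, math.cos(theta), math.sin(theta)]]
    return g_w, m_mat


if __name__ == "__main__":
    g_w, m_mat = example_inputs()
    for eps in (1.0, 0.3, 1e-3):
        print("eps =", eps, "gap =", identity_gap(g_w, m_mat, eps))
```

The gap should be at the level of floating-point round-off for any $\varepsilon>0$, since only $MM^{\mathsf{T}}=I_{k}$ and positive definiteness of $g_{W}$ are used.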
Then, the identity follows from $(g_{W}+\varepsilon I_{k})\cdot{\bigl(\frac{1}{\varepsilon}I_{k}-\frac{1}{\varepsilon}\,(g_{W}+\varepsilon I_{k})^{-1}g_{W}\bigr)}=\frac{1}{\varepsilon}(g_{W}+\varepsilon I_{k})-\frac{1}{\varepsilon}g_{W}=I_{k}\,.\qedhere$ In extending SLTSC and SASC, we need two technical lemmas: the inverse of a block matrix, and the connection between P(S)Dness and Schur complements. If $D$ and its Schur complement $A-BD^{-1}C$ are invertible, then $\begin{bmatrix}A & B\\ C & D\end{bmatrix}^{-1}=\begin{bmatrix}(A-BD^{-1}C)^{-1} & *\\ * & *\end{bmatrix}\,.$ Let $A\in\mathbb{R}^{d\times d},B\in\mathbb{R}^{d\times m},C\in\mathbb{R}^{m\times m}$ and define a matrix $M\in\mathbb{R}^{(m+d)\times(m+d)}$ by $M=\begin{bmatrix}A & B\\ B^{\mathsf{T}} & C\end{bmatrix}\,.$ Then $M\succ0$ if and only if $A\succ0$ and $C-B^{\mathsf{T}}A^{-1}B\succ0$, if and only if $C\succ0$ and $A-BC^{-1}B^{\mathsf{T}}\succ0$. Using these, we show that if $g$ is SLTSC and SASC, then $\bar{g}$ is SLTSC and SASC. Take a full row-rank projection matrix $P\in\{0,1\}^{d\times m}$ such that $\bar{g}(y)=P^{\mathsf{T}}g(Py)P$, where the rows of $P$ form a subset of the canonical basis $\{e_{1},\dots,e_{m}\}$. We can augment the rows of $P$ with the rest of the canonical basis so that the augmented matrix $\bar{P}\in\mathbb{R}^{m\times m}$ is an orthogonal matrix. Then we can represent $\bar{g}$ by $\bar{g}(y)=\bar{P}^{\mathsf{T}}\begin{bmatrix}g(Py) & 0\\ 0 & 0\end{bmatrix}\bar{P}\,.$ Consider a PSD matrix function $g':\textup{int}(K')\to\mathbb{S}_{+}^{m}$ such that $g'+\bar{g}$ is PD on $K'$. Representing them in the block form with $g_{A}\in\mathbb{R}^{d\times d},g_{B}\in\mathbb{R}^{d\times(m-d)},$ and $g_{C}\in\mathbb{R}^{(m-d)\times(m-d)}$, $\bar{g}+g'=\bar{P}^{\mathsf{T}}\left(\begin{bmatrix}g & 0\\ 0 & 0\end{bmatrix}+\begin{bmatrix}g_{A} & g_{B}\\ g_{B}^{\mathsf{T}} & g_{C}\end{bmatrix}\right)\bar{P}=\bar{P}^{\mathsf{T}}\underbrace{\begin{bmatrix}g+g_{A} & g_{B}\\ g_{B}^{\mathsf{T}} & g_{C}\end{bmatrix}}_{\eqqcolon g^{*}}\bar{P}\,.$ Since $g^{*}$ is PD, $g_{C}$ and its Schur complement $(g+g_{A})-g_{B}g_{C}^{-1}g_{B}^{\mathsf{T}}$ are PD. 
Thus by Lemma \ref{['lem:block-inverse']}, $\begin{bmatrix}g+g_{A} & g_{B}\\ g_{B}^{\mathsf{T}} & g_{C}\end{bmatrix}^{-1}=\begin{bmatrix}(g+g_{A}-g_{B}g_{C}^{-1}g_{B}^{\mathsf{T}})^{-1} & *\\ * & *\end{bmatrix}\,.$ Hence, $\textup{Tr}{\bigl((\bar{g}+g')^{-1}\mathrm{D}^{2}\bar{g}(y)[h,h]\bigr)}=\textup{Tr}\Biggl(\bar{P}^{\mathsf{T}}\begin{bmatrix}g+g_{A} & g_{B}\\ g_{B}^{\mathsf{T}} & g_{C}\end{bmatrix}^{-1}\bar{P}\bar{P}^{\mathsf{T}}\begin{bmatrix}\mathrm{D}^{2}g(Py)[Ph,Ph] & 0\\ 0 & 0\end{bmatrix}\bar{P}\Biggr)=\textup{Tr}\Biggl(\begin{bmatrix}g+g_{A} & g_{B}\\ g_{B}^{\mathsf{T}} & g_{C}\end{bmatrix}^{-1}\begin{bmatrix}\mathrm{D}^{2}g(Py)[Ph,Ph] & 0\\ 0 & 0\end{bmatrix}\Biggr)=\textup{Tr}{\bigl((g+\underbrace{g_{A}-g_{B}g_{C}^{-1}g_{B}^{\mathsf{T}}}_{\succeq0})^{-1}\,\mathrm{D}^{2}g(Py)[Ph,Ph]\bigr)}\ge-{\|Ph\|}_{g(Py)}^{2}=-{\|h\|}_{\bar{g}(y)}^{2}\,,$ where in the last inequality we used SLTSC of $g$, since $g'\succeq0$ ensures that its Schur complement satisfies $g_{A}-g_{B}g_{C}^{-1}g_{B}^{\mathsf{T}}\succeq0$ by Lemma \ref{['lem:schur']}. For SASC, consider any PSD matrix function $g':\textup{int}(K')\to\mathbb{S}_{+}^{m}$. For $x=Py$ and $z_{x}=Pz_{y}\in\mathbb{R}^{d}$ with $z_{y}\sim\mathcal{N}{\bigl(y,\frac{r^{2}}{m}\,(\bar{g}+g')(y)^{-1}\bigr)}$, we have ${\|z_{y}-y\|}_{\bar{g}(z_{y})}^{2}-{\|z_{y}-y\|}_{\bar{g}(y)}^{2}={\|z_{x}-x\|}_{g(z_{x})}^{2}-{\|z_{x}-x\|}_{g(x)}^{2}\,.$ Also, $z_{x}-x=P\,(z_{y}-y)$ is a Gaussian with zero mean and covariance $\frac{r^{2}}{m}\,P\,(\bar{g}+g')(y)^{-1}P^{\mathsf{T}}=\frac{r^{2}}{m}\,P\bar{P}^{\mathsf{T}}\left(\begin{bmatrix}g & 0\\ 0 & 0\end{bmatrix}+\begin{bmatrix}g_{A} & g_{B}\\ g_{B}^{\mathsf{T}} & g_{C}\end{bmatrix}\right)^{-1}\bar{P}P^{\mathsf{T}}=\frac{r^{2}}{m}\,\begin{bmatrix}I_{d} & 0_{d\times(m-d)}\end{bmatrix}\left(\begin{bmatrix}g & 0\\ 0 & 0\end{bmatrix}+\begin{bmatrix}g_{A} & g_{B}\\ g_{B}^{\mathsf{T}} & g_{C}\end{bmatrix}\right)^{-1}\begin{bmatrix}I_{d} & 0_{d\times(m-d)}\end{bmatrix}^{\mathsf{T}}=\frac{r^{2}}{m}\,(g+g_{A}-g_{B}g_{C}^{-1}g_{B}^{\mathsf{T}})^{-1}\,.$ Since $g_{A}-g_{B}g_{C}^{-1}g_{B}^{\mathsf{T}}\succeq0$ due to $g'\succeq0$, it holds that $g_{0}:=\frac{m-d}{d}g+\frac{m}{d}(g_{A}-g_{B}g_{C}^{-1}g_{B}^{\mathsf{T}})$ on $\textup{int}(K)$ is PSD.
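The covariance computation above rests on the fact that the top-left block of a block inverse is the inverse of a Schur complement. A minimal numerical sketch follows, assuming NumPy; the dimensions, the radius `r`, and the random matrices standing in for $g$ and $g'$ (written in the rotated basis, and taken PD so that $g_{C}$ is invertible) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
d, m, r = 3, 6, 0.5  # illustrative dimensions and step radius

def random_pd(n):
    """Random symmetric positive definite n x n matrix."""
    B = rng.standard_normal((n, n))
    return B @ B.T + np.eye(n)

g = random_pd(d)        # stands in for g(Py)
g_prime = random_pd(m)  # stands in for g' in the basis given by P-bar
g_A, g_B, g_C = g_prime[:d, :d], g_prime[:d, d:], g_prime[d:, d:]

# g* = [[g + g_A, g_B], [g_B^T, g_C]]
g_star = g_prime.copy()
g_star[:d, :d] += g

# Schur complement of g_C inside g'.
schur = g_A - g_B @ np.linalg.solve(g_C, g_B.T)

# Top-left d x d block of (g*)^{-1}, i.e. [I_d 0] (g*)^{-1} [I_d 0]^T.
top_left = np.linalg.inv(g_star)[:d, :d]

# Covariance of z_x - x computed two ways.
cov_from_y = (r**2 / m) * top_left
g_0 = (m - d) / d * g + (m / d) * schur
cov_from_x = (r**2 / d) * np.linalg.inv(g + g_0)
```

Here `cov_from_y` is the marginal covariance obtained from the $m$-dimensional proposal, while `cov_from_x` is the $d$-dimensional proposal covariance with the auxiliary PSD term $g_{0}$.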
Now, it suffices to check that the covariance matrix above is equal to $\frac{r^{2}}{d}(g+g_{0})^{-1}$: $\frac{d}{r^{2}}\,(g+g_{0})=\frac{d}{r^{2}}{\Bigl(g+\frac{m-d}{d}\,g+\frac{m}{d}\,(g_{A}-g_{B}g_{C}^{-1}g_{B}^{\mathsf{T}})\Bigr)}=\frac{m}{r^{2}}\,(g+g_{A}-g_{B}g_{C}^{-1}g_{B}^{\mathsf{T}})\,.\qedhere$ We show that if $g_{i}\in\mathbb{S}_{++}^{d_{i}}$ is SC, then $g=\sum d_{i}\bar{g}_{i}$ is SSC. Note that $d_{i}g_{i}$ is SSC for $i=1,\dots,m$. For $x\in\prod E_{i}$ and $h=(h_{1},\dots,h_{m})\in\mathbb{R}^{l}$ with $h_{i}\in\mathbb{R}^{d_{i}}$, we have ${\|g(x)^{-\frac{1}{2}}\mathrm{D} g(x)[h]\,g(x)^{-\frac{1}{2}}\|}_{F}^{2}=\left\Vert \begin{bmatrix}g_{1}(x_{1})^{-\frac{1}{2}}\mathrm{D} g_{1}(x_{1})[h_{1}]\,g_{1}(x_{1})^{-\frac{1}{2}} & & \\ & \ddots & \\ & & g_{m}(x_{m})^{-\frac{1}{2}}\mathrm{D} g_{m}(x_{m})[h_{m}]\,g_{m}(x_{m})^{-\frac{1}{2}}\end{bmatrix}\right\Vert _{F}^{2}=\sum_{i}{\|g_{i}(x_{i})^{-\frac{1}{2}}\mathrm{D} g_{i}(x_{i})[h_{i}]\,g_{i}(x_{i})^{-\frac{1}{2}}\|}_{F}^{2}\leq4\sum_{i}{\|h_{i}\|}_{d_{i}g_{i}(x_{i})}^{2}=4{\|h\|}_{g(x)}^{2}\,.\qedhere$ Next, we show that if $g_{i}\in\mathbb{S}_{++}^{d_{i}}$ is HSC, then $g=\sum d_{i}\bar{g}_{i}$ is SLTSC. For $h=(h_{1},\dots,h_{m})$ and any PSD matrix function $g'$, we have $\textup{Tr}{\bigl((g'+g)^{-1}\mathrm{D}^{2}g[h^{\otimes2}]\bigr)}=\sum_{i}\textup{Tr}{\bigl((g'+(g-d_{i}\bar{g}_{i})+d_{i}\bar{g}_{i})^{-1}\mathrm{D}^{2}(d_{i}\bar{g}_{i})[h^{\otimes2}]\bigr)}\gtrsim-\sum_{i}{\|h\|}_{d_{i}\bar{g}_{i}}^{2}=-{\|h\|}_{g}^{2}\,,$ where we used Lemma \ref{['lem:hsc-to-sltsc']} in the inequality. Since $\mathcal{A}$ is $(R(G),\beta,\gamma)$-compatible with $\Gamma$, the first two claims immediately follow from nesterov1994interior. Let $x\in G^{+}$ and $h\in\mathbb{R}^{d}$.
Define the following notations: u=\mathrm{D}\mathcal{A}(x)[h],\quad v=\mathrm{D}^{2}\mathcal{A}(x)[h^{\otimes2}],\quad w=\mathrm{D}^{3}\mathcal{A}(x)[h^{\otimes3}],\quad z=\mathrm{D}^{4}\mathcal{A}(x)[h^{\otimes4}],s=\sqrt{\mathrm{D} F(y)[v]},\quad\rho=\sqrt{\mathrm{D}^{2}\Pi(x)[h^{\otimes2}]},\quad r=\sqrt{\mathrm{D}^{2}F(y)[u^{\otimes2}]}\,. From direct computations, we have \mathrm{D}^{2}\Psi(x)[h^{\otimes2}]=\mathrm{D} F(y)[v]+\mathrm{D}^{2}F(y)[u^{\otimes2}]+\delta^{2}\mathrm{D}^{2}\Pi(x)[h^{\otimes2}]=s^{2}+r^{2}+\delta^{2}\rho^{2}\,,\mathrm{D}^{3}\Psi(x)[h^{\otimes3}]=\mathrm{D} F(y)[w]+3\mathrm{D}^{2}F(y)[u,v]+\mathrm{D}^{3}F(y)[u^{\otimes3}]+\delta^{2}\mathrm{D}^{3}\Pi(x)[h^{\otimes3}]\,,\mathrm{D}^{4}\Psi(x)[h^{\otimes4}]=\mathrm{D}^{2}F(y)[w,u]+\mathrm{D} F(y)[z]+3\mathrm{D}^{3}F(y)[u,u,v]+3\mathrm{D}^{2}F(y)[v^{\otimes2}]\qquad+3\mathrm{D}^{2}F(y)[u,w]+\mathrm{D}^{4}F(y)[u^{\otimes4}]+3\mathrm{D}^{3}F(y)[u,u,v]+\delta^{2}\mathrm{D}^{4}\Pi(x)[h^{\otimes4}]=\mathrm{D} F(y)[z]+3\mathrm{D}^{2}F(y)[v^{\otimes2}]+4\mathrm{D}^{2}F(y)[u,w]\qquad+6\mathrm{D}^{3}F(y)[u,u,v]+\mathrm{D}^{4}F(y)[u^{\otimes4}]+\delta^{2}\mathrm{D}^{4}\Pi(x)[h^{\otimes4}]\,. HSC of $F$ and $\Pi$ implies that $|\mathrm{D}^{4}\Pi(x)[h^{\otimes4}]|\leq6\rho^{4}\,,\qquad\text{and}\qquad|\mathrm{D}^{4}F(y)[u^{\otimes4}]|\leq6r^{4}\,.$ Since $\mathcal{A}$ is $(K,\beta,\gamma)$-compatible and $K\subset R(G)$, Lemma \ref{['lem:extension-compatibility']}-1 implies concavity of $\mathcal{A}$ with respect to $R(G)$, which means $-v\geq_{R(G)}0$. 
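The collected expression for the fourth derivative of the composition $F\circ\mathcal{A}$ is the standard chain-rule (Faà di Bruno) formula with coefficients $1,3,4,6,1$. As a hedged illustration, the sketch below checks it in the scalar case, assuming NumPy and two arbitrary polynomials standing in for $F$ and $\mathcal{A}$ (both hypothetical, chosen only so that derivatives are exact); the barrier term $\delta^{2}\Pi$ is omitted because it enters additively.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Hypothetical scalar stand-ins for the outer function F(y) and the inner map A(x).
F = Polynomial([0.3, -1.0, 0.5, 0.2, -0.1, 0.05])
A = Polynomial([1.0, 0.7, -0.4, 0.25, 0.1])
x0 = 0.3
y0 = A(x0)

def compose(outer, inner):
    """Return the polynomial outer(inner(x)) via Horner's scheme."""
    result = Polynomial([0.0])
    for coef in outer.coef[::-1]:
        result = result * inner + coef
    return result

# u, v, w, z are the first four derivatives of A at x0 (direction h = 1).
u, v, w, z = (A.deriv(j)(x0) for j in (1, 2, 3, 4))
F1, F2, F3, F4 = (F.deriv(j)(y0) for j in (1, 2, 3, 4))

# DF[z] + 3 D^2F[v,v] + 4 D^2F[u,w] + 6 D^3F[u,u,v] + D^4F[u,u,u,u]
formula = F1 * z + 3 * F2 * v**2 + 4 * F2 * u * w + 6 * F3 * u**2 * v + F4 * u**4
direct = compose(F, A).deriv(4)(x0)
print(formula, direct)
```

In the multivariate setting of the proof the same coefficients appear, with products replaced by the corresponding multilinear forms.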
Then, nesterov1994interior ensures $\sqrt{\mathrm{D}^{2}F(y)[v^{\otimes2}]}\leq\mathrm{D} F(y)[v]=s^{2}\,.$ Hence, $|3\mathrm{D}^{2}F(y)[v,v]|\leq3(\mathrm{D} F(y)[v])^{2}=3s^{4}$, and self-concordance of $F$ results in $|6\mathrm{D}^{3}F(y)[u,u,v]|\leq12r^{2}\sqrt{\mathrm{D}^{2}F(y)[v,v]}\leq12r^{2}s^{2}\,.$ Since $\{h:h^{\mathsf{T}}\Pi(x)h\leq1\}$ is contained in $\Gamma\cap(2x-\Gamma)$, compatibility of $\mathcal{A}$ leads to $\beta\mathrm{D}^{2}\mathcal{A}(x){\Bigl[{\Bigl(\frac{h}{{\|h\|}_{\Pi(x)}}\Bigr)}^{\otimes2}\Bigr]}\leq_{K}\mathrm{D}^{3}\mathcal{A}(x){\Bigl[{\Bigl(\frac{h}{{\|h\|}_{\Pi(x)}}\Bigr)}^{\otimes3}\Bigr]}\leq_{K}-\beta\mathrm{D}^{2}\mathcal{A}(x){\Bigl[{\Bigl(\frac{h}{{\|h\|}_{\Pi(x)}}\Bigr)}^{\otimes2}\Bigr]}\,,$ and thus $\beta\rho v\leq_{K}w\leq_{K}-\beta\rho v$. As $K$ is a ray, $\mathrm{D}^{2}F(y)[w,w]\leq\beta^{2}\rho^{2}\mathrm{D}^{2}F(y)[v,v]\leq\beta^{2}\rho^{2}s^{4}$. Thus, $|4\mathrm{D}^{2}F(y)[u,w]|\leq4\sqrt{\mathrm{D}^{2}F(y)[u,u]}\sqrt{\mathrm{D}^{2}F(y)[w,w]}\leq4r\beta\rho s^{2}\,.$ Lastly, since $\gamma v\rho^{2}\leq_{K}z\leq_{K}-\gamma v\rho^{2}$ and $K$ is a ray, we have $|\mathrm{D} F(y)[z]|\leq3\gamma\rho^{2}|\mathrm{D} F(y)[v]|=3\gamma\rho^{2}s^{2}\,.$ Putting these together, \left\lvert \mathrm{D}^{4}\Psi(x)[h^{\otimes4}]\right\rvert \leq3\gamma\rho^{2}s^{2}+4r\beta\rho s^{2}+12r^{2}s^{2}+3s^{4}+6\delta^{2}\rho^{4}+6r^{4}\leq6(\delta^{2}\rho^{4}+r^{4}+s^{4}+r^{2}s^{2}+\delta\rho^{2}s^{2}+\delta r\rho s^{2})\leq6{\bigl((\delta\rho)^{4}+r^{4}+s^{4}+r^{2}s^{2}+(\delta\rho)^{2}s^{2}+r^{2}s^{2}+(\delta\rho)^{2}s^{2}\bigr)}\leq6{\bigl((\delta\rho)^{2}+r^{2}+s^{2}\bigr)}^{2}=6{\bigl(\mathrm{D}^{2}\Psi(x)[h,h]\bigr)}^{2}\,.\qedhere We relate SSC and symmetry to well-studied terms in the field of optimization, such as $\max_{i}\frac{[\sigma(\sqrt{D_{x}}A_{x})]_{i}}{[D_{x}]_{ii}}$ and ${\|D_{x,h}'\|}_{D_{x}^{-1}}^{2}$. Let us write $g(x)=A_{x}^{\mathsf{T}}D_{x}A_{x}=A^{\mathsf{T}}V_{x}A$ for $V_{x}:=S_{x}^{-1}D_{x}S_{x}^{-1}$. 
By Claim \ref{['claim:diffLogBarrier']}, \mathrm{D} g(x)[h]=A^{\mathsf{T}}(-2S_{x}^{-1}S_{x,h}S_{x}^{-1}D_{x}+S_{x}^{-1}\mathrm{D} D_{x}[h]\,S_{x}^{-1})A=A^{\mathsf{T}}V_{x}^{1/2}\overline{D}_{x}V_{x}^{1/2}A\,, where $\overline{D}_{x}:=-2S_{x,h}+D_{x}^{-1}\mathrm{D} D_{x}[h]$. Using this, {\|(g'+g)^{-\frac{1}{2}}\mathrm{D} g[h]\,(g'+g)^{-\frac{1}{2}}\|}_{F}^{2}=\textup{Tr}{\bigl((g'+g)^{-1}A^{\mathsf{T}}V_{x}^{1/2}\overline{D}_{x}\underbrace{V_{x}^{1/2}A(g'+g)^{-1}A^{\mathsf{T}}V_{x}^{1/2}}_{=:P_{x}'}\overline{D}_{x}V_{x}^{1/2}A\bigr)}=\textup{Tr}(P_{x}'\overline{D}_{x}P_{x}'\overline{D}_{x})\,. By Lemma \ref{['lem:matrix-projection']}, we have $P_{x}'\preceq P_{x}=P(V_{x}^{1/2}A)=P(D_{x}^{1/2}A_{x})$, and thus \textup{Tr}(P_{x}'\overline{D}_{x}P_{x}'\overline{D}_{x})\leq\textup{Tr}(P_{x}\overline{D}_{x}P_{x}\overline{D}_{x})\underset{\text{(i)}}{=}\textsf{diag}(\overline{D}_{x})^{\mathsf{T}}P_{x}^{(2)}\,\textsf{diag}(\overline{D}_{x})\underset{\text{(ii)}}{\leq}\textsf{diag}(\overline{D}_{x})^{\mathsf{T}}\Sigma_{x}\,\textsf{diag}(\overline{D}_{x})\underset{\text{(iii)}}{\leq}4\sum_{i=1}^{m}[\sigma(D_{x}^{1/2}A_{x})]_{i}\,{\bigl((A_{x}h)_{i}^{2}+(D_{x}^{-1}\mathrm{D} D_{x}[h])_{i}^{2}\bigr)}\leq4\max_{i}\frac{[\sigma(D_{x}^{1/2}A_{x})]_{i}}{[D_{x}]_{ii}}\cdot\sum_{i=1}^{m}[D_{x}]_{ii}\,{\bigl((A_{x}h)_{i}^{2}+(D_{x}^{-1}\mathrm{D} D_{x}[h])_{i}^{2}\bigr)}\underset{\text{(iv)}}{=}4\max_{i}\frac{[\sigma(D_{x}^{1/2}A_{x})]_{i}}{[D_{x}]_{ii}}\cdot{\bigl({\|h\|}_{g(x)}^{2}+\sum_{i=1}^{m}[D_{x}^{-1}]_{ii}(\mathrm{D} D_{x}[h])_{i}^{2}\bigr)}\,, where (i) holds due to $x^{\mathsf{T}}(A\circ B)y=\textup{Tr}{\bigl(\textup{Diag}(x)A\textup{Diag}(y)B^{\mathsf{T}}\bigr)}$ (Lemma \ref{['lem:Hadamard']}), (ii) follows from $P_{x}^{(2)}\preceq\Sigma_{x}$ (Claim \ref{['claim:schurProjection']}), (iii) uses $(a+b)^{2}\leq2\left(a^{2}+b^{2}\right)$ for $a,b\in\mathbb{R}$ and $\Sigma_{x}=\textup{Diag}(P_{x})=\sigma(D_{x}^{1/2}A_{x})$, and (iv) holds due to 
$\sum_{i=1}^{m}[D_{x}]_{ii}\,(A_{x}h)_{i}^{2}=h^{\mathsf{T}}A_{x}^{\mathsf{T}}D_{x}A_{x}h=h^{\mathsf{T}}g(x)h$. As for the second claim, \max_{h:{\|h\|}_{g(x)}=1}{\|A_{x}h\|}_{\infty}=\max_{h}\max_{i\in[m]}\left\lvert \frac{a_{i}^{\mathsf{T}}h}{s_{i}}\right\rvert =\max_{i\in[m]}\max_{u:{\|u\|}_{2}=1}\left\lvert \frac{a_{i}^{\mathsf{T}}g(x)^{-1/2}u}{s_{i}}\right\rvert=\max_{i\in[m]}\left\Vert g(x)^{-1/2}\frac{a_{i}}{s_{i}}\right\Vert _{2}=\max_{i\in[m]}\sqrt{\frac{1}{s_{i}^{2}}a_{i}^{\mathsf{T}}g(x)^{-1}a_{i}}=\sqrt{\max_{i\in[m]}e_{i}^{\mathsf{T}}A_{x}g(x)^{-1}A_{x}^{\mathsf{T}}e_{i}}=\sqrt{\max_{i\in[m]}\frac{[\sigma(D_{x}^{1/2}A_{x})]_{i}}{[D_{x}]_{ii}}}\,. As for the last claim, for $h\in\mathbb{R}^{d}$ such that ${\|A_{x}h\|}_{\infty}\leq1$ (i.e., $h\in K\cap(2x-K)$ for $K=\{Ax\geq b\}$ due to Lemma \ref{['lem:symmforPolytope']}) we have h^{\mathsf{T}}g(x)h=h^{\mathsf{T}}A_{x}^{\mathsf{T}}D_{x}A_{x}h=\sum_{i=1}^{m}(D_{x})_{ii}(A_{x}h)_{i}^{2}\leq{\|A_{x}h\|}_{\infty}^{2}\sum_{i=1}^{m}(D_{x})_{ii}\leq\textup{Tr}(D_{x})\,.\qedhere Now we establish SSC and compute the symmetry parameters of metrics of the form $A_{x}^{\mathsf{T}}D_{x}A_{x}$: Logarithmic barrier: To show that $g$ is SSC along $\textup{row}(A)$, consider a self-concordant matrix $g(y)=S_{y}^{-2}=-\nabla_{y}^{2}(\sum_{i=1}^{m}\log y_{i})$ defined on $\{y\in\mathbb{R}^{m}:y\geq0\}$. By putting $D_{x}=I_{m}$ and $A_{x}=S_{x}^{-1}$ into Lemma \ref{['lem:helper4Diagonal']}-1, since $\sigma(A_{x})\leq1$ ${\|g(x)^{-\frac{1}{2}}\mathrm{D} g(x)[h]\,g(x)^{-\frac{1}{2}}\|}_{F}\leq2{\Bigl(\max_{i\in[m]}\sigma(A_{x})_{i}\Bigr)}^{1/2}\,{\|h\|}_{g(x)}\leq2{\|h\|}_{g(x)}\,.$ Through the linear map $Tx=Ax-b=y$, we recover $g(x)=\nabla^{2}\phi_{\log}(x)=A^{\mathsf{T}}S_{y}^{-2}A=A_{x}^{\mathsf{T}}A_{x}$, which is SSC along $\textup{row}(A)$ by Lemma \ref{['lem:linear-trans-matrix']}. 
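The second and third claims above can be illustrated numerically for a metric of the form $g(x)=A_{x}^{\mathsf{T}}D_{x}A_{x}$. The sketch below assumes NumPy and uses a random polytope $\{Ax\geq b\}$ with $x=0$ strictly feasible and arbitrary positive weights standing in for $D_{x}$; these inputs are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
m, d = 12, 4

A = rng.standard_normal((m, d))
x = np.zeros(d)
b = -rng.uniform(0.5, 2.0, size=m)   # x = 0 is strictly feasible for {Ax >= b}
s = A @ x - b                        # slacks s_x > 0
A_x = A / s[:, None]                 # A_x = S_x^{-1} A
D = rng.uniform(0.5, 2.0, size=m)    # arbitrary positive diagonal of D_x

g = A_x.T @ (D[:, None] * A_x)       # g(x) = A_x^T D_x A_x
g_inv = np.linalg.inv(g)

# ratio_i = a_i^T g^{-1} a_i = sigma(D^{1/2} A_x)_i / D_ii, with a_i the rows of A_x.
ratio = np.einsum("ij,jk,ik->i", A_x, g_inv, A_x)
sigma = D * ratio
bound = np.sqrt(ratio.max())

# Second claim: the maximum of ||A_x h||_inf over ||h||_g = 1 is attained
# at h proportional to g^{-1} a_i for the maximizing row i.
h_star = g_inv @ A_x[ratio.argmax()]
h_star /= np.sqrt(h_star @ g @ h_star)
attained = np.abs(A_x @ h_star).max()

# Random directions with unit g-norm never exceed the bound.
H = rng.standard_normal((1000, d))
H /= np.sqrt(np.einsum("ij,jk,ik->i", H, g, H))[:, None]
sampled_max = np.abs(H @ A_x.T).max()

# Third claim: ||A_x h||_inf <= 1 implies h^T g h <= Tr(D_x).
H_box = H / np.abs(H @ A_x.T).max(axis=1, keepdims=True)
quad_max = np.einsum("ij,jk,ik->i", H_box, g, H_box).max()
```

The leverage scores `sigma` sum to $d$ whenever $A$ has full column rank, which is the trace identity used repeatedly in the symmetry computations.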
For the $\bar{\nu}$-symmetry, the first part (i.e., $\mathcal{D}_{g}^{1}(x)\subset K\cap(2x-K)$) follows from Lemma \ref{['lem:symmetricLeftpart']}. The second part is immediate from $\bar{\nu}=\textup{Tr}(I_{m})=m$ and Lemma \ref{['lem:helper4Diagonal']}-3. Approximate volumetric barrier: For $D_{x}=\Sigma_{x}=\Sigma(A_{x})$, by Lemma \ref{['lem:usefulFactLewis']}-1 and 3 with $p=2$, \max_{i}\frac{[\sigma(D_{x}^{1/2}A_{x})]_{i}}{[D_{x}]_{ii}}\leq2\sqrt{m}\,,\quad\text{and}\quad\sum_{i=1}^{m}[D_{x}^{-1}]_{ii}\,(\mathrm{D} D_{x}[h])_{i}^{2}={\|\Sigma_{x}^{-1}\textsf{diag}(\mathrm{D}\Sigma_{x}[h])\|}_{\Sigma_{x}}^{2}\leq4{\|h\|}_{g(x)}^{2}\,. Using Lemma \ref{['lem:helper4Diagonal']}-1, {\|g(x)^{-\frac{1}{2}}\mathrm{D} g(x)[h]\,g(x)^{-\frac{1}{2}}\|}_{F}^{2}\leq4\max_{i}\frac{[\sigma(D_{x}^{1/2}A_{x})]_{i}}{[D_{x}]_{ii}}\,{\bigl({\|h\|}_{g(x)}^{2}+\sum_{i=1}^{m}[D_{x}^{-1}]_{ii}(\mathrm{D} D_{x}[h])_{i}^{2}\bigr)}\leq40\sqrt{m}{\|h\|}_{g(x)}^{2}\,. For the $\bar{\nu}$-symmetry, ${\|A_{x}(y-x)\|}_{\infty}^{2}\leq\max_{i\in[m]}\frac{[\sigma(D_{x}^{1/2}A_{x})]_{i}}{[D_{x}]_{ii}}\leq2m^{1/2}$ for $y\in\mathcal{D}_{g}^{1}(x)$ by Lemma \ref{['lem:helper4Diagonal']}-2. Also, Lemma \ref{['lem:helper4Diagonal']}-3 implies that $y$ with ${\|A_{x}(y-x)\|}_{\infty}\leq1$ is contained in $\mathcal{D}_{g}^{\sqrt{\textup{Tr}(D_{x})}}(x)$, where $\textup{Tr}(D_{x})=\textup{Tr}(P_{x})\leq d$. Therefore, $\tilde{g}(x):=40\sqrt{m}g(x)=40\sqrt{m}A_{x}^{\mathsf{T}}\Sigma_{x}A_{x}$ is SSC with the symmetry parameter $\bar{\nu}=\mathcal{O}(\sqrt{m}d)$. Vaidya metric: Consider the metric without scaling: $g(x):=A_{x}^{\mathsf{T}}D_{x}A_{x}$ with $D_{x}=\Sigma_{x}+\frac{d}{m}I_{m}$. 
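For the Vaidya weights $D_{x}=\Sigma_{x}+\frac{d}{m}I_{m}$ just introduced, two quantities drive the argument: $\textup{Tr}(D_{x})$ and $\max_{i}[\sigma(D_{x}^{1/2}A_{x})]_{i}/[D_{x}]_{ii}$. A minimal sketch, assuming NumPy and a random full-column-rank matrix standing in for $A_{x}$ (an illustrative assumption), computes both.

```python
import numpy as np

rng = np.random.default_rng(3)
m, d = 40, 5

A_x = rng.standard_normal((m, d))  # stands in for S_x^{-1} A at an interior point

def leverage_scores(B):
    """Diagonal of the orthogonal projection onto the column space of B."""
    Q, _ = np.linalg.qr(B)         # reduced QR: Q is m x d
    return np.sum(Q**2, axis=1)

Sigma = leverage_scores(A_x)       # diagonal of Sigma_x
D = Sigma + d / m                  # diagonal of D_x = Sigma_x + (d/m) I_m
g = A_x.T @ (D[:, None] * A_x)     # unscaled Vaidya metric

# ratio_i = a_i^T g^{-1} a_i = sigma(D^{1/2} A_x)_i / D_ii
ratio = np.einsum("ij,jk,ik->i", A_x, np.linalg.inv(g), A_x)
trace_D = D.sum()
print(trace_D, ratio.max(), np.sqrt(m / d))
```

With full column rank, $\textup{Tr}(\Sigma_{x})=d$, so `trace_D` equals $2d$, and `ratio.max()` stays below $\sqrt{m/d}$ because each ratio is at most $\min\{\frac{m}{d}\sigma_{i},\,\sigma_{i}^{-1}\}$.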
Then, using anstreicher1997volumetric in (i) below \max_{i}\frac{[\sigma(D_{x}^{1/2}A_{x})]_{i}}{[D_{x}]_{ii}}\underset{\text{Lemma }\ref{['lem:helper4Diagonal']}\text{-2}}{=}{\Bigl(\max_{h\in\mathbb{R}^{d}}\frac{{\|A_{x}h\|}_{\infty}}{{\|h\|}_{g(x)}}\Bigr)}^{2}\underset{\text{(i)}}{\leq}\sqrt{\frac{m}{d}}\,,\sum_{i=1}^{m}[D_{x}^{-1}]_{ii}\,(\mathrm{D} D_{x}[h])_{i}^{2}\underset{\text{(ii)}}{\leq}\sum_{i=1}^{m}[\Sigma_{x}^{-1}]_{ii}(\mathrm{D}\Sigma_{x}[h])_{i}^{2}\underset{\text{Lemma }\ref{['lem:usefulFactLewis']}\text{-3}}{\leq}4h^{\mathsf{T}}A_{x}^{\mathsf{T}}\Sigma_{x}A_{x}h\leq4{\|h\|}_{g(x)}^{2}\,. Putting these back to Lemma \ref{['lem:helper4Diagonal']}-1, {\|g(x)^{-\frac{1}{2}}\mathrm{D} g(x)[h]\,g(x)^{-\frac{1}{2}}\|}_{F}^{2}\leq4\max_{i}\frac{[\sigma(D_{x}^{1/2}A_{x})]_{i}}{[D_{x}]_{ii}}\,{\bigl({\|h\|}_{g(x)}^{2}+\sum_{i=1}^{m}[D_{x}^{-1}]_{ii}(\mathrm{D} D_{x}[h])_{i}^{2}\bigr)}\leq20\sqrt{\frac{m}{d}}{\|h\|}_{g(x)}^{2}\,. Thus, $\tilde{g}(x):=22\sqrt{\frac{m}{d}}g(x)=22\sqrt{\frac{m}{d}}A_{x}^{\mathsf{T}}{\bigl(\Sigma_{x}+\frac{d}{m}I_{m}\bigr)}A_{x}$ is SSC. For the $\bar{\nu}$-symmetry, Lemma \ref{['lem:helper4Diagonal']}-2 implies that for $y\in\mathcal{D}_{g}^{1}(x)$, ${\|A_{x}(y-x)\|}_{\infty}^{2}\leq\max_{i}\frac{[\sigma(D_{x}^{1/2}A_{x})]_{i}}{[D_{x}]_{ii}}\underset{\ref{['eq:28-1']}}{\leq}\sqrt{\frac{m}{d}}\,.$ Also, Lemma \ref{['lem:helper4Diagonal']}-3 implies that $y$ with ${\|A_{x}(y-x)\|}_{\infty}\leq1$ is contained in $\mathcal{D}_{g}^{\sqrt{\textup{Tr}(D_{x})}}(x)$, where $\textup{Tr}(D_{x})=\textup{Tr}{\bigl(\Sigma_{x}+\frac{d}{m}I_{m}\bigr)}=\textup{Tr}(\Sigma_{x})+d\leq2d\,.$ Therefore, $\tilde{g}(x)$ satisfies $\mathcal{D}_{\tilde{g}}^{1}(x)\subset K\cap(2x-K)\subset\mathcal{D}_{\tilde{g}}^{\sqrt{44(md)^{1/2}}}(x)$, so $\tilde{g}$ is $\mathcal{O}(\sqrt{md})$-symmetric. Lewis-weight metric: Consider the unscaled version first: $g(x)=A_{x}^{\mathsf{T}}W_{x}A_{x}$. 
By Lemma \ref{['lem:helper4Diagonal']}-1 {\|g(x)^{-\frac{1}{2}}\mathrm{D} g(x)[h]\,g(x)^{-\frac{1}{2}}\|}_{F}^{2}\leq4\max_{i}\frac{[\sigma(W_{x}^{1/2}A_{x})]_{i}}{[W_{x}]_{ii}}\,{\bigl({\|h\|}_{g(x)}^{2}+\sum_{i=1}^{m}[W_{x}^{-1}]_{ii}(\mathrm{D} W_{x}[h])_{i}^{2}\bigr)}\underset{\text{(i)}}{\leq}8m^{\frac{2}{p+2}}{\bigl({\|h\|}_{g(x)}^{2}+p^{2}\,{\|h\|}_{g(x)}^{2}\bigr)}\leq{\bigl(8m^{\frac{2}{p+2}}(1+p^{2})\bigr)}\,{\|h\|}_{g(x)}^{2}\,, where in (i) we used Lemma \ref{['lem:usefulFactLewis']}-1 and 3. For the first part of the $\bar{\nu}$-symmetry, Lemma \ref{['lem:helper4Diagonal']}-2 implies that $\max_{h:{\|h\|}_{g(x)}=1}{\|A_{x}h\|}_{\infty}=\sqrt{\max_{i}\frac{[\sigma(W_{x}^{1/2}A_{x})]_{i}}{[W_{x}]_{ii}}}\leq\sqrt{2m^{\frac{2}{p+2}}}\,,$ and Lemma \ref{['lem:helper4Diagonal']}-3 leads to $K\cap(2x-K)\subset\mathcal{D}_{g}^{\sqrt{d}}(x)$ due to $\textup{Tr}(W_{x})=\textup{Tr}{\bigl(W_{x}^{\frac{1}{2}-\frac{1}{p}}A_{x}(A_{x}^{\mathsf{T}}W_{x}^{1-\frac{2}{p}}A_{x})^{-1}A_{x}^{\mathsf{T}}W_{x}^{\frac{1}{2}-\frac{1}{p}}\bigr)}=\textup{Tr}{\bigl(A_{x}^{\mathsf{T}}W_{x}^{1-\frac{2}{p}}A_{x}(A_{x}^{\mathsf{T}}W_{x}^{1-\frac{2}{p}}A_{x})^{-1}\bigr)}=d\,.$ Therefore, $16p^{2}m^{\frac{2}{p+2}}A_{x}^{\mathsf{T}}W_{x}A_{x}$ is SSC with $\mathcal{O}{\bigl(dm^{\frac{2}{p+2}}\bigr)}$-symmetry by Lemma \ref{['lem:symmforPolytope']}. By setting $p=\mathcal{O}(\log m)$, the claim follows. Let $\theta_{1}(x):=A_{x}^{\mathsf{T}}\Sigma_{x}A_{x}$, $\theta_{2}(x):=A_{x}^{\mathsf{T}}A_{x}$, and $\Gamma_{x}:=\textup{Diag}{\bigl(A_{x}g(x)^{-1}A_{x}^{\mathsf{T}}\bigr)}$. Recall $g=g_{1}+g_{2}$ for a PSD matrix function $g_{1}$ and the Vaidya metric $g_{2}$. ${\|\Gamma_{x}\|}_{\infty}\leq\frac{1}{44}$. 
For $\overline{g}_{2}:=\theta_{1}+\frac{d}{m}\theta_{2}=\frac{1}{44}\sqrt{\frac{d}{m}}g_{2}$, it follows from $g^{-1}\preceq g_{2}^{-1}=\frac{1}{44}\sqrt{\frac{d}{m}}\overline{g}_{2}^{-1}$ that $44{\|\Gamma_{x}\|}_{\infty}\leq\sqrt{\frac{d}{m}}{\|\textup{Diag}(A_{x}\overline{g}_{2}^{-1}A_{x}^{\mathsf{T}})\|}_{\infty}=\sqrt{\frac{d}{m}}\max_{i\in[m]}\frac{{\bigl[\sigma{\bigl(\sqrt{\Sigma_{x}+\frac{d}{m}I_{m}}A_{x}\bigr)}\bigr]}_{i}}{{\bigl[\Sigma_{x}+\frac{d}{m}I_{m}\bigr]}_{ii}}\underset{\ref{['eq:28-1']}}{\leq}1\,.\qedhere$ Now we show SLTSC of the Vaidya metric: As $\mathrm{D}^{2}\theta_{2}(x)[h,h]\succeq0$ by Claim \ref{['claim:diffLogBarrier']}, we have $\textup{Tr}{\bigl(g^{-1}\mathrm{D}^{2}\theta_{2}(x)[h,h]\bigr)}=\textup{Tr}{\bigl(g^{-\frac{1}{2}}\mathrm{D}^{2}\theta_{2}(x)[h,h]g^{-\frac{1}{2}}\bigr)}\geq0\,.$ As for $\theta_{1}$, by Lemma \ref{['lem:calculusLeverage']}-6 $\mathrm{D}^{2}\theta_{1}[h,h]\succeq-16A_{x}^{\mathsf{T}}\textup{Diag}(S_{x,h}P_{x}S_{x,h}P_{x})A_{x}-6A_{x}^{\mathsf{T}}\textup{Diag}(P_{x}S_{x,h}^{2}P_{x})A_{x}$, so $\textup{Tr}{\bigl(g^{-1}\mathrm{D}^{2}\theta_{1}(x)[h,h]\bigr)}\geq-16\textup{Tr}(\Gamma_{x}S_{x,h}P_{x}S_{x,h}P_{x})-6\textup{Tr}(\Gamma_{x}P_{x}S_{x,h}^{2}P_{x})\,.$ We first note that $\textup{Tr}(S_{x,h}P_{x}S_{x,h})=s_{x,h}^{\mathsf{T}}(P_{x}\circ I)s_{x,h}=s_{x,h}^{\mathsf{T}}\Sigma_{x}s_{x,h}={\|h\|}_{\theta_{1}}^{2}$.
Using this, $\textup{Tr}(\Gamma_{x}S_{x,h}P_{x}S_{x,h}P_{x})=\textup{Tr}(\Gamma_{x}^{1/2}S_{x,h}P_{x}\cdot S_{x,h}P_{x}\Gamma_{x}^{1/2})\leq\sqrt{\textup{Tr}(\Gamma_{x}^{\frac{1}{2}}S_{x,h}P_{x}^{2}S_{x,h}\Gamma_{x}^{\frac{1}{2}})\,\textup{Tr}(\Gamma_{x}^{\frac{1}{2}}P_{x}S_{x,h}^{2}P_{x}\Gamma_{x}^{\frac{1}{2}})}=\sqrt{\textup{Tr}(P_{x}S_{x,h}\Gamma_{x}S_{x,h}P_{x})}\sqrt{\textup{Tr}(S_{x,h}P_{x}\Gamma_{x}P_{x}S_{x,h})}\leq{\|\Gamma_{x}\|}_{\infty}{\|h\|}_{\theta_{1}}^{2}\,,$ and $\textup{Tr}(\Gamma_{x}P_{x}S_{x,h}^{2}P_{x})=\textup{Tr}(S_{x,h}P_{x}\Gamma_{x}P_{x}S_{x,h})\leq{\|\Gamma_{x}\|}_{\infty}\textup{Tr}(S_{x,h}P_{x}S_{x,h})\underset{\text{(i)}}{=}{\|\Gamma_{x}\|}_{\infty}{\|h\|}_{\theta_{1}}^{2}\,.$ Putting these together and using Lemma \ref{['lem:HybridGammaNorm']}, $\textup{Tr}{\bigl(g^{-1}\mathrm{D}^{2}\theta_{1}(x)[h,h]\bigr)}\geq-22{\|\Gamma_{x}\|}_{\infty}{\|h\|}_{\theta_{1}}^{2}\geq-\frac{1}{2}\,{\|h\|}_{\theta_{1}}^{2}\,,$ and it follows from $g_{2}=44\sqrt{\frac{m}{d}}\left(\theta_{1}+\frac{d}{m}\theta_{2}\right)$ that $\textup{Tr}{\bigl(g^{-1}\mathrm{D}^{2}g_{2}(x)[h,h]\bigr)}\geq-\frac{1}{2}\,{\|h\|}_{g_{2}}^{2}$. For $\theta(x):=A_{x}^{\mathsf{T}}W_{x}A_{x}$ (i.e., the unscaled version of $g_{2}$), we write $g_{2}=c\cdot\theta$ for a constant $c$, which will be set to $c_{1}(\log m)^{c_{2}}\sqrt{d}$ for some constants $c_{1},c_{2}>0$ later. Going forward, $P_{x}$ indicates the projection matrix of $W_{x}^{1/2-1/p}A_{x}$ (i.e., $P_{x}=P(W_{x}^{1/2-1/p}A_{x})$). ${\|\Gamma_{x}\|}_{\infty}\leq2c^{-1}m^{\frac{2}{p+2}}$. Note that $0\preceq\Gamma_{x}=\textup{Diag}(A_{x}g^{-1}A_{x}^{\mathsf{T}})\preceq c^{-1}\textup{Diag}(A_{x}\theta^{-1}A_{x}^{\mathsf{T}})$.
By Lemma \ref{['lem:usefulFactLewis']}-1, ${\|\textup{Diag}(A_{x}\theta^{-1}A_{x}^{\mathsf{T}})\|}_{\infty}=\max_{i\in[m]}\frac{{\bigl[\sigma{\bigl(W_{x}^{1/2}A_{x}\bigr)}\bigr]}_{i}}{{\bigl[W_{x}\bigr]}_{ii}}\leq2m^{\frac{2}{p+2}}\,.\qedhere$ Now we show SLTSC of the Lewis-weight metric: From \ref{['eq:LW-second-derv']}, $\mathrm{D}^{2}\theta[h,h]\succeq-4A_{x}^{\mathsf{T}}W_{x,h}'S_{x,h}A_{x}+A_{x}^{\mathsf{T}}W_{x,h}''A_{x}$. Thus, $\textup{Tr}(g^{-1}\mathrm{D}^{2}\theta[h,h])\geq\textup{Tr}{\bigl(\Gamma_{x}(W_{x,h}''-4W_{x,h}'S_{x,h})\bigr)}=-4\textup{Tr}(\Gamma_{x}W_{x,h}'S_{x,h})+\textup{Tr}(\Gamma_{x}W_{x,h}'')\,.$ As for the first term, $\textup{Tr}(\Gamma_{x}W_{x,h}'S_{x,h})\leq p\,{\|\Gamma_{x}\|}_{\infty}{\|h\|}_{\theta}^{2}$ follows from \ref{['eq:trSW']} with $\Gamma_{x}$ replacing $s_{x,h}^{2}$. As for the second term $\textup{Tr}(\Gamma_{x}W_{x,h}'')$ (i.e., \ref{['eq:trGamma']} with $\Gamma=\Gamma_{x}$), each term there is of the form $\textup{Tr}(\Gamma_{x}\textup{Diag}(v))$ for $v\in\mathbb{R}^{m}$, which can be bounded as follows: $|\textup{Tr}{\bigl(\Gamma_{x}\textup{Diag}(v)\bigr)}|=|\textup{Tr}(\Gamma_{x}W_{x}^{\frac{1}{2}}W_{x}^{-\frac{1}{2}}\textup{Diag}(v))|\leq\sqrt{\textup{Tr}(W_{x}^{\frac{1}{2}}\Gamma_{x}^{2}W_{x}^{\frac{1}{2}})}\sqrt{\textup{Tr}{\bigl(\textup{Diag}(v)W_{x}^{-1}\textup{Diag}(v)\bigr)}}\leq{\|\Gamma_{x}\|}_{\infty}\sqrt{\textup{Tr}(W_{x})}{\|v\|}_{W_{x}^{-1}}=\sqrt{d}{\|\Gamma_{x}\|}_{\infty}{\|v\|}_{W_{x}^{-1}}\,.$ Then, we obtain $|\textup{Tr}(\Gamma_{x}W_{x,h}'')|\lesssim\sqrt{d}{\|\Gamma_{x}\|}_{\infty}{\|h\|}_{\theta}^{2}$ for $p=\mathcal{O}(\log m)$ by using this inequality together with the norm bounds in Lemma \ref{['lem:second-deriv-Lewis']}.
Putting things together, we conclude that $\textup{Tr}(g^{-1}\mathrm{D}^{2}\theta[h,h])\gtrsim-p{\|\Gamma_{x}\|}_{\infty}{\|h\|}_{\theta}^{2}-\sqrt{d}{\|\Gamma_{x}\|}_{\infty}{\|h\|}_{\theta}^{2}\gtrsim-c^{-1}\sqrt{d}{\|h\|}_{\theta}^{2}\,,$ where the last inequality follows from Lemma \ref{['lem:GammaNormLSMetric']}. Therefore, there exist positive constants $d_{1}$ and $d_{2}$ such that $\textup{Tr}(g^{-1}\mathrm{D}^{2}\theta[h,h])\geq-c^{-1}d_{1}(\log m)^{d_{2}}\sqrt{d}{\|h\|}_{\theta}^{2}$, which implies $\textup{Tr}(g^{-1}\mathrm{D}^{2}g_{2}[h,h])\geq-c^{-1}d_{1}(\log m)^{d_{2}}\sqrt{d}{\|h\|}_{g_{2}}^{2}\,.$ By taking $c=d_{1}(\log m)^{d_{2}}\sqrt{d}$, the metric $g_{2}=c\theta=d_{1}(\log m)^{d_{2}}\sqrt{d}A_{x}^{\mathsf{T}}W_{x}A_{x}$ is SLTSC. We proceed with a general form of the metric $g(x)=A_{x}^{\mathsf{T}}D_{x}A_{x}$ with a diagonal matrix $0\prec D_{x}\in\mathbb{R}^{m\times m}$. Then we provide computational lemmas used when proving SASC of barriers for the linear constraints. We pick any $g':\textup{int}(K)\to\mathbb{S}_{+}^{d}$ such that $\bar{g}:=g+g'\succ0$. By affine invariance, we may assume $\bar{g}(x)=I$ and $x=0$. Note that $g(x)\preceq I_{d}$, and $z$ equals $rh/\sqrt{d}$ for $h\sim\mathcal{N}(0,I_{d})$ in law. Applying Taylor's expansion to ${\|z-x\|}_{g(z)}^{2}$ at $z=x$ (as in the proof of Lemma \ref{['lem:hsc-to-sasc']}), for some $p_{z}\in[x,z]$, $|{\|z-x\|}_{g(z)}^{2}-{\|z-x\|}_{g(x)}^{2}|\leq\frac{r^{2}}{d}\,{\Bigl(\frac{r}{\sqrt{d}}\underbrace{|\mathrm{D} g(x)[h^{\otimes3}]|}_{\eqqcolon\textsf{A}}+\frac{r^{2}}{2d}\underbrace{|\mathrm{D}^{2}g(p_{z})[h^{\otimes4}]|}_{\eqqcolon\textsf{B}}\Bigr)}\,.$ It suffices to show that $|\mathrm{D} g(x)[h^{\otimes3}]|=\mathcal{O}(d^{1/2})$ and $|\mathrm{D}^{2}g(p_{z})[h^{\otimes4}]|=\mathcal{O}(d)$ with high probability. By \ref{['eq:Dgh']}, we have $\mathrm{D} g(x)[h^{\otimes3}]=-2s_{x,h}^{\mathsf{T}}D_{x}S_{x,h}s_{x,h}+s_{x,h}^{\mathsf{T}}D_{x,h}'s_{x,h}$.
Let $a_{i}$ denote the $i$-th row of $A_{x}$ for $i\in[m]$, and define two polynomials in $h$ as follows: $P_{1}(h):=s_{x,h}^{\mathsf{T}}D_{x}S_{x,h}s_{x,h}=\textup{Tr}(D_{x}S_{x,h}^{3})=\sum_{i=1}^{m}d_{i}\,(a_{i}^{\mathsf{T}}h)^{3}\,,\quad\text{and}\quad P_{2}(h):=s_{x,h}^{\mathsf{T}}D_{x,h}'s_{x,h}\,.$ By Lemma \ref{['lem:matrix-projection']}, $D_{x}^{1/2}A_{x}A_{x}^{\mathsf{T}}D_{x}^{1/2}\preceq P(D_{x}^{1/2}A_{x})$ and thus $\max_{i\in[m]}{\|a_{i}\|}^{2}={\|\textup{Diag}(A_{x}A_{x}^{\mathsf{T}})\|}_{\infty}\leq\max_{i}\frac{[\sigma(D_{x}^{1/2}A_{x})]_{i}}{[D_{x}]_{ii}}\,.$ By Lemma \ref{['lem:variance-1']}, \mathbb{E}[P_{1}(h)^{2}]=\mathbb{E}{\Bigl[\Bigl\{\sum_{i=1}^{m}d_{i}(a_{i}\cdot h)^{3}\Bigr\}^{2}\Bigr]}=9\sum_{i,j=1}^{m}{\|d_{i}^{1/3}a_{i}\|}^{2}{\|d_{j}^{1/3}a_{j}\|}^{2}\langle d_{i}^{1/3}a_{i},d_{j}^{1/3}a_{j}\rangle+6\sum_{i,j}\langle d_{i}^{1/3}a_{i},d_{j}^{1/3}a_{j}\rangle^{3}=9\cdot1^{\mathsf{T}}\textup{Diag}(A_{x}A_{x}^{\mathsf{T}})\,D_{x}^{1/2}\underbrace{D_{x}^{1/2}A_{x}A_{x}^{\mathsf{T}}D_{x}^{1/2}}_{\preceq P(D_{x}^{1/2}A_{x})\preceq I_{m}}D_{x}^{1/2}\,\textup{Diag}(A_{x}A_{x}^{\mathsf{T}})\,1+6\sum_{i,j}d_{i}d_{j}(a_{i}\cdot a_{j})^{3}\lesssim{\|\textup{Diag}(A_{x}A_{x}^{\mathsf{T}})\|}_{\infty}\,\textup{Tr}{\bigl(\textup{Diag}(A_{x}A_{x}^{\mathsf{T}})\,D_{x}\bigr)}+\max_{i}{\|a_{i}\|}^{2}\cdot\sum_{i,j}d_{i}d_{j}(a_{i}\cdot a_{j})^{2}=\max_{i}{\|a_{i}\|}^{2}\,\textup{Tr}(A_{x}^{\mathsf{T}}D_{x}A_{x})+\max_{i}{\|a_{i}\|}^{2}\cdot\sum_{j}\textup{Tr}(d_{j}a_{j}^{\mathsf{T}}A_{x}^{\mathsf{T}}D_{x}A_{x}a_{j})\underset{\text{(i)}}{\leq}2\max_{i}{\|a_{i}\|}^{2}\,\textup{Tr}(A_{x}^{\mathsf{T}}D_{x}A_{x})\leq2d\,\max_{i}{\|a_{i}\|}^{2}\,, where (i) follows from $A_{x}^{\mathsf{T}}D_{x}A_{x}\preceq I_{d}$ and $\sum_{j}\textup{Tr}(d_{j}a_{j}^{\mathsf{T}}A_{x}^{\mathsf{T}}D_{x}A_{x}a_{j})\leq\sum_{j}\textup{Tr}(d_{j}a_{j}^{\mathsf{T}}a_{j})=\textup{Tr}(A_{x}^{\mathsf{T}}D_{x}A_{x})$. 
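The Gaussian moment identity $\mathbb{E}[(a\cdot h)^{3}(b\cdot h)^{3}]=9{\|a\|}^{2}{\|b\|}^{2}(a\cdot b)+6(a\cdot b)^{3}$ behind the first equality, and the resulting bound on $\mathbb{E}[P_{1}(h)^{2}]$, can be checked numerically. The sketch below assumes NumPy, uses Gauss-Hermite quadrature (exact for polynomials of this degree) rather than sampling, and draws random $A_{x}$ and $D_{x}$ normalized so that $A_{x}^{\mathsf{T}}D_{x}A_{x}=I_{d}$. The explicit constant $15$ comes from tracking the factors $9$ and $6$ through the displayed chain and is our own bookkeeping, not a constant stated in the paper.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

rng = np.random.default_rng(4)
m, d = 10, 3

# Gauss-Hermite rule for the weight exp(-t^2/2); 8 nodes are exact up to degree 15.
nodes, weights = hermegauss(8)
weights = weights / np.sqrt(2 * np.pi)   # normalize to the N(0,1) density

def cubic_moment(alpha, beta, c):
    """E[X^3 Y^3] for centered Gaussians with Var X = alpha, Var Y = beta, Cov = c."""
    z1, z2 = np.meshgrid(nodes, nodes, indexing="ij")
    w = np.outer(weights, weights)
    x = np.sqrt(alpha) * z1
    y = c / np.sqrt(alpha) * z1 + np.sqrt(max(beta - c**2 / alpha, 0.0)) * z2
    return np.sum(w * x**3 * y**3)

a, b = rng.standard_normal(d), rng.standard_normal(d)
alpha, beta, c = a @ a, b @ b, a @ b
quadrature = cubic_moment(alpha, beta, c)
closed_form = 9 * alpha * beta * c + 6 * c**3

# Second moment of P_1(h) = sum_i d_i (a_i . h)^3 with A^T D A = I_d.
A = rng.standard_normal((m, d))
D = rng.uniform(0.5, 2.0, size=m)
L = np.linalg.cholesky(A.T @ (D[:, None] * A))
A = A @ np.linalg.inv(L).T               # now A^T D A = I_d
G = A @ A.T                              # Gram matrix of the rows a_i
n2 = np.diag(G)                          # squared row norms ||a_i||^2
second_moment = np.sum(np.outer(D, D) * (9 * np.outer(n2, n2) * G + 6 * G**3))
print(quadrature, closed_form, second_moment, 15 * d * n2.max())
```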
The other polynomial $P_{2}(h)$ requires a different strategy for bounding $\mathbb{E}[P_{2}(h)^{2}]$ for each barrier. This polynomial vanishes for the log-barrier, while the Vaidya and Lewis-weight metrics require rather involved computations for bounding $\mathbb{E}[P_{2}(h)^{2}]$. Due to \ref{['eq:LW-fourth-moment']} (with $W_{x}$ replaced by $D_{x}$), $|\mathrm{D}^{2}g(p_{z})[h^{\otimes4}]|$ consists of three polynomials: $\bar{P}_{3}(h):=\textup{Tr}(D_{p_{z}}S_{p_{z},h}^{4})\,,\quad\bar{P}_{4}(h):=\textup{Tr}(D_{p_{z},h}'S_{p_{z},h}^{2})\,,\quad\bar{P}_{5}(h):=\textup{Tr}(D_{p_{z},h}''S_{p_{z},h}^{2})\,.$ For each $i=3,4,5$, we define $P_{i}(h)$ by $\bar{P}_{i}(h)$ with $p_{z}$ replaced by $x$. For the log-barrier, only $\bar{P}_{3}(h)$ matters since $D_{(\cdot)}=I_{m}$. For the Vaidya metric, $\bar{P}_{4}(h)$ and $\bar{P}_{5}(h)$ can be bounded by multiples of $\bar{P}_{3}(h)$. For the Lewis-weight metric, each $\bar{P}_{i}$ requires a different procedure for bounding $\mathbb{E}[\bar{P}_{i}(h)^{2}]$. Moreover, we can show $\bar{P}_{i}(h)\lesssim P_{i}(h)$ and $\mathbb{E}[P_{3}(h)^{2}]=\sum_{i,j\in[m]}\mathbb{E}[d_{i}d_{j}\,(a_{i}\cdot h)^{4}(a_{j}\cdot h)^{4}]\underset{\textup{CS}}{\leq}\sum_{i,j}d_{i}d_{j}\sqrt{\mathbb{E}[(a_{i}\cdot h)^{8}]}\sqrt{\mathbb{E}[(a_{j}\cdot h)^{8}]}\underset{\text{(i)}}{\lesssim}{\Bigl(\sum_{i}d_{i}{\|a_{i}\|}^{4}\Bigr)}^{2}\leq\max_{i}{\|a_{i}\|}^{4}\,{\Bigl(\sum_{i}d_{i}{\|a_{i}\|}^{2}\Bigr)}^{2}\underset{\text{(ii)}}{\leq}d^{2}\max_{i}{\|a_{i}\|}^{4}\,,$ where we used $a_{i}\cdot h\sim\mathcal{N}(0,{\|a_{i}\|}^{2})$ in (i), and $\sum_{i}d_{i}{\|a_{i}\|}^{2}=\textup{Tr}(A_{x}^{\mathsf{T}}D_{x}A_{x})\leq\textup{Tr}(I_{d})$ in (ii). We now show SASC of the three barriers for linear constraints, using this proof outline. Set $g(x)=A_{x}^{\mathsf{T}}A_{x}$ (with $D_{x}=I_{m}$).
By \ref{['eq:max-ai']}, $\max_{i\in[m]}{\|a_{i}\|}^{2}\leq\max_{i\in[m]}[\sigma(A_{x})]_{i}\leq1\,.$ As for the term $\mathsf{A}$, it suffices to bound $P_{1}(h)=\textup{Tr}(S_{x,h}^{3})$. Since $\mathbb{E}[P_{1}(h)^{2}]\lesssim d$ by \ref{['eq:P1_bound']}, by Lemma \ref{['lem:conc-gaussian-poly']} with $t=(2e)^{3/2}\vee{\bigl(\frac{2e}{3}\log\frac{2}{\varepsilon}\bigr)}^{3/2}$ and $r_{1}(\varepsilon):=\varepsilon(2\sqrt{60}t)^{-1}$, we have that for any $r\leq r_{1}(\varepsilon)$, $\text{Event }B_{1}:\quad\mathbb{P}_{h}{\Bigl(\frac{r}{\sqrt{d}}\,|P_{1}(h)|\geq\varepsilon\Bigr)}\leq\varepsilon\,.$ As for the term $\mathsf{B}$, recall $\mathbb{P}_{z}{\bigl({\|z\|}\geq-r\cdot2\log\varepsilon\bigr)}\leq\varepsilon$ and call this event $B_{2}$. We take $r_{2}(\varepsilon)$ so that $1-2r_{2}\log\varepsilon\leq1.1$, which ensures ${\|z\|}\leq2r\log\frac{1}{\varepsilon}\leq0.1$ conditioned on $B_{2}^{c}$ for $r\leq r_{2}$. Next, we establish coordinate-wise closeness of $s_{x}$ at close-by points. Let $x_{t}=x+\frac{tr}{\sqrt{d}}h$, and $s_{t}=Ax_{t}-b$. For $t\in[0,1]$, $\left\Vert S_{0}^{-1}\,\frac{\mathrm{d} s_{t}}{\mathrm{d} t}\right\Vert _{\infty}=\frac{r}{\sqrt{d}}\,{\|A_{x}h\|}_{\infty}\leq\frac{r}{\sqrt{d}}\,{\|h\|}_{g(x)}\leq\frac{r}{\sqrt{d}}\,{\|h\|}={\|z\|}\,,$ and conditioned on $B_{2}^{c}$ we know ${\|z\|}\leq2r\log\frac{1}{\varepsilon}\leq0.1$ for $r\leq r_{2}$. Hence, $\max_{i\in[m]}|\frac{s_{p,i}-s_{x,i}}{s_{x,i}}|\leq\int_{0}^{1}\left\Vert S_{0}^{-1}\,\frac{\mathrm{d} s_{t}}{\mathrm{d} t}\right\Vert _{\infty}\,\mathrm{d} t\leq0.1\,,$ and thus $1.2\geq s_{x,i}/s_{p,i}\geq0.9$ for all $i\in[m]$ (i.e., $S_{p}^{-1}\preceq1.2S_{x}^{-1}$).
Using this, we bound $\bar{P}_{3}(h)=\textup{Tr}(S_{p,h}^{4})$ by a multiple of $P_{3}(h)=\textup{Tr}(S_{x,h}^{4})$ as follows: $\textup{Tr}(S_{p,h}^{4})=\textup{Tr}(h^{\mathsf{T}}A^{\mathsf{T}}S_{p,h}S_{p}^{-2}S_{p,h}Ah)\leq2\textup{Tr}(h^{\mathsf{T}}A^{\mathsf{T}}S_{p,h}S_{x}^{-2}S_{p,h}Ah)=2\textup{Tr}(S_{x,h}^{2}S_{p,h}^{2})\leq4\textup{Tr}(S_{x,h}^{4})\,.$ Hence, $\mathbb{E}[\bar{P}_{3}(h)^{2}]\lesssim\mathbb{E}[P_{3}(h)^{2}]\lesssim d^{2}$ by \ref{['eq:P3_bound']}. Using Lemma \ref{['lem:conc-gaussian-poly']} with $t=(2e)^{2}\vee{\bigl(\frac{2e}{4}\log\frac{2}{\varepsilon}\bigr)}^{2}$ and taking $r_{3}(\varepsilon):=(\varepsilon/c_{1}t)^{1/2}$, we obtain $\text{Event }B_{3}:\quad\mathbb{P}{\Bigl(\frac{r^{2}}{2d}\cdot16\bar{P}_{3}(h)\geq\varepsilon\Bigr)}\leq\varepsilon\,.$ Combining bounds on $\mathsf{A}$ and $\mathsf{B}$ conditioned on $\cap_{i}B_{i}^{c}$, we have with probability at least $1-3\varepsilon$ $|{\|z-x\|}_{g(z)}^{2}-{\|z-x\|}_{g(x)}^{2}|\leq2\varepsilon\frac{r^{2}}{d}\quad\text{for any }r\leq\min_{i}r_{i}(\varepsilon)\,.$ By replacing $3\varepsilon\gets\varepsilon$, the claim follows. Set $g(x)=A_{x}^{\mathsf{T}}D_{x}A_{x}$ with $D_{x}=\sqrt{\frac{m}{d}}(\Sigma_{x}+\frac{d}{m}I_{m})$. By \ref{['eq:max-ai']} and \ref{['eq:28-1']}, $\max_{i\in[m]}{\|a_{i}\|}^{2}\leq\max_{i}\frac{[\sigma(D_{x}^{1/2}A_{x})]_{i}}{[D_{x}]_{ii}}\leq1\,.$ As $\mathsf{A}$ consists of $P_{1}$ and $P_{2}$ (see \ref{['eq:P12']}), we show $\mathbb{E}[P_{i}(h)^{2}]\lesssim d$ for $i\in[2]$, which by Lemma \ref{['lem:conc-gaussian-poly']} implies $|\mathsf{A}|\lesssim\sqrt{d}$ w.h.p. As for $P_{1}(h)=\textup{Tr}(D_{x}S_{x,h}^{3})$, we have $\mathbb{E}[P_{1}(h)^{2}]\lesssim d$ from \ref{['eq:P1_bound']}. As for $P_{2}(h)=\textup{Tr}(D_{x,h}'S_{x,h}^{2})$, our approach is similar to chen2018fast.
By Lemma \ref{['lem:calculusLeverage']}, |P_{2}(h)|=|\sqrt{\frac{m}{d}}\textup{Tr}{\Bigl(\textup{Diag}{\bigl((\Sigma_{x}-P_{x}^{(2)})\,s_{x,h}\bigr)}\,S_{x,h}^{2}\Bigr)}|\leq|P_{1}(h)|+|\textup{Tr}(S_{x,h}^{3})|+\sqrt{\frac{m}{d}}\,|\textup{Tr}{\bigl(\textup{Diag}(P_{x}^{(2)}s_{x,h})\,S_{x,h}^{2}\bigr)}|\,. Since we already established a high-probability bound for both $|P_{1}(h)|$ and $|\textup{Tr}(S_{x,h}^{3})|$ (which is $P_{1}(h)$ for the log-barrier), we focus on the third term in the RHS. For $\sigma_{x}:=\textsf{diag}\left(P_{x}\right)$ and $\sigma_{x,i,j}:=(P_{x})_{ij}$, it follows from $P_{x}^{2}=P_{x}$ that $\sigma_{x,i}=\sum_{j}\sigma_{x,i,j}^{2}$. Hence, \textup{Tr}(\Sigma_{x}S_{x,h}^{3})=1^{\mathsf{T}}\Sigma_{x}s_{x,h}^{3}=\sum_{i}(s_{x,h})_{i}^{3}\sigma_{x,i}=\sum_{i,j=1}^{m}\sigma_{x,i,j}^{2}(s_{x,h})_{i}^{3}\,,\textup{Tr}{\bigl(\textup{Diag}(P_{x}^{(2)}s_{x,h})\,S_{x,h}^{2}\bigr)}=\sum_{i,j=1}^{m}\sigma_{x,i,j}^{2}(s_{x,h})_{i}^{2}(s_{x,h})_{j}\underset{\text{symmetry}}{=}\sum_{i,j=1}^{m}\sigma_{x,i,j}^{2}(s_{x,h})_{j}^{2}(s_{x,h})_{i}\,. Combining these leads to 2\,\textup{Tr}(\Sigma_{x}S_{x,h}^{3})+6\,\textup{Tr}{\bigl(\textup{Diag}(P_{x}^{(2)}s_{x,h})S_{x,h}^{2}\bigr)}=\sum_{i,j=1}^{m}\sigma_{x,i,j}^{2}{\bigl((s_{x,h})_{i}^{3}+3(s_{x,h})_{i}^{2}(s_{x,h})_{j}+3(s_{x,h})_{i}(s_{x,h})_{j}^{2}+(s_{x,h})_{j}^{3}\bigr)}=\sum_{i,j=1}^{m}\sigma_{x,i,j}^{2}{\bigl((s_{x,h})_{i}+(s_{x,h})_{j}\bigr)}^{3}\,, so we handle $\sum_{i,j}\sigma_{x,i,j}^{2}{\bigl((s_{x,h})_{i}+(s_{x,h})_{j}\bigr)}^{3}$ instead of $\textup{Tr}{\bigl(\textup{Diag}(P_{x}^{(2)}s_{x,h})S_{x,h}^{2}\bigr)}$, as we already bounded $\sqrt{\frac{m}{d}}\textup{Tr}(\Sigma_{x}S_{x,h}^{3})=P_{1}(h)-\sqrt{\frac{d}{m}}\textup{Tr}(S_{x,h}^{3})$. 
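The symmetrization identity above, which turns the two cubic traces into $\sum_{i,j}\sigma_{x,i,j}^{2}((s_{x,h})_{i}+(s_{x,h})_{j})^{3}$, is purely algebraic and can be checked on random data. A minimal sketch, assuming NumPy, with a random orthogonal projection standing in for $P_{x}$ and a random vector standing in for $s_{x,h}$ (both illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)
m, d = 9, 3

B = rng.standard_normal((m, d))
Q, _ = np.linalg.qr(B)
P = Q @ Q.T                 # orthogonal projection onto col(B), so P @ P = P
P2 = P**2                   # entrywise (Hadamard) square, i.e. P_x^{(2)}
sigma = np.diag(P)          # leverage scores sigma_{x,i}
s = rng.standard_normal(m)  # stands in for s_{x,h}

tr_sigma_s3 = np.sum(sigma * s**3)    # Tr(Sigma_x S_{x,h}^3)
tr_mixed = np.sum((P2 @ s) * s**2)    # Tr(Diag(P_x^{(2)} s_{x,h}) S_{x,h}^2)
lhs = 2 * tr_sigma_s3 + 6 * tr_mixed
rhs = np.sum(P2 * (s[:, None] + s[None, :])**3)
print(lhs, rhs)
```

The row sums of `P2` reproduce `sigma`, which is the relation $\sigma_{x,i}=\sum_{j}\sigma_{x,i,j}^{2}$ obtained from $P_{x}^{2}=P_{x}$.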
Due to $(s_{x,h})_{i}+(s_{x,h})_{j}=(a_{i}+a_{j})^{\mathsf{T}}h$, for $c_{ij}:=a_{i}+a_{j}$ \mathbb{E}{\Bigl[\Bigl\{\sum_{i,j\in[m]}\sigma_{x,i,j}^{2}{\bigl((s_{x,h})_{i}+(s_{x,h})_{j}\bigr)}^{3}\Bigr\}^{2}\Bigr]}=\sum_{i,j,k,l}\sigma_{x,i,j}^{2}\sigma_{x,k,l}^{2}\mathbb{E}[(c_{ij}\cdot h)^{3}(c_{kl}\cdot h)^{3}]\underset{\text{Lemma }\ref{['lem:variance-1']}}{=}9\sum_{i,j,k,l}\sigma_{x,i,j}^{2}\sigma_{x,k,l}^{2}{\|c_{ij}\|}^{2}{\|c_{kl}\|}^{2}(c_{ij}\cdot c_{kl})+6\sum_{i,j,k,l}\sigma_{x,i,j}^{2}\sigma_{x,k,l}^{2}(c_{ij}\cdot c_{kl})^{3}\,. As for the first term in \ref{['eq:vaidya-cubic-expansion']}, we denote $z_{i}:=\sum_{j}\sigma_{x,i,j}^{2}\|c_{ij}\|^{2}$ and $Z:=\textup{Diag}{\bigl((z_{i})_{i\in[m]}\bigr)}$. Then, \sum_{i,j,k,l}\sigma_{x,i,j}^{2}\sigma_{x,k,l}^{2}{\|c_{ij}\|}^{2}{\|c_{kl}\|}^{2}(c_{ij}\cdot c_{kl})={\Bigl\Vert\sum_{ij}\sigma_{x,i,j}^{2}{\|c_{ij}\|}^{2}c_{ij}\Bigr\Vert}^{2}\leq2{\Bigl\Vert\sum_{ij}\sigma_{x,i,j}^{2}{\|c_{ij}\|}^{2}a_{i}\Bigr\Vert}^{2}+2{\Bigl\Vert\sum_{ij}\sigma_{x,i,j}^{2}{\|c_{ij}\|}^{2}a_{j}\Bigr\Vert}^{2}=4{\Bigl\Vert\sum_{ij}\sigma_{x,i,j}^{2}{\|c_{ij}\|}^{2}a_{i}\Bigr\Vert}^{2}={\Bigl\Vert\sum_{i}z_{i}a_{i}\Bigr\Vert}^{2}=1^{\mathsf{T}}ZA_{x}A_{x}^{\mathsf{T}}Z\,1\le1^{\mathsf{T}}ZD_{x}^{-1/2}P(D_{x}^{1/2}A_{x})\,D_{x}^{-1/2}Z\,1\le1^{\mathsf{T}}ZD_{x}^{-1}Z\,1\lesssim\sqrt{\frac{d}{m}}\,\textup{Tr}(Z)\,, where the last inequality follows from $Z\precsim\Sigma_{x}\preceq\sqrt{\frac{d}{m}}D_{x}$ due to z_{i}\leq2\sum_{j}\sigma_{x,i,j}^{2}(\|a_{i}\|^{2}+\|a_{j}\|^{2})\lesssim\underbrace{\sigma_{x,i}\|a_{i}\|^{2}+\sum_{j}\sigma_{x,i,j}^{2}\|a_{j}\|^{2}}_{\eqqcolon\mathsf{K}_{i}}\leq\sigma_{x,i}\|a_{i}\|^{2}+\sigma_{x,i}\lesssim\sigma_{x,i}\,. 
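The sixth-moment formula $\mathbb{E}[(c\cdot h)^{3}(c'\cdot h)^{3}]=9\,{\|c\|}^{2}{\|c'\|}^{2}(c\cdot c')+6\,(c\cdot c')^{3}$ used in the expansion above can be cross-checked exactly with Isserlis' (Wick's) theorem, which sums products of covariances over all perfect matchings. This is a small illustrative sketch; the function names and the random test vectors are hypothetical and only serve the check.

```python
import random

def pairings(items):
    """Yield all perfect matchings of a list with an even number of elements."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for k in range(len(rest)):
        remaining = rest[:k] + rest[k + 1:]
        for tail in pairings(remaining):
            yield [(first, rest[k])] + tail

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gaussian_moment(vectors):
    """E[prod_k (v_k . h)] for h ~ N(0, I), via Isserlis' theorem:
    Cov(v_i . h, v_j . h) = v_i . v_j, summed over all perfect matchings."""
    total = 0.0
    for matching in pairings(list(range(len(vectors)))):
        term = 1.0
        for i, j in matching:
            term *= dot(vectors[i], vectors[j])
        total += term
    return total

rng = random.Random(1)
dim = 4
c1 = [rng.uniform(-1, 1) for _ in range(dim)]  # plays the role of c_ij
c2 = [rng.uniform(-1, 1) for _ in range(dim)]  # plays the role of c_kl

wick = gaussian_moment([c1, c1, c1, c2, c2, c2])
closed_form = 9 * dot(c1, c1) * dot(c2, c2) * dot(c1, c2) + 6 * dot(c1, c2) ** 3
print(wick, closed_form)
```

Of the 15 matchings of six factors, 9 pair one copy of $c$ with one copy of $c'$ and 6 pair all three copies of $c$ with copies of $c'$, which is where the coefficients come from.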
Moreover, using the bound in $\mathsf{K}_{i}$ and $\sum_{i,j}\sigma_{x,i,j}^{2}{\|a_{j}\|}^{2}=\sum_{j}\sigma_{x,j}{\|a_{j}\|}^{2}$, we have $\textup{Tr}(Z)\lesssim\sum_{i}(\sigma_{x,i}\|a_{i}\|^{2}+\sum_{j}\sigma_{x,i,j}^{2}\|a_{j}\|^{2})=2\textup{Tr}(A_{x}^{\mathsf{T}}\Sigma_{x}A_{x})\lesssim\sqrt{\frac{d}{m}}\,\textup{Tr}(A_{x}^{\mathsf{T}}D_{x}A_{x})\leq d\sqrt{\frac{d}{m}}\,.$ Putting this into \ref{['eq:trZ-bound']}, we obtain $\sum_{i,j,k,l}\sigma_{x,i,j}^{2}\sigma_{x,k,l}^{2}{\|c_{ij}\|}^{2}{\|c_{kl}\|}^{2}(c_{ij}\cdot c_{kl})\lesssim d^{2}/m$. As for the second term in \ref{['eq:vaidya-cubic-expansion']}, \sum_{i,j,k,l}\sigma_{x,i,j}^{2}\sigma_{x,k,l}^{2}\,(c_{ij}\cdot c_{kl})^{3}\lesssim\sum_{i,j,k,l}\sigma_{x,i,j}^{2}\sigma_{x,k,l}^{2}\,|c_{ij}\cdot c_{kl}|^{2}\leq\sum_{i,j,k,l}\sigma_{x,i,j}^{2}\sigma_{x,k,l}^{2}\,(a_{i}\cdot a_{k}+a_{i}\cdot a_{l}+a_{j}\cdot a_{k}+a_{j}\cdot a_{l})^{2}\lesssim\sum_{i,j,k,l}\sigma_{x,i,j}^{2}\sigma_{x,k,l}^{2}\,(a_{i}\cdot a_{k})^{2}=\sum_{ik}\sigma_{i}\sigma_{k}\,(a_{i}\cdot a_{k})^{2}=\sum_{k}\textup{Tr}(\sigma_{k}a_{k}^{\mathsf{T}}A_{x}^{\mathsf{T}}\Sigma_{x}A_{x}a_{k})\leq\sqrt{\frac{d}{m}}\sum_{k}\textup{Tr}(\sigma_{k}a_{k}^{\mathsf{T}}a_{k})=\sqrt{\frac{d}{m}}\,\textup{Tr}(A_{x}^{\mathsf{T}}\Sigma_{x}A_{x})\le\frac{d^{2}}{m}\,. This establishes an $\mathcal{O}(d^{2}/m)$ bound on the second moment in \ref{['eq:vaidya-cubic-expansion']}, implying an $\mathcal{O}(\sqrt{d})$-high-probability bound on $\sqrt{\frac{m}{d}}|\textup{Tr}{\bigl(\textup{Diag}(P_{x}^{(2)}s_{x,h})\,S_{x,h}^{2}\bigr)}|$.

We show that $s_{x}$ and $s_{p_{z}}$ are close, and the same holds for $\sigma_{x}$ and $\sigma_{p_{z}}$. For $s_{x}$, following the argument for the log-barrier, we let $x_{t}:=x+th\frac{r}{\sqrt{d}}$ and $s_{t}:=Ax_{t}-b$. 
For $0\leq t\leq1$, {\Bigl\Vert S_{0}^{-1}\frac{\mathrm{d}s_{t}}{\mathrm{d}t}\Bigr\Vert}_{\infty}=\frac{r}{\sqrt{d}}\,{\|A_{x}h\|}_{\infty}\underset{\ref{['eq:28-1']}}{\leq}\frac{r}{\sqrt{d}}\,{\|h\|}_{A_{x}^{\mathsf{T}}D_{x}A_{x}}\leq\frac{r}{\sqrt{d}}\,{\|h\|}={\|z\|}\,. Conditioned on the high-probability bound of ${\|z\|}\leq2r\log\frac{1}{\varepsilon}\leq0.1$ for any $r$ less than some $r(\varepsilon)$, $\max_{i\in[m]}|\frac{s_{p,i}-s_{x,i}}{s_{x,i}}|\leq\int_{0}^{1}{\Bigl\Vert S_{0}^{-1}\frac{\mathrm{d}s_{t}}{\mathrm{d}t}\Bigr\Vert}_{\infty}\,\mathrm{d} t\leq0.1\,,$ and thus $1.2\geq s_{x,i}/s_{p,i}\geq0.9$ for all $i\in[m]$ (i.e., $S_{p}^{-1}\preceq1.2S_{x}^{-1}$). For $\sigma_{x}$, as we have $\Sigma_{x}=\textup{Diag}(A_{x}(A_{x}^{\mathsf{T}}A_{x})^{-1}A_{x}^{\mathsf{T}})$, we have the same closeness between $\sigma_{x,i}$ and $\sigma_{p,i}$ for each $i\in[m]$. Using the formulas in Lemma \ref{['lem:calculusLeverage']}, |\mathrm{D}^{2}g(p)[h^{\otimes4}]|\lesssim\sqrt{\frac{m}{d}}\,\Bigl(\textup{Tr}{\bigl((\Sigma_{p}+\frac{d}{m}I_{m})S_{p,h}^{4}\bigr)}+\underbrace{\textup{Tr}(S_{p,h}^{2}P_{p}S_{p,h}P_{p}S_{p,h})}_{(*)}\qquad\qquad\qquad+\textup{Tr}(S_{p,h}^{2}P_{p}S_{p,h}^{2}P_{p})+\underbrace{\textup{Tr}(S_{p,h}P_{p}S_{p,h}P_{p}S_{p,h}P_{p}S_{p,h})}_{\leq\textup{Tr}(S_{p,h}^{2}P_{p}S_{p,h}^{2}P_{p})}\Bigr)\underset{\text{(i)}}{\lesssim}\sqrt{\frac{m}{d}}\,{\Bigl(\textup{Tr}{\Bigl((\Sigma_{p}+\frac{d}{m}I_{m})S_{p,h}^{4}\Bigr)}+\textup{Tr}(S_{p,h}^{2}\Sigma_{p}S_{p,h}^{2})+\underbrace{\textup{Tr}(S_{p,h}^{2}P_{p}S_{p,h}^{2}P_{p})}_{\text{Use Lemma }\ref{['lem:Kronecker']}}\Bigr)}\underset{\text{(ii)}}{\lesssim}\sqrt{\frac{m}{d}}\,\textup{Tr}{\Bigl((\Sigma_{p}+\frac{d}{m}I_{m})S_{p,h}^{4}\Bigr)}\underset{\text{(iii)}}{\lesssim}\sqrt{\frac{m}{d}}\,\textup{Tr}{\Bigl((\Sigma_{x}+\frac{d}{m}I_{m})S_{x,h}^{4}\Bigr)}=P_{3}(h)\,, where in (i) we used the Cauchy-Schwarz inequality on $(*)$: 
\textup{Tr}(S_{p,h}^{2}P_{p}S_{p,h}P_{p}S_{p,h})\leq\sqrt{\textup{Tr}(S_{p,h}^{2}P_{p}^{2}S_{p,h}^{2})}\sqrt{\textup{Tr}(S_{p,h}P_{p}S_{p,h}^{2}P_{p}S_{p,h})}\underset{\text{AM-GM}}{\leq}\frac{1}{2}{\bigl(\textup{Tr}(S_{p,h}^{2}P_{p}^{2}S_{p,h}^{2})+\textup{Tr}(S_{p,h}P_{p}S_{p,h}^{2}P_{p}S_{p,h})\bigr)}\leq\frac{1}{2}\,{\bigl(\textup{Tr}(S_{p,h}^{2}\Sigma_{p}S_{p,h}^{2})+\textup{Tr}(S_{p,h}^{2}P_{p}S_{p,h}^{2}P_{p})\bigr)}\,, (ii) follows from $\textup{Tr}(S_{p,h}^{2}P_{p}S_{p,h}^{2}P_{p})=s_{p,h}^{2}\cdot P_{p}^{(2)}s_{p,h}^{2}\preceq s_{p,h}^{2}\cdot\Sigma_{p}s_{p,h}^{2}\preceq s_{p,h}^{2}\cdot(\Sigma_{p}+\frac{d}{m}I_{m})s_{p,h}^{2}$, and in (iii) we used coordinate-wise closeness of $s_{x}\leftrightarrow s_{p}$ and $\sigma_{x}\leftrightarrow\sigma_{p}$. By \ref{['eq:P3_bound']}, $\mathbb{E}[P_{3}(h)^{2}]\lesssim d^{2}$, and an $\mathcal{O}(d)$-high-probability bound on $|P_{3}(h)|$ (so on $\mathsf{B}$) follows from Lemma \ref{['lem:conc-gaussian-poly']}. Set $g(x)=\sqrt{d}A_{x}^{\mathsf{T}}W_{x}A_{x}$ (with $D_{x}=\sqrt{d}\,W_{x}$). By \ref{['eq:max-ai']} and Lemma \ref{['lem:usefulFactLewis']}-1, $\max_{i\in[m]}{\|a_{i}\|}^{2}\leq\max_{i}\frac{[\sigma(D_{x}^{1/2}A_{x})]_{i}}{[D_{x}]_{ii}}\leq\frac{2m^{\frac{2}{p+2}}}{\sqrt{d}}\lesssim\frac{1}{\sqrt{d}}\,.$ As done for the Vaidya metric, a high-probability bound on $\mathsf{A}$ requires $\mathbb{E}[P_{i}(h)^{2}]\lesssim d$ for $i=1,2$ (see \ref{['eq:P12']}). Note that $\mathbb{E}[P_{1}(h)^{2}]\lesssim\sqrt{d}$ by \ref{['eq:P1_bound']}. As for $P_{2}(h)=\sqrt{d}\,s_{x,h}^{\mathsf{T}}W_{x,h}'s_{x,h}$, we show $\mathbb{E}[P_{2}(h)^{2}]\lesssim\sqrt{d}$. Due to $W_{x,h}'=-\textup{Diag}(W_{x}^{\frac{1}{2}}N_{x}W_{x}^{\frac{1}{2}}s_{x,h})$ (Lemma \ref{['lem:DWh']}), $P_{2}(h)=-\sqrt{d}s_{x,h}^{\mathsf{T}}\textup{Diag}(W_{x}^{\frac{1}{2}}N_{x}W_{x}^{\frac{1}{2}}s_{x,h})s_{x,h}=-\sqrt{d}\textup{Tr}{\bigl(\textup{Diag}(W_{x}^{\frac{1}{2}}N_{x}W_{x}^{\frac{1}{2}}s_{x,h})S_{x,h}^{2}\bigr)}$. 
Thus, P_{2}(h)=\sqrt{d}\,\textup{Tr}{\bigl(\textup{Diag}(N_{x}W_{x}^{\frac{1}{2}}s_{x,h})W_{x}^{\frac{1}{2}}S_{x,h}^{2}\bigr)}=\sqrt{d}\sum_{i=1}^{m}w_{i}^{1/2}(a_{i}\cdot h)^{2}(b_{i}\cdot h)\,, where $b_{i}$ is the $i$-th row of $B:=N_{x}W_{x}^{\frac{1}{2}}A_{x}$ for $i=1,\dots,m$. By Lemma \ref{['lem:variance-2']}, \mathbb{E}{\Bigl[\Bigl\{\sum_{i=1}^{m}w_{i}^{1/2}(a_{i}\cdot h)^{2}(b_{i}\cdot h)\Bigr\}^{2}\Bigr]}=\sum_{i,j\in[m]}w_{i}^{1/2}w_{j}^{1/2}\|a_{i}\|^{2}\|a_{j}\|^{2}(b_{i}\cdot b_{j})\quad+4\sum_{i,j}w_{i}^{1/2}w_{j}^{1/2}(a_{i}\cdot a_{j})(a_{i}\cdot b_{i})(a_{j}\cdot b_{j})+4\sum_{i,j}w_{i}^{1/2}w_{j}^{1/2}\|a_{i}\|^{2}(b_{i}\cdot a_{j})(a_{j}\cdot b_{j})\quad+2\underbrace{\sum_{i,j}w_{i}^{1/2}w_{j}^{1/2}(a_{i}\cdot a_{j})^{2}(b_{i}\cdot b_{j})}_{=:T_{1}}+4\underbrace{\sum_{i,j}w_{i}^{1/2}w_{j}^{1/2}(a_{i}\cdot a_{j})(a_{i}\cdot b_{j})(a_{j}\cdot b_{i})}_{=:T_{2}}=\underbrace{1^{\mathsf{T}}\textup{Diag}(A_{x}A_{x}^{\mathsf{T}})\,W^{\frac{1}{2}}BB^{\mathsf{T}}W^{\frac{1}{2}}\,\textup{Diag}(A_{x}A_{x}^{\mathsf{T}})\,1}_{\eqqcolon N_{1}}+4\cdot\underbrace{1^{\mathsf{T}}\textup{Diag}(A_{x}B^{\mathsf{T}})\,W^{\frac{1}{2}}A_{x}A_{x}^{\mathsf{T}}W^{\frac{1}{2}}\,\textup{Diag}(A_{x}B^{\mathsf{T}})\,1}_{\eqqcolon N_{2}}\quad+4\cdot\underbrace{[1^{\mathsf{T}}\textup{Diag}(A_{x}A_{x}^{\mathsf{T}})\,W^{\frac{1}{2}}B]\cdot[A_{x}^{\mathsf{T}}W^{\frac{1}{2}}\,\textup{Diag}(A_{x}B^{\mathsf{T}})\,1]}_{\le N_{1}+N_{2}\text{ by Young's inequality}}+2T_{1}+4T_{2}\,. As for $N_{1}$, since $B^{\mathsf{T}}B=A_{x}^{\mathsf{T}}W_{x}^{\frac{1}{2}}N_{x}^{2}W_{x}^{\frac{1}{2}}A_{x}\leq p^{2}A_{x}^{\mathsf{T}}W_{x}A_{x}$ by Lemma \ref{['lem:LS-comp-tool']}-1 and thus $B^{\mathsf{T}}B\precsim(d)^{-1/2}I_{d}$, Lemma \ref{['lem:matrix-projection']} ensures $BB^{\mathsf{T}}\precsim\frac{1}{\sqrt{d}}P(B)\preceq\frac{1}{\sqrt{d}}\,I_{m}$. 
Hence, N_{1}\lesssim\frac{1}{\sqrt{d}}\,\textup{Tr}{\bigl(\textup{Diag}(A_{x}A_{x}^{\mathsf{T}})\,W\,\textup{Diag}(A_{x}A_{x}^{\mathsf{T}})\bigr)}\leq\frac{1}{\sqrt{d}}\,\textup{Tr}(A_{x}^{\mathsf{T}}WA_{x})\,{\|\textup{Diag}(A_{x}A_{x}^{\mathsf{T}})\|}_{\infty}\lesssim\frac{1}{\sqrt{d}}\,. As for $N_{2}$, due to $A_{x}^{\mathsf{T}}W_{x}A_{x}\preceq\frac{1}{\sqrt{d}}I_{d}$ we have $W^{\frac{1}{2}}A_{x}A_{x}^{\mathsf{T}}W^{\frac{1}{2}}\preceq\frac{1}{\sqrt{d}}I_{m}$ by Lemma \ref{['lem:matrix-projection']}. Thus, N_{2}\lesssim\frac{1}{\sqrt{d}}\,\textup{Tr}{\bigl(\{\textup{Diag}(A_{x}B^{\mathsf{T}})\}^{2}\bigr)}=\frac{1}{\sqrt{d}}\sum_{i\in[m]}(a_{i}\cdot b_{i})^{2}\leq\frac{1}{\sqrt{d}}\sum_{i}\|a_{i}\|^{2}\|b_{i}\|^{2}\leq\frac{1}{d}\textup{Tr}(BB^{\mathsf{T}})\lesssim\frac{1}{d^{3/2}}\textup{Tr}{\bigl(P(B)\bigr)}\le\frac{1}{\sqrt{d}}\,. As for $T_{1}$, by Young's inequality (i.e., $2(a\cdot b)\leq{\|a\|}^{2}+{\|b\|}^{2}$) T_{1}=\sum_{i,j\in[m]}(a_{i}\cdot a_{j})^{2}\,{\bigl((w_{j}^{1/2}b_{i})\cdot(w_{i}^{1/2}b_{j})\bigr)}\lesssim\sum_{i,j}(a_{i}\cdot a_{j})^{2}\,(w_{j}{\|b_{i}\|}^{2}+w_{i}{\|b_{j}\|}^{2})=2\sum_{i,j}w_{j}(a_{i}\cdot a_{j})^{2}{\|b_{i}\|}^{2}=\sum_{i}{\|b_{i}\|}^{2}\cdot\textup{Tr}{\Bigl(a_{i}^{\mathsf{T}}{\Bigl(\sum_{j}a_{j}w_{j}a_{j}^{\mathsf{T}}\Bigr)}a_{i}\Bigr)}=\sum_{i}{\|b_{i}\|}^{2}\textup{Tr}(a_{i}^{\mathsf{T}}A_{x}^{\mathsf{T}}WA_{x}a_{i})\leq\frac{1}{\sqrt{d}}\sum_{i}{\|b_{i}\|}^{2}{\|a_{i}\|}^{2}\leq\frac{1}{d}\textup{Tr}(BB^{\mathsf{T}})\leq\frac{1}{\sqrt{d}}\,. 
As for $T_{2}$, using $(a_{i}\cdot a_{j})\leq{\|a_{i}\|}{\|a_{j}\|}\lesssim\frac{1}{\sqrt{d}}$ T_{2}=\sum_{i,j\in[m]}w_{i}^{1/2}w_{j}^{1/2}(a_{i}\cdot a_{j})(a_{i}\cdot b_{j})(a_{j}\cdot b_{i})\lesssim\frac{1}{\sqrt{d}}\sum_{i,j\in[m]}w_{i}^{1/2}w_{j}^{1/2}(a_{i}\cdot b_{j})(a_{j}\cdot b_{i})=\frac{1}{\sqrt{d}}\sum_{i}w_{i}^{1/2}b_{i}^{\mathsf{T}}\sum_{j}a_{j}w_{j}^{1/2}b_{j}^{\mathsf{T}}a_{i}=\frac{1}{\sqrt{d}}\sum_{i}\textup{Tr}(a_{i}w_{i}^{1/2}b_{i}^{\mathsf{T}}A_{x}^{\mathsf{T}}W^{1/2}B)=\frac{1}{\sqrt{d}}\textup{Tr}{\bigl((A_{x}^{\mathsf{T}}W^{1/2}B)^{2}\bigr)}\underset{\text{CS}}{\leq}\frac{1}{\sqrt{d}}\textup{Tr}(B^{\mathsf{T}}W^{1/2}A_{x}A_{x}^{\mathsf{T}}W^{1/2}B)\leq\frac{1}{d}\textup{Tr}(B^{\mathsf{T}}B)\leq\frac{1}{\sqrt{d}}\,. Putting all the bounds together, we have $\mathbb{E}[P_{2}(h)^{2}]\lesssim d\cdot\frac{1}{\sqrt{d}}=\sqrt{d}$.

We show that for any given $\alpha=\Theta(1)$, each coordinate of $w_{x}/s_{x}^{\alpha}$ and $w_{p_{z}}/s_{p_{z}}^{\alpha}$ is close. For $0\leq t\le1$, we define $x_{t}:=x+\frac{r}{\sqrt{d}}th$, and $s_{t},$ $w_{t}$ in the same fashion. Then for $p=\mathcal{O}(\log m)$, \max_{i\in[m]}|\log\frac{(w_{p_{z},i})^{\alpha}}{s_{p_{z},i}}-\log\frac{(w_{x,i})^{\alpha}}{s_{x,i}}|\leq\int_{0}^{1}|\frac{\mathrm{d}}{\mathrm{d} t}\log\frac{[w_{t,i}]^{\alpha}}{s_{t,i}}|\,\mathrm{d} t\lesssim\frac{r}{\sqrt{d}}\,{\|h\|}_{A_{x}^{\mathsf{T}}W_{x}A_{x}}\leq\frac{1}{d^{1/4}}{\|z\|}\,. Just as in showing SASC of the Vaidya metric, we can make this bound arbitrarily small (say $\delta\approx0$) by conditioning on the high-probability region where ${\|z\|}\leq r\log\frac{1}{\varepsilon}\leq0.01$. Hence, $e^{-\delta}\frac{(w_{x,i})^{\alpha}}{s_{x,i}}\leq\frac{(w_{p_{z},i})^{\alpha}}{s_{p_{z},i}}\leq e^{\delta}\frac{(w_{x,i})^{\alpha}}{s_{x,i}}\,.$ We remark that this $\Theta(1)$-multiplicative closeness is still valid without the $\sqrt{d}$-scaling of $A_{x}^{\mathsf{T}}W_{x}A_{x}$. 
Using the formula for $\mathrm{D}^{2}(A_{x}^{\mathsf{T}}W_{x}A_{x})[h^{\otimes4}]$ in \ref{['eq:LW-fourth-moment']}, |\mathrm{D}^{2}g(p)[h^{\otimes4}]|\lesssim{\bigl(\bar{P}_{3}(h)+|\bar{P}_{4}(h)|+|\bar{P}_{5}(h)|\bigr)}=\bar{P}_{3}(h)+\sqrt{d}\,{\bigl(|\textup{Tr}(W_{p,h}'S_{p,h}^{3})|+|\textup{Tr}(W_{p,h}''S_{p,h}^{2})|\bigr)}=\bar{P}_{3}(h)+\sqrt{d}\underbrace{|\textup{Tr}{\bigl(S_{p,h}^{3}\textup{Diag}(W_{p}^{\frac{1}{2}}N_{p}W_{p}^{\frac{1}{2}}s_{p,h})\bigr)}|}_{\eqqcolon T_{1}}+\sqrt{d}\underbrace{|\textup{Tr}(S_{p,h}^{2}W_{p,h}'')|}_{\eqqcolon T_{2}}\,, where in the last line we used the formula for $W_{p,h}'$ (Lemma \ref{['lem:DWh']}). Now we show $\mathbb{E}[\bar{P}_{3}(h)^{2}]\lesssim d^{2}$ and $T_{i}\lesssim\sqrt{d}$ w.h.p. for $i=1,2$. As for $\bar{P}_{3}$, we have $\bar{P}_{3}(h)\lesssim P_{3}(h)$ from the closeness \ref{['eq:closeness']} of $w_{i}/s_{i}^{4}$ for each $i\in[m]$, so $\mathbb{E}[P_{3}(h)^{2}]\lesssim d^{2}\cdot d^{-1}=d$ from \ref{['eq:P3_bound']}. As for $T_{1}$, using the Cauchy-Schwarz inequality, T_{1}=|\textup{Tr}{\bigl(S_{p,h}^{3}W_{p}^{\frac{1}{2}}\,\textup{Diag}(N_{p}W_{p}^{\frac{1}{2}}s_{p,h})\bigr)}|\leq\sqrt{\textup{Tr}(S_{p,h}^{3}W_{p}S_{p,h}^{3})}\sqrt{s_{p,h}^{\mathsf{T}}W_{p}^{1/2}N_{p}^{2}W_{p}^{1/2}s_{p,h}}\underset{\text{(i)}}{\lesssim}\sqrt{s_{p,h}^{3}W_{p}s_{p,h}^{3}}\sqrt{s_{p,h}^{\mathsf{T}}W_{p}s_{p,h}}\underset{\text{(ii)}}{\lesssim}\sqrt{s_{x,h}^{3}W_{x}s_{x,h}^{3}}\sqrt{s_{x,h}^{\mathsf{T}}W_{x}s_{x,h}}=\sqrt{s_{x,h}^{3}W_{x}s_{x,h}^{3}}\cdot d^{-1/4}{\|h\|}_{g(x)}\,, where in (i) we used $N_{x}\preceq p^{2}I$ (Lemma \ref{['lem:LS-comp-tool']}), and in (ii) the closeness of $w_{i}/s_{i}^{6}$ and $w_{i}/s_{i}^{2}$ established in \ref{['eq:closeness']}. 
As for the first term in the RHS, \mathbb{E}[(s_{x,h}^{3}W_{x}s_{x,h}^{3})^{2}]\underset{\text{CS}}{\lesssim}\sum_{i,j\in[m]}w_{i}w_{j}\sqrt{\mathbb{E}[(a_{i}\cdot h)^{12}]}\sqrt{\mathbb{E}[(a_{j}\cdot h)^{12}]}={\Bigl(\sum_{i}w_{i}\,{\bigl(\mathbb{E}[(a_{i}\cdot h)^{12}]\bigr)}^{1/2}\Bigr)}^{2}\lesssim{\Bigl(\sum_{i}w_{i}{\|a_{i}\|}^{6}\Bigr)}^{2}\leq{\Bigl(\frac{1}{d^{3/2}}\sum_{i}w_{i}\Bigr)}^{2}=\frac{1}{d}\,. As for the second term, the concentration of the standard Gaussian guarantees ${\|h\|}_{g(x)}\leq{\|h\|}\lesssim\sqrt{d}$ w.h.p. Therefore, $T_{1}\lesssim\sqrt{d}$ w.h.p. As for $T_{2}$, \ref{['eq:trGamma']} with $\Gamma_{p}=S_{p,h}^{2}$ equals $T_{2}$. Following \ref{['eq:last-bound']} with I, II, III, IV defined in \ref{['eq:LW-second-derv']}, T_{2}\lesssim\sum_{v=\text{I,II,III,IV}}\sqrt{\textup{Tr}(W_{p}S_{p,h}^{4})}{\|v\|}_{W_{p}^{-1}}\underset{\text{(i)}}{\lesssim}\sqrt{\textup{Tr}(W_{p}S_{p,h}^{4})}\,{\bigl(\textup{Tr}(S_{p,h}^{2}W_{p})+\textup{Tr}(S_{p,h}^{4}W_{p})\bigr)}\underset{\text{(ii)}}{\lesssim}\sqrt{\textup{Tr}(W_{x}S_{x,h}^{4})}\,{\bigl(\textup{Tr}(S_{x,h}^{2}W_{x})+\textup{Tr}(S_{x,h}^{4}W_{x})\bigr)}\,, where (i) follows from Lemma \ref{['lem:second-deriv-Lewis']} (i.e., ${\|v\|}_{W_{p}^{-1}}\lesssim{\|h\|}_{A_{p}^{\mathsf{T}}W_{p}A_{p}}^{2}=\textup{Tr}(S_{p,h}^{2}W_{p})$ for $v=$ I, II, III, and ${\|\text{IV}\|}_{W_{p}^{-1}}\lesssim\textup{Tr}(S_{p,h}^{4}W_{p})$), and (ii) follows from the conditioned event where the closeness of $w_{i}/s_{i}^{2}$ at $x$ and $z$ holds. Since we already established the high-probability bounds of $d^{-1/2}P_{3}(h)=\textup{Tr}(S_{x,h}^{4}W_{x})\lesssim1$ and $\textup{Tr}(S_{x,h}^{2}W_{x})\lesssim\sqrt{d}$, combining these yields $T_{2}\lesssim\sqrt{d}$ w.h.p. 
We show that a $\nu$-SC barrier $\psi(\cdot)=-\log f(\cdot)$ satisfies $|\mathrm{D}^{4}\psi(x)[h^{\otimes4}]|\lesssim\nu^{2}{\|h\|}_{\nabla^{2}\psi(x)}^{4}+|\frac{\mathrm{D}^{4}f(x)[h^{\otimes4}]}{f(x)}|\,.$ Fix $h\in\mathbb{R}^{d}$ and $x\in\textup{int}(K)$, and define $\phi(t):=\psi(x+th)$. Then, \phi'=-\frac{f'}{f}\,,\quad\phi''=\left(\frac{f'}{f}\right)^{2}-\frac{f''}{f}=(\phi')^{2}-\frac{f''}{f}\,,\quad\phi'''=2\phi'\phi''-\frac{f'''f-f''f'}{f^{2}}=2\phi'\phi''-\frac{f'''}{f}+\frac{f''f'}{f^{2}}=2\phi'\phi''+\phi'(\phi''-(\phi')^{2})-\frac{f'''}{f}=3\phi'\phi''-(\phi')^{3}-\frac{f'''}{f}\,,\quad\phi^{(4)}=3(\phi'')^{2}+3\phi'\phi'''-3(\phi')^{2}\phi''-\frac{f^{(4)}f-f'''f'}{f^{2}}=3(\phi'')^{2}+3\phi'\phi'''-3(\phi')^{2}\phi''+\phi'\left(\phi'''-3\phi'\phi''+(\phi')^{3}\right)-\frac{f^{(4)}}{f}=3(\phi'')^{2}+4\phi'\phi'''-6(\phi')^{2}\phi''+(\phi')^{4}-\frac{f^{(4)}}{f}\,. Since $|\phi'''|\leq2(\phi'')^{3/2}$ (SC of $\phi$) and $\phi''\geq\frac{1}{\nu}(\phi')^{2}$ (the definition of the barrier parameter), which is equivalent to $|\phi'|\leq\sqrt{\nu}(\phi'')^{1/2}$, we can directly compute as follows: |\phi^{(4)}|\leq4\,|\phi'\phi'''|+3\,|(\phi'')^{2}|+6|\,(\phi')^{2}\phi''|+|(\phi')^{4}|+|\frac{f^{(4)}}{f}|\leq8\sqrt{\nu}\,|\phi''|^{2}+3\,|\phi''|^{2}+6\nu\,|\phi''|^{2}+\nu^{2}\,|\phi''|^{2}+|\frac{f^{(4)}}{f}|\lesssim\nu^{2}|\phi''|^{2}+|\frac{f^{(4)}}{f}|\,. The claim follows since $\phi''(0)={\|h\|}_{\nabla^{2}\psi(x)}^{2}$.

Using this tool, we study Dikin-amenability of barriers for quadratic constraints. Let us check the last claim first. By Lemma \ref{['lem:linear-trans']}, we may assume that $\phi(x,y)=-\log(l+q^{\mathsf{T}}y-\frac{1}{2}{\|x\|}^{2})\,,$ and let $f(x,y)=l+q^{\mathsf{T}}y-\frac{1}{2}\,{\|x\|}^{2}$. For $z=(x,y)\in\textup{int}(K)$ and $u=(u_{x},u_{y})\in\mathbb{R}^{d}$, we have \mathrm{D}\phi(z)[u]=-\frac{1}{f}\,(q\cdot u_{y}-x\cdot u_{x})=\frac{x\cdot u_{x}-q\cdot u_{y}}{f}\,,\quad\mathrm{D}^{2}\phi(z)[u,u]=\frac{1}{f^{2}}\,(x\cdot u_{x}-q\cdot u_{y})^{2}+\frac{1}{f}\,{\|u_{x}\|}^{2}\,. 
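As a side check on the one-dimensional lemma above (the function $\phi(t)=\psi(x+th)$, not the quadratic barrier $\phi(x,y)$), the recursion for $\phi',\dots,\phi^{(4)}$ can be evaluated numerically and compared with the standard expansion of $(\log f)^{(4)}$. The sketch below is illustrative only: the test polynomial, the evaluation points, and the function names are assumptions, and the barrier $-\log(1-t^{2})$ on $[-1,1]$ with $\nu=2$ is chosen simply as a familiar self-concordant example.

```python
import math

def phi_derivatives(f, f1, f2, f3, f4):
    """Derivatives of phi = -log f at one point, given f, f', f'', f''', f''''
    there, following the recursion derived in the text."""
    p1 = -f1 / f
    p2 = p1 ** 2 - f2 / f
    p3 = 3 * p1 * p2 - p1 ** 3 - f3 / f
    p4 = 3 * p2 ** 2 + 4 * p1 * p3 - 6 * p1 ** 2 * p2 + p1 ** 4 - f4 / f
    return p1, p2, p3, p4

def log_fourth_derivative(f, f1, f2, f3, f4):
    """(log f)'''' from repeated use of the quotient rule (independent check)."""
    return (f4 / f - 4 * f1 * f3 / f ** 2 - 3 * f2 ** 2 / f ** 2
            + 12 * f1 ** 2 * f2 / f ** 3 - 6 * f1 ** 4 / f ** 4)

# (a) Algebraic identity on an arbitrary positive polynomial (hypothetical example):
# f(t) = 3 + t - 0.5 t^2 + 0.2 t^3 - 0.1 t^4 evaluated at t = 0.4.
t = 0.4
poly = (3 + t - 0.5 * t ** 2 + 0.2 * t ** 3 - 0.1 * t ** 4,
        1 - t + 0.6 * t ** 2 - 0.4 * t ** 3,
        -1 + 1.2 * t - 1.2 * t ** 2,
        1.2 - 2.4 * t,
        -2.4)
identity_gap = abs(phi_derivatives(*poly)[3] + log_fourth_derivative(*poly))

# (b) The final bound for the interval barrier -log(1 - s^2), where nu = 2
# and f'''' = 0, so only the nu-dependent part of the bound remains.
nu = 2.0
s = 0.3
p1, p2, p3, p4 = phi_derivatives(1 - s ** 2, -2 * s, -2.0, 0.0, 0.0)
bound = (8 * math.sqrt(nu) + 3 + 6 * nu + nu ** 2) * p2 ** 2
print(identity_gap, abs(p4), bound)
```

Part (a) only tests the algebra, so $f$ need not be concave there; part (b) additionally checks the two structural inequalities $|\phi'''|\leq2(\phi'')^{3/2}$ and $(\phi')^{2}\leq\nu\phi''$ that the bound uses.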
As for the first term in the RHS of \ref{['eq:hessian-quadratic']}, it holds that for $v=(v_{x},v_{y})\in\mathbb{R}^{d}$ \mathrm{D}{\Bigl(\frac{(x\cdot u_{x}-q\cdot u_{y})^{2}}{f^{2}}\Bigr)}[v]=\frac{2\,(x\cdot u_{x}-q\cdot u_{y})(v_{x}\cdot u_{x})}{f^{2}}+2\,(x\cdot u_{x}-q\cdot u_{y})^{2}\cdot\frac{x\cdot v_{x}-q\cdot v_{y}}{f^{3}}\,,\mathrm{D}^{2}{\Bigl(\frac{(x\cdot u_{x}-q\cdot u_{y})^{2}}{f^{2}}\Bigr)}[v,v]=\frac{2\,(v_{x}\cdot u_{x})^{2}}{f^{2}}+4\frac{(x\cdot u_{x}-q\cdot u_{y})(v_{x}\cdot u_{x})(x\cdot v_{x}-q\cdot v_{y})}{f^{3}}\quad+\frac{4\,(x\cdot u_{x}-q\cdot u_{y})(v_{x}\cdot u_{x})(x\cdot v_{x}-q\cdot v_{y})+2\,(x\cdot u_{x}-q\cdot u_{y})^{2}{\|v_{x}\|}^{2}}{f^{3}}\quad+\frac{6\,(x\cdot u_{x}-q\cdot u_{y})^{2}(x\cdot v_{x}-q\cdot v_{y})^{2}}{f^{4}}=\frac{2\,(v_{x}\cdot u_{x})^{2}}{f^{2}}+\frac{4\,(x_{q}\cdot u)(v_{x}\cdot u_{x})(x_{q}\cdot v)}{f^{3}}\quad+\frac{4\,(x_{q}\cdot u)(v_{x}\cdot u_{x})(x_{q}\cdot v)+2(x_{q}\cdot u)^{2}{\|v_{x}\|}^{2}}{f^{3}}+\frac{6\,(x_{q}\cdot u)^{2}(x_{q}\cdot v)^{2}}{f^{4}}\,, where $x_{q}:=(x,-q)\in\mathbb{R}^{d}$. As for the second term, direct computations lead to \mathrm{D}{\Bigl(\frac{{\|u_{x}\|}^{2}}{f}\Bigr)}[v]=\frac{1}{f^{2}}\,{\|u_{x}\|}^{2}(x\cdot v_{x}-q\cdot v_{y})\,,\mathrm{D}^{2}{\Bigl(\frac{{\|u_{x}\|}^{2}}{f}\Bigr)}[v,v]=\frac{2}{f^{3}}\,{\|u_{x}\|}^{2}(x\cdot v_{x}-q\cdot v_{y})^{2}+\frac{1}{f^{2}}\,{\|u_{x}\|}^{2}{\|v_{x}\|}^{2}=\frac{2}{f^{3}}\,{\|u_{x}\|}^{2}(x_{q}\cdot v)^{2}+\frac{1}{f^{2}}\,{\|u_{x}\|}^{2}{\|v_{x}\|}^{2}\,. 
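The two second-derivative formulas above can be compared against central finite differences of the corresponding terms of $\mathrm{D}^{2}\phi(z)[u,u]$ along $v$. This is a rough numerical sketch, assuming small hypothetical dimensions ($x\in\mathbb{R}^{3}$, $y\in\mathbb{R}^{2}$), an arbitrary offset $l=5$, and random data chosen so that $f>0$; none of these values come from the paper.

```python
import random

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

rng = random.Random(2)
l = 5.0  # hypothetical offset, large enough to keep f > 0 below
q = [rng.uniform(-0.5, 0.5) for _ in range(2)]
x = [rng.uniform(-1, 1) for _ in range(3)]
y = [rng.uniform(-0.5, 0.5) for _ in range(2)]
ux = [rng.uniform(-1, 1) for _ in range(3)]
uy = [rng.uniform(-1, 1) for _ in range(2)]
vx = [rng.uniform(-1, 1) for _ in range(3)]
vy = [rng.uniform(-1, 1) for _ in range(2)]

def f(xs, ys):
    return l + dot(q, ys) - 0.5 * dot(xs, xs)

def term1(xs, ys):  # (x.u_x - q.u_y)^2 / f^2
    return (dot(xs, ux) - dot(q, uy)) ** 2 / f(xs, ys) ** 2

def term2(xs, ys):  # ||u_x||^2 / f
    return dot(ux, ux) / f(xs, ys)

def second_difference(term, step=1e-4):
    """Central second difference of `term` at z = (x, y) in direction v."""
    def at(t):
        return term([a + t * b for a, b in zip(x, vx)],
                    [a + t * b for a, b in zip(y, vy)])
    return (at(step) - 2 * at(0.0) + at(-step)) / step ** 2

fz = f(x, y)
xq_u = dot(x, ux) - dot(q, uy)  # x_q . u
xq_v = dot(x, vx) - dot(q, vy)  # x_q . v
c = dot(vx, ux)                 # v_x . u_x

d2_term1 = (2 * c ** 2 / fz ** 2 + 8 * xq_u * c * xq_v / fz ** 3
            + 2 * xq_u ** 2 * dot(vx, vx) / fz ** 3
            + 6 * xq_u ** 2 * xq_v ** 2 / fz ** 4)
d2_term2 = (2 * dot(ux, ux) * xq_v ** 2 / fz ** 3
            + dot(ux, ux) * dot(vx, vx) / fz ** 2)
print(second_difference(term1) - d2_term1, second_difference(term2) - d2_term2)
```

The coefficient 8 in `d2_term1` merges the two identical $4\,(x_{q}\cdot u)(v_{x}\cdot u_{x})(x_{q}\cdot v)/f^{3}$ contributions that appear separately in the displayed formula.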
Putting these together, for $u,v\in\mathbb{R}^{d}$ \mathrm{D}^{4}\phi[u,u,v,v]=\frac{1}{f^{2}}\,{\|u_{x}\|}^{2}{\|v_{x}\|}^{2}+\underbrace{\frac{2}{f^{2}}\,(v_{x}\cdot u_{x})^{2}}_{\geq0}+\frac{4}{f^{3}}\,{\Bigl(\frac{1}{2}\,{\|u_{x}\|}^{2}(x_{q}\cdot v)^{2}+2\,(x_{q}\cdot u)(v_{x}\cdot u_{x})(x_{q}\cdot v)+\frac{(x_{q}\cdot u)^{2}}{2}\,{\|v_{x}\|}^{2}\Bigr)}\qquad+\frac{6}{f^{4}}\,(x_{q}\cdot u)^{2}(x_{q}\cdot v)^{2}\geq\frac{4}{f^{3}}\,(\underbrace{\frac{1}{2}{\|u_{x}\|}^{2}(x_{q}\cdot v)^{2}+\frac{1}{2}{\|v_{x}\|}^{2}(x_{q}\cdot u)^{2}}_{\text{Use AM-GM}}+2(x_{q}\cdot u)(v_{x}\cdot u_{x})(x_{q}\cdot v))\qquad+\underbrace{\frac{1}{f^{2}}\,{\|u_{x}\|}^{2}{\|v_{x}\|}^{2}+\frac{6}{f^{4}}\,(x_{q}\cdot u)^{2}(x_{q}\cdot v)^{2}}_{\text{Use AM-GM}}\geq\frac{4}{f^{3}}\,{\bigl({\|u_{x}\|}\,{\|v_{x}\|}\,|x_{q}\cdot v|\,|x_{q}\cdot u|-2|x_{q}\cdot u|\,|x_{q}\cdot v|\,{\|u_{x}\|}\,{\|v_{x}\|}\bigr)}+\frac{2\sqrt{6}}{f^{3}}\,|x_{q}\cdot u|\,|x_{q}\cdot v|\,{\|u_{x}\|}\,{\|v_{x}\|}=\frac{4}{f^{3}}\,{\|u_{x}\|}\,{\|v_{x}\|}\,|x_{q}\cdot v|\,|x_{q}\cdot u|\,{\Bigl(\frac{\sqrt{6}}{2}-1\Bigr)}\geq0\,.\qedhere We start with convexity of $\log\det(\nabla^{2}\phi)$ for $\phi(X)=-\log\det X$. Using Lemma \ref{['prop:metricFormula']} and $\det{\bigl(M^{\mathsf{T}}(A\otimes A)M\bigr)}=2^{d(d-1)/2}\,(\det A)^{d+1}$ (Lemma \ref{['lem:Kronecker']}) in the first and second equality below, \log\det{\bigl(\nabla^{2}\phi(X)\bigr)}=\log\det{\bigl(M^{\mathsf{T}}(X^{-1}\otimes X^{-1})M\bigr)}=\frac{d(d-1)}{2}\,\log2-(d+1)\,\log\det X\,. Since $-\log\det X$ is convex in $X$ \ref{['eq:2ndDiffLogDet']}, the convexity of $\log\det{\bigl(\nabla^{2}\phi(X)\bigr)}$ also follows. Observe from the proof that $\log\det{\bigl(\nabla^{2}\phi(X)\bigr)}=\text{const.}+(d+1)\,\phi(X)$. Differentiating both sides in direction $H$, by \ref{['eq:gradLogDet']} $\textup{Tr}{\bigl([\nabla^{2}\phi(X)]^{-1}\mathrm{D}^{3}\phi(X)[H]\bigr)}=(d+1)\,\mathrm{D}\phi(X)[H]$. 
Hence, \textup{Tr}{\bigl([\nabla^{2}\phi(X)]^{-\frac{1}{2}}\mathrm{D}^{3}\phi(X)[H]\,[\nabla^{2}\phi(X)]^{-\frac{1}{2}}\bigr)}=-(d+1)\,\textup{Tr}(X^{-1}H)\,. We are ready to show SSC of $\phi$. For $H\in\mathbb{S}^{d}$ and $t\in\mathbb{R}$, denote $X_{t}:=X+tH$ and $g_{t}:=M^{\mathsf{T}}(X_{t}\otimes X_{t})^{-1}M$. Note that ${\bigl\Vert[\nabla^{2}\phi(X)]^{-\frac{1}{2}}\mathrm{D}^{3}\phi(X)[H]\,[\nabla^{2}\phi(X)]^{-\frac{1}{2}}\bigr\Vert}_{F}^{2}=\textup{Tr}(g^{-1}\partial_{t}g_{t}\vert_{t=0}\,g^{-1}\partial_{t}g_{t}\vert_{t=0})\,,$ and \partial_{t}g_{t}\vert_{t=0}\underset{\text{(i)}}{=}\partial_{t}{\bigl(M^{\mathsf{T}}(X_{t}\otimes X_{t})^{-1}M\bigr)}|_{t=0}\underset{\text{(ii)}}{=}-M^{\mathsf{T}}(X\otimes X)^{-1}\,\partial_{t}(X_{t}\otimes X_{t})\vert_{t=0}\,(X\otimes X)^{-1}M=-M^{\mathsf{T}}(X^{-1}\otimes X^{-1})(H\otimes X+X\otimes H)(X^{-1}\otimes X^{-1})M\underset{\text{(iii)}}{=}-M^{\mathsf{T}}(X^{-1}HX^{-1}\otimes X^{-1}+X^{-1}\otimes X^{-1}HX^{-1})M\,, where (i) follows from Lemma \ref{['prop:metricFormula']}, (ii) is due to \ref{['eq:diffInverse']}, and (iii) follows from $(A\otimes B)(C\otimes D)=(AC)\otimes(BD)$ (Lemma \ref{['lem:Kronecker']}-3). Recall that positive semidefinite matrices have unique positive semidefinite square roots, so $(X\otimes X)^{\frac{1}{2}}=X^{\frac{1}{2}}\otimes X^{\frac{1}{2}}$ (due to $(X^{1/2}\otimes X^{1/2})\cdot(X^{1/2}\otimes X^{1/2})=X\otimes X$). 
Since $g_{t}=M^{\mathsf{T}}(X_{t}\otimes X_{t})^{-1/2}(X_{t}\otimes X_{t})^{-1/2}M$, the corresponding orthogonal projection is $P_{t}:=P{\bigl((X_{t}\otimes X_{t})^{-\frac{1}{2}}M\bigr)}=(X_{t}\otimes X_{t})^{-\frac{1}{2}}Mg_{t}^{-1}M^{\mathsf{T}}(X_{t}\otimes X_{t})^{-\frac{1}{2}}\,.$ By substituting $\partial_{t}g_{t}|_{t=0}$ with \ref{['eq:18-1']}, \textup{Tr}(g^{-1}\partial_{t}g_{t}\vert_{t=0}\,g^{-1}\partial_{t}g_{t}\vert_{t=0})=\textup{Tr}\bigl(g^{-1}M^{\mathsf{T}}(X^{-1}HX^{-1}\otimes X^{-1}+X^{-1}\otimes X^{-1}HX^{-1})M\qquad\qquad\cdot g^{-1}M^{\mathsf{T}}(X^{-1}HX^{-1}\otimes X^{-1}+X^{-1}\otimes X^{-1}HX^{-1})\textcolor{blue}{M}\bigr)=\textup{Tr}\bigl(\textcolor{blue}{M}g^{-1}M^{\mathsf{T}}(X^{-1}HX^{-1}\otimes X^{-1}+X^{-1}\otimes X^{-1}HX^{-1})M\qquad\qquad\cdot g^{-1}M^{\mathsf{T}}(X^{-1}HX^{-1}\otimes X^{-1}+X^{-1}\otimes X^{-1}HX^{-1})\bigr)=\textup{Tr}{\Bigl({\bigl[\textcolor{red}{Mg^{-1}M^{\mathsf{T}}}(X^{-1}HX^{-1}\otimes X^{-1}+X^{-1}\otimes X^{-1}HX^{-1})\bigr]}^{2}\Bigr)}=\textup{Tr}{\Bigl({\bigl[\textcolor{red}{(X\otimes X)^{\frac{1}{2}}P(X\otimes X)^{\frac{1}{2}}}(X^{-1}HX^{-1}\otimes X^{-1}+X^{-1}\otimes X^{-1}HX^{-1})\bigr]}^{2}\Bigr)}=\textup{Tr}{\Bigl({\bigl[P\underbrace{(X\otimes X)^{\frac{1}{2}}(X^{-1}HX^{-1}\otimes X^{-1}+X^{-1}\otimes X^{-1}HX^{-1})(X\otimes X)^{\frac{1}{2}}}_{\eqqcolon S}\bigr]}^{2}\Bigr)}=\textup{Tr}(PSPS)\,. Using Lemma \ref{['lem:Kronecker']}-3, S=\underbrace{X^{-\frac{1}{2}}HX^{-\frac{1}{2}}\otimes I_{d}}_{\eqqcolon A}+\underbrace{I_{d}\otimes X^{-\frac{1}{2}}HX^{-\frac{1}{2}}}_{\eqqcolon B}\,. By the Cauchy-Schwarz inequality along with $P^{\mathsf{T}}P=P^{2}=P$ and $P\preceq I_{d}$, \textup{Tr}(PSPS)\leq\textup{Tr}((PS)^{\mathsf{T}}PS)\leq\textup{Tr}(S^{\mathsf{T}}S)={\|S\|}_{F}^{2}\leq({\|A\|}_{F}+{\|B\|}_{F})^{2}\,. 
Using Lemma \ref{['lem:Kronecker']}-3, {\|A\|}_{F}^{2}=\textup{Tr}{\bigl((X^{-\frac{1}{2}}HX^{-\frac{1}{2}}\otimes I_{d})\cdot(X^{-\frac{1}{2}}HX^{-\frac{1}{2}}\otimes I_{d})\bigr)}=\textup{Tr}(X^{-\frac{1}{2}}HX^{-1}HX^{-\frac{1}{2}}\otimes I_{d})=\textup{Tr}(X^{-\frac{1}{2}}HX^{-1}HX^{-\frac{1}{2}})\,\textup{Tr}(I_{d})=d\,{\|H\|}_{X}^{2}\,, and similarly ${\|B\|}_{F}^{2}=d\,{\|H\|}_{X}^{2}$. Therefore, $\psi_{X}\leq2\sqrt{d}$ follows from ${\bigl\Vert[\nabla^{2}\phi(X)]^{-\frac{1}{2}}\mathrm{D}^{3}\phi(X)[H]\,[\nabla^{2}\phi(X)]^{-\frac{1}{2}}\bigr\Vert}_{F}\leq\sqrt{\textup{Tr}(PSPS)}\leq2\sqrt{d}\,{\|H\|}_{X}\,.$ To see the optimality of $\mathcal{O}(d^{1/2})$, we recall \ref{['eq:difflogdet']}: $\textup{Tr}{\bigl([\nabla^{2}\phi(X)]^{-\frac{1}{2}}\mathrm{D}^{3}\phi(X)[H]\,[\nabla^{2}\phi(X)]^{-\frac{1}{2}}\bigr)}=-(d+1)\,\textup{Tr}(X^{-1}H)\,.$ Taking supremum on both sides, \sup_{H:{\|H\|}_{X}=1}\textup{Tr}{\bigl([\nabla^{2}\phi(X)]^{-\frac{1}{2}}\mathrm{D}^{3}\phi(X)[H]\,[\nabla^{2}\phi(X)]^{-\frac{1}{2}}\bigr)}=\sup_{\substack{H\in\mathbb{S}^{d}:\\ {\|X^{-1/2}HX^{-1/2}\|}_{F}=1 } }-(d+1)\,\textup{Tr}(X^{-\frac{1}{2}}HX^{-\frac{1}{2}})=\sup_{S\in\mathbb{S}^{d}:{\|S\|}_{F}=1}(d+1)\,\textup{Tr}(S)\,, and this objective achieves the maximum at $H=-d^{-1/2}X$, with the supremum being $(d+1)\sqrt{d}$. 
On the other hand, due to $\textup{Tr}(A)\leq d^{1/2}\,{\|A\|}_{F}$ for $A\in\mathbb{R}^{d\times d}$, \textup{Tr}{\bigl([\nabla^{2}\phi(X)]^{-\frac{1}{2}}\mathrm{D}^{3}\phi(X)[H]\,[\nabla^{2}\phi(X)]^{-\frac{1}{2}}\bigr)}\leq\sqrt{\frac{d(d+1)}{2}}\cdot{\bigl\Vert[\nabla^{2}\phi(X)]^{-\frac{1}{2}}\mathrm{D}^{3}\phi(X)[H]\,[\nabla^{2}\phi(X)]^{-\frac{1}{2}}\bigr\Vert}_{F}\leq\sqrt{\frac{d(d+1)}{2}}\cdot\psi_{X}{\|H\|}_{X}\,, and thus by taking supremum on both sides over a symmetric matrix $H$ with ${\|H\|}_{X}=1$, it follows that $(d+1)\sqrt{d}\leq\sqrt{\frac{d(d+1)}{2}}\,\psi_{X}$ and $\sqrt{2(d+1)}\leq\psi_{X}\,.\qedhere$ Direct computation leads to $\mathrm{D}^{2}g(X)[H,H]\succeq0$ (so SLTSC). For $g(X)=-\nabla^{2}\log\det X$, recall that $g(X)[H,H]=\textup{Tr}(X^{-1}HX^{-1}H)$. Thus for any $V\in\mathbb{S}^{d}$, \mathrm{D} g(X)[H,H,V]=-\textup{Tr}(X^{-1}VX^{-1}\cdot HX^{-1}H)-\textup{Tr}(X^{-1}H\cdot X^{-1}VX^{-1}\cdot H)=-2\,\textup{Tr}(X^{-1}VX^{-1}HX^{-1}H)\,, and differentiating again, \mathrm{D}^{2}g(X)[H,H,V,V]=4\,\textup{Tr}(X^{-1}VX^{-1}VX^{-1}HX^{-1}H)+2\,\textup{Tr}(X^{-1}VX^{-1}HX^{-1}VX^{-1}H)=4\,\textup{Tr}(X^{-\frac{1}{2}}HX^{-1}VX^{-1}VX^{-1}HX^{-\frac{1}{2}})+2\,\textup{Tr}(X^{-\frac{1}{2}}VX^{-1}HX^{-\frac{1}{2}}\cdot X^{-\frac{1}{2}}VX^{-1}HX^{-\frac{1}{2}})\underset{\text{(i)}}{\geq}4\,\textup{Tr}(X^{-\frac{1}{2}}HX^{-1}VX^{-1}VX^{-1}HX^{-\frac{1}{2}})-2\,\textup{Tr}(X^{-\frac{1}{2}}HX^{-1}VX^{-\frac{1}{2}}\cdot X^{-\frac{1}{2}}VX^{-1}HX^{-\frac{1}{2}})=2\,\textup{Tr}(X^{-\frac{1}{2}}HX^{-1}VX^{-1}VX^{-1}HX^{-\frac{1}{2}})\geq0\,, where in (i) we used the Cauchy-Schwarz inequality. Therefore, $\mathrm{D}^{2}g(X)[H,H]\succeq0$. We establish a connection to the Gaussian orthogonal ensemble (GOE): for $d_{s}=d(d+1)/2$ and $\textup{svec}(H)\sim\mathcal{N}{\bigl(0,\frac{r^{2}}{d_{s}}\,g(X)^{-1}\bigr)}$, we have $\frac{\sqrt{d_{s}d}}{r}X^{-\frac{1}{2}}HX^{-\frac{1}{2}}$ is the GOE. 
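Separately from the GOE connection, the closed form for $\mathrm{D}^{2}g(X)[H,H,V,V]$ derived above can be checked on a small example by differencing $g(X)[H,H]=\textup{Tr}(X^{-1}HX^{-1}H)$ twice along $V$. The sketch below assumes $2\times2$ matrices with hand-picked entries (hypothetical values) and uses tiny helper functions rather than a linear-algebra library.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def inverse_2x2(A):
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det], [-A[1][0] / det, A[0][0] / det]]

def trace_of_product(*mats):
    prod = mats[0]
    for M in mats[1:]:
        prod = matmul(prod, M)
    return sum(prod[i][i] for i in range(len(prod)))

def metric(X, H):
    """g(X)[H, H] = Tr(X^{-1} H X^{-1} H) for the log-det barrier."""
    Y = inverse_2x2(X)
    return trace_of_product(Y, H, Y, H)

# Hypothetical test data: X positive definite, H and V symmetric.
X = [[2.0, 0.5], [0.5, 1.0]]
H = [[1.0, -0.3], [-0.3, 0.4]]
V = [[-0.2, 0.7], [0.7, 0.5]]

Y = inverse_2x2(X)
closed_form = (4 * trace_of_product(Y, V, Y, V, Y, H, Y, H)
               + 2 * trace_of_product(Y, V, Y, H, Y, V, Y, H))

step = 1e-4
def shifted(t):
    return [[X[i][j] + t * V[i][j] for j in range(2)] for i in range(2)]

finite_diff = (metric(shifted(step), H) - 2 * metric(X, H)
               + metric(shifted(-step), H)) / step ** 2
print(closed_form, finite_diff)
```

The closed form is nonnegative on this example, consistent with $\mathrm{D}^{2}g(X)[H,H]\succeq0$, and agrees with the finite difference up to discretization error.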
Let $h_{X}:=\textup{svec}(X^{-1/2}HX^{-1/2})$ and $h:=\textup{svec}(H)$. It holds that $h_{X}=L(X\otimes X)^{-\frac{1}{2}}Mh$ due to $h_{X}=\textup{svec}(X^{-\frac{1}{2}}HX^{-\frac{1}{2}})=L\,\textup{vec}(X^{-\frac{1}{2}}HX^{-\frac{1}{2}})=L(X\otimes X)^{-\frac{1}{2}}\textup{vec}(H)=L(X\otimes X)^{-\frac{1}{2}}Mh$. As $h\sim\mathcal{N}{\bigl(0,\frac{r^{2}}{d_{s}}\,g(X)^{-1}\bigr)}$, $h_{X}$ is a Gaussian with zero mean and covariance \frac{r^{2}}{d_{s}}L(X\otimes X)^{-\frac{1}{2}}Mg(X)^{-1}M^{\mathsf{T}}(X\otimes X)^{-\frac{1}{2}}L^{\mathsf{T}}\underset{\text{(i)}}{=}\frac{r^{2}}{d_{s}d}L(X\otimes X)^{-\frac{1}{2}}MLN(X\otimes X)N^{\mathsf{T}}L^{\mathsf{T}}M^{\mathsf{T}}(X\otimes X)^{-\frac{1}{2}}L^{\mathsf{T}}\underset{(*)}{=}\frac{r^{2}}{d_{s}d}L(X\otimes X)^{-\frac{1}{2}}N(X\otimes X)N^{\mathsf{T}}(X\otimes X)^{-\frac{1}{2}}L^{\mathsf{T}}\underset{(*)}{=}\frac{r^{2}}{d_{s}d}L(X\otimes X)^{-\frac{1}{2}}(X\otimes X)N(X\otimes X)^{-\frac{1}{2}}L^{\mathsf{T}}\underset{(*)}{=}\frac{r^{2}}{d_{s}d}LNL^{\mathsf{T}}\underset{\text{(ii)}}{=}\frac{r^{2}}{d_{s}d}\,\begin{bmatrix}I_{d} & \\ & \frac{1}{2}I_{d(d-1)/2}\end{bmatrix}\,, where (i) follows from Proposition \ref{['prop:metricFormula']}, $(*)$ follows from Lemma \ref{['lem:MNL-properties']}, and (ii) follows from magnus1980elimination that $LNL^{\mathsf{T}}$ is a $d_{s}\times d_{s}$ diagonal matrix with $d$ times $1$ and $\frac{1}{2} d(d-1)$ times $1/2$. Precisely, the diagonal entries of $LNL^{\mathsf{T}}$ corresponding to the diagonals of $X^{-1/2}HX^{-1/2}$ are $1$, and those corresponding to its off-diagonals are $1/2$. This is exactly the covariance matrix of a $d_{s}$-dimensional GOE, so $X^{-\frac{1}{2}}HX^{-\frac{1}{2}}\sim\frac{r}{\sqrt{d_{s}d}}G$ for the GOE $G$.

Now we show ASC of $d\phi$. 
Expand ${\|Z-X\|}_{Z}^{2}:={\|Z-X\|}_{g(Z)}^{2}$ at $X$ for $Z=X+H$: ${\|Z-X\|}_{Z}^{2}-{\|Z-X\|}_{X}^{2}=\sum_{k=1}^{\infty}\frac{1}{k!}\,\mathrm{D}^{k}g(X)[H^{\otimes(k+2)}]\,.$ It follows from induction that for $H_{X}:=X^{-\frac{1}{2}}HX^{-\frac{1}{2}}$ \mathrm{D} g(X)[H^{\otimes3}]=-2d\,\textup{Tr}(X^{-1}HX^{-1}HX^{-1}H)=-2d\,\textup{Tr}(H_{X}^{3})\,,\quad\mathrm{D}^{2}g(X)[H^{\otimes4}]=3!\,d\,\textup{Tr}(H_{X}^{4})\,,\quad\mathrm{D}^{k}g(X)[H^{\otimes(k+2)}]=(-1)^{k}(k+1)!\,d\,\textup{Tr}(H_{X}^{k+2})\,. Putting these back into the series expansion and writing $H_{X}=\frac{r}{\sqrt{d_{s}d}}H$ with $H$ now denoting the GOE (see Lemma \ref{['lem:conn-to-goe']}), {\|Z-X\|}_{Z}^{2}-{\|Z-X\|}_{X}^{2}=\sum_{k=1}^{\infty}(-1)^{k}(k+1)d\,\textup{Tr}(H_{X}^{k+2})=\sum_{k=1}^{\infty}(-1)^{k}(k+1)d\cdot{\Bigl(\frac{r}{\sqrt{d_{s}d}}\Bigr)}^{k+2}\textup{Tr}(H^{k+2})=\frac{r^{2}}{d_{s}}\sum_{k=1}^{\infty}(-1)^{k}(k+1)\,{\Bigl(\frac{r}{\sqrt{d_{s}d}}\Bigr)}^{k}\textup{Tr}(H^{k+2})\,. As for ASC, it suffices to show that $\sum_{k=1}^{\infty}(-1)^{k}(k+1){\bigl(\frac{r}{\sqrt{d_{s}d}}\bigr)}^{k}\,\textup{Tr}(H^{k+2})$ can be made arbitrarily small. We first control $\sum_{k\geq2}$: $|\sum_{k\geq2}(-1)^{k}(k+1){\Bigl(\frac{r}{\sqrt{d_{s}d}}\Bigr)}^{k}\textup{Tr}(H^{k+2})|\leq\sum_{k\geq2}(k+1){\Bigl(\frac{r}{\sqrt{d_{s}d}}\Bigr)}^{k}d\cdot{\|H\|}_{\text{op}}^{k+2}\,.$ By vershynin2018high, ${\|H\|}_{\text{op}}\lesssim\sqrt{d}$ holds with high probability, and thus \sum_{k\geq2}(k+1){\Bigl(\frac{r}{\sqrt{d_{s}d}}\Bigr)}^{k}d\cdot{\|H\|}_{\text{op}}^{k+2}\leq\sum_{k\geq2}(k+1)r^{k}\frac{1}{d^{3k/2}}d\cdot d^{\frac{k+2}{2}}\leq\sum_{k\geq2}(k+1)r^{k}d^{2-k}\,. By taking $r=\Omega(1)$ small enough, we can make this series arbitrarily small. Now we bound $\frac{r}{d^{3/2}}\textup{Tr}(H^{3})$ ($k=1$ case). 
This is a Gaussian polynomial in $\textup{svec}(H)$, so it suffices to show $\mathbb{E}[(\textup{Tr}(H^{3}))^{2}]=\mathcal{O}(d^{3})$; we then use Lemma \ref{['lem:conc-gaussian-poly']} to obtain a high-probability bound on the Gaussian polynomial $\frac{r}{d^{3/2}}\textup{Tr}(H^{3})$. For $H=(H_{ab})\in\mathbb{S}^{d}$, ${\bigl(\textup{Tr}(H^{3})\bigr)}^{2}=\sum_{ipq}H_{ip}H_{pq}H_{qi}\cdot\sum_{jrs}H_{jr}H_{rs}H_{sj}=\sum_{ipqjrs}H_{ip}H_{pq}H_{qi}H_{jr}H_{rs}H_{sj}\,,$ where each $H_{**}$ in the summand is an independent Gaussian with zero mean and variance $1$ or $1/2$ (as $H$ is the GOE). We can classify the indices $\{i,p,q,j,r,s\}$ into the following types: 6\text{ distinct indices }\{a,b,c,d,e,f\}\,,\quad5\text{ distinct indices }\{a,b,c,d,(e,e)\}\,,\quad4\text{ distinct indices }\{a,b,c,(d,d,d)\},\{a,b,(c,c),(d,d)\}\,,\quad\text{Others }\dots\,, where for example $\{a,b,c,d,e,f\}$ means all indices are different, and $\{a,b,c,d,(e,e)\}$ means that there appear 5 different indices $\{a,b,c,d,e\}$ but there is one pair $(e,e)$ of the same index. Note that $|\mathbb{E}[H_{ip}H_{pq}H_{qi}H_{jr}H_{rs}H_{sj}]|=\mathcal{O}(1)$, since it is at most the sixth absolute moment of a standard Gaussian. Hence, toward our goal of an $\mathcal{O}(d^{3})$ bound on $\mathbb{E}[(\textup{Tr}(H^{3}))^{2}]$, it suffices to examine only the types with at least $4$ distinct indices listed above, because the terms of the other types involve at most $3$ distinct indices and thus contribute at most $\mathcal{O}(d^{3})$ in total.

Figure: A structure of indices of $H_{ip}H_{pq}H_{qi}\cdot H_{jr}H_{rs}H_{sj}$.

For any term with 6 distinct indices, we can always find an 'uncoupled' $H_{**}$ (for example $H_{ab}$) in the summand that is independent of all the others, so the expectation of the summand is $0$. For the terms with $5$ distinct indices $\{a,b,c,d,(e,e)\}$, due to symmetry (see Figure \ref{['fig:ipq-jrs']}) we can further classify the index $(i,p,q,j,r,s)$ into either $(a,b,c,d,e,e)$ or $(a,b,e,c,d,e)$. 
In both cases, $H_{ab}$ has no coupled Gaussian, so the expectation of the summand is also $0$. For $4$ distinct indices, let us first consider $\{a,b,c,(d,d,d)\}$-type indices. In this case $(i,p,q,j,r,s)$ is of the form either $(a,a,a,b,c,d)$ or $(a,a,b,a,c,d)$ due to symmetry. In both cases, $H_{cd}$ has no coupled Gaussian. Now consider $\{a,b,(c,c),(d,d)\}$-type indices. Then $(i,p,q,j,r,s)$ is of the form $(a,b,c,c,d,d)$, $(a,c,c,b,d,d)$, or $(a,c,d,b,c,d)$. In these cases, $H_{ab}$, $H_{cc}$, and $H_{ac}$, respectively, are uncoupled. Therefore, $\mathbb{E}[H_{ip}H_{pq}H_{qi}H_{jr}H_{rs}H_{sj}]=0$ whenever there are at least $4$ distinct indices. It seems challenging to show that $\phi$ is SASC using the same technique. When $g=d\,\nabla^{2}(-\log\det X)+g'$ for another PSD matrix function $g'$, we know that $\textup{svec}(H_{X})=\textup{svec}(X^{-\frac{1}{2}}HX^{-\frac{1}{2}})$ follows a Gaussian distribution with zero mean and covariance matrix $M$ satisfying $M\preceq\begin{bmatrix}I_{d}\\ & \frac{1}{2}I_{d(d-1)/2}\end{bmatrix}\,.$ A main difference in the SASC setting is that the entries of $h=\textup{svec}(H_{X})$ might exhibit dependencies, making the previous approach infeasible. This arises because many fundamental results in random matrix theory presume independence of the entries of a random matrix. Moreover, our combinatorial argument for the $k=1$ case is not feasible in the presence of such dependencies. We define $g_{X}=g=2(d^{2}g_{1}+g_{2})$, where $g_{1}(X)=M^{\mathsf{T}}(X\otimes X)^{-1}M\qquad\text{and}\qquad g_{2}(X)=22\sqrt{\frac{m}{d}}\,M^{\mathsf{T}}A_{X}^{\mathsf{T}}{\bigl(\Sigma_{X}+\frac{d}{m}I_{m}\bigr)}A_{X}M\,.$ Since $d^{2}g_{1}$ and $g_{2}$ are SSC, $g$ is also SSC due to Lemma \ref{['lem:ssc-sum']} and $\mathcal{O}(d^{3}+\sqrt{md^{2}})$-symmetric due to Lemma \ref{['lem:symmetry-addition']}. As $d^{2}g_{1}$ and $g_{2}$ are SLTSC and SASC, $g$ is LTSC and ASC.
Putting these together, it follows that $g$ is ${\bigl(\mathcal{O}(d^{3}+\sqrt{md^{2}}),\mathcal{O}(d^{3}+\sqrt{md^{2}})\bigr)}$-Dikin-amenable. Therefore, Theorem \ref{['thm:Dikin-annealing']} implies that $\mathsf{GCDW}$ incurs $\widetilde{\mathcal{O}}(d^{2}(d^{3}+\sqrt{md^{2}}))=\widetilde{\mathcal{O}}(d^{3}(d^{2}+\sqrt{m}))$ total iterations of the $\mathsf{Dikin\ walk}$ with $g$. Now we bound the per-step complexity of the $\mathsf{Dikin\ walk}$ (Algorithm \ref{['alg:DikinWalk']}). Recall that it requires (1) the update of the leverage scores, (2) computation of the matrix function induced by the local metric $g$, (3) the inverse of the matrix function, and (4) its determinant. By lee2019solving (with $p=2$ and $d\gets d_{s}$ therein), the initialization of the leverage scores at the beginning takes $\widetilde{\mathcal{O}}(md^{2\omega})$ time and each of their updates takes $\widetilde{\mathcal{O}}(md^{2(\omega-1)})$ time. Since (1) takes $\widetilde{\mathcal{O}}(md^{2(\omega-1)})$, (2) takes $\widetilde{\mathcal{O}}(d^{4}+md^{2(\omega-1)})$, and (3) and (4) take $\mathcal{O}\left(d^{2\omega}\right)$, each iteration runs in $\widetilde{\mathcal{O}}(d^{2\omega}+md^{2(\omega-1)})$ time. Even though the initialization of the leverage scores takes $\widetilde{\mathcal{O}}(md^{2\omega})$ time, the amortized per-step time complexity is $\widetilde{\mathcal{O}}(d^{2\omega}+md^{2(\omega-1)})=\widetilde{\mathcal{O}}(md^{2(\omega-1)})$, as the mixing rate is $\widetilde{\mathcal{O}}(d^{3}(d^{2}+\sqrt{m}))$. We define $g_{X}=g=2(d^{2}g_{1}+g_{2})$, where for some constants $c_{1},c_{2}>0$, $g_{1}(X)=M^{\mathsf{T}}(X\otimes X)^{-1}M\qquad\text{and}\qquad g_{2}(X)=dc_{1}(\log m)^{c_{2}}M^{\mathsf{T}}A_{X}^{\mathsf{T}}W_{X}A_{X}M\,.$ Since $d^{2}g_{1}$ and $g_{2}$ are SSC, $g$ is also SSC due to Lemma \ref{['lem:ssc-sum']} and $\mathcal{O}^{*}(d^{3})$-symmetric due to Lemma \ref{['lem:symmetry-addition']}. As $d^{2}g_{1}$ and $g_{2}$ are SLTSC and SASC, $g$ is LTSC and ASC.
Putting these together, it follows that $g$ is ${\bigl(\mathcal{O}^{*}(d^{3}),\mathcal{O}^{*}(d^{3})\bigr)}$-Dikin-amenable. Therefore, Theorem \ref{['thm:Dikin-annealing']} implies that $\mathsf{GCDW}$ requires $\widetilde{\mathcal{O}}(d^{5})$ iterations of the $\mathsf{Dikin\ walk}$ with $g$. Since the initialization and update of the Lewis weight takes $\widetilde{\mathcal{O}}(md^{2\omega})$ and $\widetilde{\mathcal{O}}(md^{2(\omega-1)})$ time lee2019solving, the same implementation with Theorem \ref{['thm:hybridPSD']} also has the time complexity of $\widetilde{\mathcal{O}}(md^{2(\omega-1)})$. Let $v\in\mathbb{R}^{d_{s}}$ be a given vector, and denote $\bar{g}_{0}:=g_{1}$ and $\bar{g}_{i}:=\bar{g}_{i-1}+u_{i}u_{i}^{\mathsf{T}}$ for $i\in[m]$. We first prepare the column vectors $u_{i}$'s of $U=M^{\mathsf{T}}A^{\mathsf{T}}S_{X}^{-1}$ in $\mathcal{O}(md^{2})$ time and then initialize $\bar{g}_{0}^{-1}v$ and $\bar{g}_{0}^{-1}u_{i}$ for $i\in[m]$ in $\mathcal{O}(md^{\omega})$ time. For $u_{i}$'s, note that $S_{X}$ can be prepared in $\mathcal{O}(md^{2})$ time, and thus $A^{\mathsf{T}}S_{X}^{-1}$ takes $\mathcal{O}(md^{2})$ time due to $A\in\mathbb{R}^{d^{2}\times m}$. Since each row of $M^{\mathsf{T}}\in\mathbb{R}^{d_{s}\times d^{2}}$ has at most two non-zero entries, we can obtain $u_{i}$'s in $\mathcal{O}(md^{2})$ time. For $\bar{g}_{0}^{-1}v$ and $\bar{g}_{0}^{-1}u_{i}$, we recall from Lemma \ref{['prop:metricFormula']} that for a vector $z\in\mathbb{R}^{d_{s}}$ g_{1}^{-1}z=M^{\dagger}(X\otimes X)(M^{\dagger})^{\mathsf{T}}z=LN(X\otimes X)NL^{\mathsf{T}}z\,. Since each row of $L^{\mathsf{T}}\in\mathbb{R}^{d^{2}\times d_{s}}$ has at most two non-zero entries, $w:=L^{\mathsf{T}}z\in\mathbb{R}^{d^{2}}$ can be computed in $\mathcal{O}(d^{2})$ time. From the definition of $N$, it follows that $Nw=\textup{vec}{\bigl(\frac{1}{2}(W+W^{\mathsf{T}})\bigr)}$ for $W:=\textup{vec}^{-1}(w)\in\mathbb{R}^{d\times d}$, which also can be computed in $\mathcal{O}(d^{2})$ time. 
For $\overline{W}:=\frac{1}{2}(W+W^{\mathsf{T}})$, it follows that $(X\otimes X)Nw=(X\otimes X)\textup{vec}(\overline{W})\underset{\text{Lemma }\ref{['lem:Kronecker']}\text{-1}}{=}\textup{vec}(X\overline{W}X)\,,$ which can be computed in $\mathcal{O}(d^{\omega})$ time by the fast matrix multiplication, and in a similar way we can compute $LN\,\textup{vec}(X\overline{W}X)$ in $\mathcal{O}(d^{2})$ time. Putting all these together, $\bar{g}_{0}^{-1}v$ can be computed in $\mathcal{O}(d^{\omega})$ time, and repeating this for $u_{j}$'s yields $\{\bar{g}_{0}^{-1}v,\bar{g}_{0}^{-1}u_{1},\dots,\bar{g}_{0}^{-1}u_{m}\}$ in $\mathcal{O}(md^{\omega})$ time. Starting with these initializations, we recursively use the Sherman--Morrison formula: for $z\in\mathbb{R}^{d_{s}}$, $\bar{g}_{i}^{-1}z=\bar{g}_{i-1}^{-1}z-\frac{\bar{g}_{i-1}^{-1}u_{i}u_{i}^{\mathsf{T}}\bar{g}_{i-1}^{-1}z}{1+u_{i}^{\mathsf{T}}\bar{g}_{i-1}^{-1}u_{i}}\,.$ Using $\bar{g}_{i-1}^{-1}u_{j}$ and $\bar{g}_{i-1}^{-1}v$ from a previous iteration, we can compute each of $\bar{g}_{i}^{-1}u_{j}$ and $\bar{g}_{i}^{-1}v$ in the current iteration in $\mathcal{O}(d^{2})$ time, and thus each round for update takes $\mathcal{O}(md^{2})$ time in total. Since we iterate for $m$ rounds, Algorithm \ref{['alg:subroutine']} outputs $\bar{g}_{m}^{-1}v=g(X)^{-1}v$ in $\mathcal{O}(md^{\omega}+m^{2}d^{2})$ time. Here we provide details of Algorithm \ref{['alg:perStep-small-m']} in two stages -- (1) sampling from $\mathcal{N}{\bigl(0,\frac{r^{2}}{d}g(x)^{-1}\bigr)}$ and (2) computation of acceptance probability. For simplicity, we ignore $r^{2}/d$ and illustrate how to draw $v\sim\mathcal{N}(0,g(X)^{-1})$ without full computation of $g(X)^{-1}$ in $\mathcal{O}(md^{\omega}+m^{2}d^{2})$ time. 
Our approach is to compute $v:=g(X)^{-1}\left[B\ U\right]w$ for $w\sim\mathcal{N}(0,I_{d^{2}+m})$, which follows the Gaussian distribution with covariance $g(X)^{-1}\left[B\ U\right]{\Bigl(g(X)^{-1}\left[B\ U\right]\Bigr)}^{\mathsf{T}}=g(X)^{-1}(BB^{\mathsf{T}}+UU^{\mathsf{T}})\,g(X)^{-1}=g(X)^{-1}\,,$ since $v$ is a linear transformation of the Gaussian random variable $w$, and $BB^{\mathsf{T}}+UU^{\mathsf{T}}=g(X)$. Denoting $w=(w_{b},w_{u})$ for $w_{b}\sim\mathcal{N}(0,I_{d^{2}})$ and $w_{u}\sim\mathcal{N}(0,I_{m})$, we can show that $\left[B\ U\right]w$ can be computed in $\mathcal{O}(d^{\omega}+md^{2})$ time as follows: $\left[B\ U\right]w=Bw_{b}+Uw_{u}=M^{\mathsf{T}}\underbrace{(X\otimes X)^{-1/2}w_{b}}_{\text{Use Lemma }\ref{['eq:sherman-morrison']}}+M^{\mathsf{T}}A^{\mathsf{T}}S_{X}^{-1}w_{u}=M^{\mathsf{T}}{\Bigl(\textup{vec}{\bigl(X^{-1/2}\,\textup{vec}^{-1}(w_{b})\,X^{-1/2}\bigr)}+A^{\mathsf{T}}S_{X}^{-1}w_{u}\Bigr)}\,,$ where $\textup{vec}{\bigl(X^{-1/2}\,\textup{vec}^{-1}(w_{b})\,X^{-1/2}\bigr)}$ and $A^{\mathsf{T}}S_{X}^{-1}w_{u}$ can be computed in $\mathcal{O}(d^{\omega})$ and $\mathcal{O}(md^{2})$ time, respectively. Since each row of $M^{\mathsf{T}}\in\mathbb{R}^{d_{s}\times d^{2}}$ has at most two non-zero entries, $\left[B\ U\right]w$ can be computed in $\mathcal{O}(d^{\omega}+md^{2})$ time. Using Algorithm \ref{['alg:subroutine']}, we obtain $v=g(X)^{-1}\left[B\ U\right]w$ in $\mathcal{O}(md^{\omega}+m^{2}d^{2})$ time. We now turn to the computation of the acceptance probability and show that this step also takes $\mathcal{O}(md^{\omega}+m^{2}d^{2})$ time. To compute $\det g(X)$, we use Algorithm \ref{['alg:subroutine']} to prepare $\{\bar{g}_{i}^{-1}u_{1},\dots,\bar{g}_{i}^{-1}u_{m}\}_{i=0}^{m}$ at $X$ and $Y=\textup{svec}^{-1}(y)$ in $\mathcal{O}(md^{\omega}+m^{2}d^{2})$ time.
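The rank-one recursion and the sampling trick described above can be illustrated numerically. The following is a minimal sketch, not the paper's implementation: it assumes a generic dense SPD matrix `g0` as a stand-in for $g_{1}$ (so $g_{0}^{-1}$ is applied by a dense solve rather than through the Kronecker structure), a random matrix `U` whose columns play the role of the $u_{i}$, and a hypothetical helper name `solve_and_logdet`. The log-determinant is accumulated with the standard matrix determinant lemma $\det(A+uu^{\mathsf{T}})=(1+u^{\mathsf{T}}A^{-1}u)\det A$.

```python
import numpy as np

def solve_and_logdet(g0, U, v):
    """Return (g^{-1} v, log det g) for g = g0 + sum_i u_i u_i^T.

    g0 : (n, n) SPD matrix, a dense stand-in for g_1 (assumption of this sketch)
    U  : (n, m) matrix whose columns are the u_i
    v  : (n,) right-hand side
    """
    m = U.shape[1]
    # Initialization: columns are g0^{-1} u_1, ..., g0^{-1} u_m, g0^{-1} v.
    Z = np.linalg.solve(g0, np.column_stack([U, v]))
    logdet = np.linalg.slogdet(g0)[1]
    for i in range(m):
        u = U[:, i]
        gu = Z[:, i].copy()               # current inverse applied to u
        denom = 1.0 + u @ gu
        logdet += np.log(denom)           # matrix determinant lemma
        Z -= np.outer(gu, u @ Z) / denom  # Sherman-Morrison on every stored vector
    return Z[:, -1], logdet

# Illustrative data (all sizes and values are arbitrary choices).
rng = np.random.default_rng(0)
n, m = 6, 4
B = rng.standard_normal((n, n)) + 3.0 * np.eye(n)
g0 = B @ B.T                              # so that B B^T = g0
U = rng.standard_normal((n, m))
g = g0 + U @ U.T

# Draw v = g^{-1} [B U] w with w ~ N(0, I_{n+m}); its covariance is g^{-1}.
w = rng.standard_normal(n + m)
rhs = np.hstack([B, U]) @ w
sample, logdet = solve_and_logdet(g0, U, rhs)

# The linear map T = g^{-1} [B U] satisfies T T^T = g^{-1}.
T = np.linalg.solve(g, np.hstack([B, U]))
cov_error = np.abs(T @ T.T - np.linalg.inv(g)).max()
```

Each pass of the loop touches all $m+1$ stored vectors once, which mirrors the per-round cost accounting in the text; only the initialization would change if the structured formula for $g_{1}^{-1}$ were used instead of the dense solve.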
Recall the matrix determinant lemma: $\det(A+uu^{\mathsf{T}})=(1+u^{\mathsf{T}}A^{-1}u)\,\det A\,.$ Using the following recursive formula \det(\bar{g}_{i+1})=\det(\bar{g}_{i}+u_{i+1}u_{i+1}^{\mathsf{T}})=(1+u_{i+1}^{\mathsf{T}}\bar{g}_{i}^{-1}u_{i+1})\,\det\bar{g}_{i}\,, we start with $\det\bar{g}_{0}=\det g_{1}=2^{d(d-1)/2}(\det X)^{-(d+1)}$ (see Lemma \ref{['lem:Kronecker']}-7), which can be computed in $\mathcal{O}(d^{\omega})$ time, and compute $\det g(X)$ (and $\det g(Y)$ in the same way) in $\mathcal{O}(md^{\omega}+m^{2}d^{2})$ time. We just reproduce the proof of Lemma \ref{['lem:one-step']}. For $\pi\propto\exp(-f)\cdot\mathbf{1}_{K}$, we denote $p_{x}=\mathcal{N}{\Bigl(x,\frac{r^{2}}{d}g(x)^{-1}\Bigr)},\qquad R_{x}(z)=\frac{p_{z}(x)}{p_{x}(z)}\frac{\pi(z)}{\pi(x)},\qquad A_{x}(z)=\min{\bigl(1,R_{x}(z)\,\mathbf{1}_{K}(z)\bigr)}\,.$ Then the transition kernel of the $\mathsf{Dikin\ walk}$ started at $x$ can be written as \widetilde{P}(x,dz)=\underbrace{(1-\mathbb{E}_{p_{x}}[A_{x}(\cdot)])}_{=:r_{x}}\,\delta_{x}(\mathrm{d} z)+A_{x}(z)\,p_{x}(z)\,\mathrm{d} z\,. Thus, for $x,y\in\textup{int}(K)$ d_{\textrm{TV}}(P_{x},P_{y})=\underbrace{\frac{r_{x}+r_{y}}{2}}_{\textsf{I}}+\underbrace{\frac{1}{2}\int|A_{x}(z)\,p_{x}(z)-A_{y}(z)\,p_{y}(z)|\,\mathrm{d} z}_{\textsf{II}}\,. We note that $(1-\delta)\,\widetilde{g}_{2}\preceq g_{2}\preceq(1+\delta)\,\widetilde{g}_{2}$ and thus $(1-\delta)\,\widetilde{g}\preceq g\preceq(1+\delta)\,\widetilde{g}\,,$ and this implies $(1-\delta)\,I\preceq\widetilde{g}^{-1/2}g\widetilde{g}^{-1/2}\preceq(1+\delta)\,I$. Hence, $(1-\delta)^{d^{2}/2}\leq\sqrt{\frac{\det g}{\det\widetilde{g}}}\leq(1+\delta)^{d^{2}/2}$ and (1-\delta)^{d^{2}}\sqrt{\frac{\det\widetilde{g}(z)}{\det\widetilde{g}(x)}}\leq\sqrt{\frac{\det g(z)}{\det g(x)}}\leq(1+\delta)^{d^{2}}\sqrt{\frac{\det\widetilde{g}(z)}{\det\widetilde{g}(x)}}\,. 
With this in mind, recall that $r_{x}=1-\mathbb{E}_{p_{x}}[A_{x}(\cdot)]=1-\int\min{\Bigl(1,\,\underbrace{\mathbf{1}_{K}(z)\frac{\exp(-f(z))}{\exp(-f(x))}}_{\eqqcolon\textsf{A}}\underbrace{\frac{p_{z}(x)}{p_{x}(z)}}_{\eqqcolon\textsf{B}}\Bigr)}\,p_{x}(z)\,\mathrm{d} z.$ We can bound $\textsf{A}$ in a similar way by using (\ref{['eq:closeness-approx']}). As for $\textsf{B}$, $\log\text{B}=-\frac{d}{2r^{2}}({\|z-x\|}_{z}^{2}-{\|z-x\|}_{x}^{2})+\frac{1}{2}(\log\det\widetilde{g}(z)-\log\det\widetilde{g}(x))\,.$ As in Lemma \ref{['lem:one-step']}, the second term can be bounded lower by $\exp\left(-3\varepsilon\right)$ using (\ref{['eq:similar-ratio-approx']}). The first term can be lower-bounded by invoking ASC of $g$. To see this, ignoring the normalization constant of $g_{x}$ (*)=\int\mathbf{1}{\Bigl({\|z-x\|}_{\widetilde{g}(z)}^{2}-{\|z-x\|}_{\widetilde{g}(x)}^{2}\leq2\varepsilon\frac{r^{2}}{d}\Bigr)}\sqrt{\left\lvert \widetilde{g}(x)\right\rvert }\exp{\bigl(-\frac{1}{2}{\|z-x\|}_{\widetilde{g}(x)}^{2}\bigr)}\,\mathrm{d} z=\int\mathbf{1}{\Bigl({\|z-x\|}_{\widetilde{g}(z)}^{2}-{\|z-x\|}_{\widetilde{g}(x)}^{2}\leq2\varepsilon\frac{r^{2}}{d}\Bigr)}\sqrt{\left\lvert g(x)\right\rvert }\exp{\bigl(-\frac{1}{2}{\|z-x\|}_{g(x)}^{2}\bigr)}\qquad\cdot\sqrt{\left\lvert \frac{\widetilde{g}(x)}{g(x)}\right\rvert }\exp{\bigl(-\frac{1}{2}({\|z-x\|}_{\widetilde{g}(x)}^{2}-{\|z-x\|}_{g(x)}^{2})\bigr)}\,\mathrm{d} z\leq\int\mathbf{1}{\Bigl({\|z-x\|}_{\widetilde{g}(z)}^{2}-{\|z-x\|}_{\widetilde{g}(x)}^{2}\leq2\varepsilon\frac{r^{2}}{d}\Bigr)}\sqrt{\left\lvert g(x)\right\rvert }\exp{\bigl(-\frac{1}{2}{\|z-x\|}_{g(x)}^{2}\bigr)}\qquad\cdot(1+\delta)^{d^{2}/2}\exp{\bigl(\frac{\delta}{2}{\|z-x\|}_{g(x)}^{2}\bigr)}\,\mathrm{d} z\,. 
Due to ${\|z-x\|}_{g(x)}^{2}\lesssim r^{2}$ w.h.p., taking $\delta=\varepsilon/d^{10}$ leads to $(*)\leq2\int\mathbf{1}{\Bigl({\|z-x\|}_{\widetilde{g}(z)}^{2}-{\|z-x\|}_{\widetilde{g}(x)}^{2}\leq2\varepsilon\frac{r^{2}}{d}\Bigr)}\sqrt{\left\lvert g(x)\right\rvert }\exp{\bigl(-\frac{1}{2}{\|z-x\|}_{g(x)}^{2}\bigr)}\,\mathrm{d} z.$ Also, due to ${\|z-x\|}_{\widetilde{g}(z)}^{2}-{\|z-x\|}_{\widetilde{g}(x)}^{2}\geq(1-\delta)\,{\|z-x\|}_{g(z)}^{2}-(1+\delta)\,{\|z-x\|}_{g(x)}^{2}=(1-\delta)\,({\|z-x\|}_{g(z)}^{2}-{\|z-x\|}_{g(x)}^{2})-2\delta\,{\|z-x\|}_{g(x)}^{2}\,,$ we have $(*)\leq2\int\mathbf{1}{\Bigl({\|z-x\|}_{g(z)}^{2}-{\|z-x\|}_{g(x)}^{2}\leq(2\varepsilon(1-\delta)^{-1}+\varepsilon)\,\frac{r^{2}}{d}\Bigr)}\sqrt{\left\lvert g(x)\right\rvert }e^{-\frac{1}{2}{\|z-x\|}_{g(x)}^{2}}\,\mathrm{d} z\leq6\varepsilon$ by invoking ASC of $g$ in the last inequality. Putting these together, $\mathsf{I}\leq\frac{1}{2}+\mathcal{O}(\varepsilon)$. For $\mathsf{II}$, we can follow the proof of Lemma \ref{['lem:one-step']} to show $\mathsf{II}\leq\frac{1}{4}+\mathcal{O}(\varepsilon)$, and every technical issue can be resolved by repeating the same techniques above.
This work was supported in part by NSF awards CCF-2007443 and CCF-2134105.
Kwangjun Ahn and Sinho Chewi. Efficient constrained sampling via the mirror-Langevin algorithm. In Advances in Neural Information Processing Systems (NeurIPS), volume 34, pages 28405--28418, 2021.
Kurt M Anstreicher. Volumetric path following algorithms for linear programming. Mathematical Programming, 76: 245--263, 1997.
Sébastien Bubeck and Ronen Eldan. The entropic barrier: a simple and optimal universal self-concordant barrier. In Conference on Learning Theory (COLT), volume 40 of Proceedings of Machine Learning Research, pages 279--279. PMLR, 2015.
Yuansi Chen. An almost constant lower bound of the isoperimetric coefficient in the KLS conjecture. Geometric and Functional Analysis (GAFA), 31: 34--61, 2021.
Yuansi Chen and Ronen Eldan.
Hit-and-run mixing via localization schemes. arXiv preprint arXiv:2212.00297, 2022.
Yuansi Chen, Raaz Dwivedi, Martin J Wainwright, and Bin Yu. Fast MCMC sampling algorithms on polytopes. The Journal of Machine Learning Research (JMLR), 19 (1): 2146--2231, 2018.
Sinho Chewi. The entropic barrier is $n$-self-concordant, pages 209--222. Springer International Publishing, Cham, 2023a.
Sinho Chewi. Log-concave sampling. Book draft available at https://chewisinho.github.io, 2023b.
Ben Cousins and Santosh Vempala. Gaussian Cooling and $\mathcal{O}^{*}(n^{3})$ algorithms for volume and Gaussian volume. SIAM Journal on Computing (SICOMP), 47 (3): 1237--1273, 2018.
Khashayar Gatmiry and Santosh S Vempala. Convergence of the Riemannian Langevin algorithm. arXiv preprint arXiv:2204.10818, 2022.
Khashayar Gatmiry, Jonathan Kelner, and Santosh S Vempala. Sampling with barriers: Faster mixing via Lewis weights. arXiv preprint arXiv:2303.00480, 2023.
Mark Girolami and Ben Calderhead. Riemann manifold Langevin and Hamiltonian Monte Carlo methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73 (2): 123--214, 2011.
Sivakanth Gopi, Yin Tat Lee, Daogao Liu, Ruoqi Shen, and Kevin Tian. Algorithmic aspects of the log-Laplace transform and a non-Euclidean proximal sampler. In Conference on Learning Theory (COLT), volume 195 of Proceedings of Machine Learning Research, pages 2399--2439. PMLR, 2023.
Osman Güler. Hyperbolic polynomials and interior point methods for convex programming. Mathematics of Operations Research, 22 (2): 350--377, 1997.
He Jia, Aditi Laddha, Yin Tat Lee, and Santosh Vempala. Reducing isotropy and volume to KLS: an $\mathcal{O}^*(n^3 {\psi}^2)$ volume algorithm. In Symposium on Theory of Computing (STOC), pages 961--974, 2021.
Adam Tauman Kalai and Santosh Vempala. Simulated annealing for convex optimization. Mathematics of Operations Research, 31 (2): 253--266, 2006.
Ravi Kannan, László Lovász, and Miklós Simonovits.
Random walks and an $\mathcal{O}^*(n^5)$ volume algorithm for convex bodies. Random Structures & Algorithms (RS&A), 11 (1): 1--50, 1997.
Ravindran Kannan and Hariharan Narayanan. Random walks on polytopes and an affine interior point method for linear programming. Mathematics of Operations Research, 37 (1): 1--20, 2012.
Boáz Klartag. Logarithmic bounds for isoperimetry and slices of convex sets. Ars Inveniendi Analytica, 2023. doi: 10.15781/jsjy-0b06.
Yunbum Kook, Yin Tat Lee, Ruoqi Shen, and Santosh Vempala. Condition-number-independent convergence rate of Riemannian Hamiltonian Monte Carlo with numerical integrators. In Conference on Learning Theory (COLT), volume 195 of Proceedings of Machine Learning Research, pages 4504--4569. PMLR, 2023.
Aditi Laddha, Yin Tat Lee, and Santosh Vempala. Strong self-concordance and sampling. In Symposium on Theory of Computing (STOC), pages 1212--1222, 2020.
Robert Lang. A note on the measurability of convex sets. Archiv der Mathematik, 47: 90--92, 1986.
François Le Gall. Powers of tensors and fast matrix multiplication. In International Symposium on Symbolic and Algebraic Computation (ISSAC), pages 296--303, 2014.
Yin Tat Lee and Aaron Sidford. Solving linear programs with $\widetilde{\mathcal{O}}(\sqrt{\text{rank}})$ linear system solves. arXiv preprint arXiv:1910.08033, 2019.
Yin Tat Lee and Santosh S Vempala. Geodesic walks in polytopes. In Symposium on Theory of Computing (STOC), pages 927--940, 2017.
Yin Tat Lee and Santosh S Vempala. Convergence rate of Riemannian Hamiltonian Monte Carlo and faster polytope volume computation. In Symposium on Theory of Computing (STOC), pages 1115--1121, 2018.
Yin Tat Lee and Man-Chung Yue. Universal barrier is $n$-self-concordant. Mathematics of Operations Research, 46 (3): 1129--1148, 2021.
Ruilin Li, Molei Tao, Santosh S Vempala, and Andre Wibisono. The mirror Langevin algorithm converges with vanishing bias. In International Conference on Algorithmic Learning Theory (ALT), pages 718--742.
PMLR, 2022.
László Lovász. Hit-and-run mixes fast. Mathematical Programming, 86: 443--461, 1999.
László Lovász and Miklós Simonovits. Random walks in a convex body and an improved volume algorithm. Random Structures & Algorithms (RS&A), 4 (4): 359--412, 1993.
László Lovász and Santosh Vempala. Hit-and-run from a corner. SIAM Journal on Computing (SICOMP), 35 (4): 985--1005, 2006a.
László Lovász and Santosh Vempala. Simulated annealing in convex bodies and an $\mathcal{O}^*(n^4)$ volume algorithm. Journal of Computer and System Sciences (JCSS), 72 (2): 392--417, 2006b.
László Lovász and Santosh Vempala. The geometry of logconcave functions and sampling algorithms. Random Structures & Algorithms (RS&A), 30 (3): 307--358, 2007.
Jan R Magnus and Heinz Neudecker. The elimination matrix: some lemmas and applications. SIAM Journal on Algebraic Discrete Methods (SADM), 1 (4): 422--449, 1980.
Hariharan Narayanan. Randomized interior point methods for sampling and optimization. The Annals of Applied Probability, 26 (1): 597--641, 2016.
Yurii Nesterov. Introductory lectures on convex optimization: A basic course, volume 87. Springer Science & Business Media, 2003.
Yurii Nesterov and Arkadii Nemirovskii. Interior-point polynomial algorithms in convex programming. SIAM, 1994.
Yurii Nesterov et al. Lectures on convex optimization, volume 137. Springer, 2018.
Yurii E Nesterov, Michael J Todd, et al. On the Riemannian geometry defined by self-concordant barriers and interior-point methods. Foundations of Computational Mathematics, 2 (4): 333--361, 2002.
R Tyrrell Rockafellar. Convex analysis, volume 11. Princeton University Press, 1997.
Sushant Sachdeva and Nisheeth K Vishnoi. The mixing time of the Dikin walk in a polytope: a simple proof. Operations Research Letters, 44 (5): 630--634, 2016.
Robert L Smith. Efficient Monte Carlo procedures for generating points uniformly distributed over bounded regions.
Operations Research, 32 (6): 1296--1308, 1984.
Vishwak Srinivasan, Andre Wibisono, and Ashia Wilson. Fast sampling from constrained spaces using the Metropolis-adjusted Mirror Langevin algorithm. arXiv preprint arXiv:2312.08823, 2023.
Pravin M Vaidya. A new algorithm for minimizing convex functions over convex sets. Mathematical Programming, 73 (3): 291--341, 1996.
Santosh Vempala. Geometric random walks: a survey. Combinatorial and computational geometry, 52 (573-612): 2, 2005.
Roman Vershynin. High-dimensional probability: An introduction with applications in data science, volume 47. Cambridge University Press, 2018.
We collect algebraic identities related to the trace, vectorization, and the Kronecker and Hadamard products. For $A,B,C,D\in\mathbb{R}^{d\times d}$ and $M$ in Definition \ref{['def:linearOperators']}: (1) $(A\otimes B)\,\textup{vec}(C)=\textup{vec}(BCA^{\mathsf{T}})$. (2) $\textup{vec}(A)^{\mathsf{T}}(B\otimes C)\textup{vec}(D)=\textup{Tr}(DB^{\mathsf{T}}A^{\mathsf{T}}C)$. (3) $(A\otimes B)(C\otimes D)=AC\otimes BD$. (4) $(A\otimes B)^{-1}=A^{-1}\otimes B^{-1}$. (5) $(A\otimes B)^{\mathsf{T}}=A^{\mathsf{T}}\otimes B^{\mathsf{T}}$. (6) $\textup{Tr}(A\otimes B)=\textup{Tr}(A)\textup{Tr}(B)$. (7) $\det{\bigl(M^{\mathsf{T}}(A\otimes A)M\bigr)}=2^{d(d-1)/2}(\det A)^{d+1}$.
Let $A,B,C,D\in\mathbb{R}^{d\times d}$, $x,y\in\mathbb{R}^{d}$, and let $D_{1},D_{2}\in\mathbb{R}^{d\times d}$ be diagonal matrices. (1) $(A\circ B)y=\textsf{diag}(A\,\textup{Diag}(y)B^{\mathsf{T}})$. (2) $x^{\mathsf{T}}(A\circ B)y=\textup{Tr}(\textup{Diag}(x)A\,\textup{Diag}(y)B^{\mathsf{T}})$. (3) $D_{1}(A\circ B)=(D_{1}A)\circ B=A\circ(D_{1}B)$. (4) $(A\circ B)D_{2}=(AD_{2})\circ B=A\circ(BD_{2})$. (5) $(A\otimes B)\circ(C\otimes D)=(A\circ C)\otimes(B\circ D)$.
Let $g(x):\mathbb{R}^{d}\to\mathbb{R}^{d\times d}$ be a matrix function. Its gradient at $x$, denoted by $\mathrm{D} g(x)$, is the third-order tensor defined by $(\mathrm{D} g(x))_{ijk}=\frac{\partial g_{ij}(x)}{\partial x_{k}}$.
Unless specified otherwise, the multiplication between a higher-order tensor and a matrix of size $d\times d$ runs over the $(i,j)$-entries. For instance, for a matrix $M\in\mathbb{R}^{d\times d}$ the product $\mathrm{D} g(x)\cdot M$ indicates the third-order tensor defined by $(\mathrm{D} g(x)\,M)_{\cdot,\cdot,k}=(\mathrm{D} g(x))_{\cdot,\cdot,k}M\text{ for each }k\in[d]\,.$ In the same way, the trace is applied to a matrix spanned by $(i,j)$-entries, i.e., ${\bigl(\textup{Tr}(\mathrm{D} g(x))\bigr)}_{k}=\textup{Tr}{\Bigl({\bigl(\mathrm{D} g(x)\bigr)}_{\cdot,\cdot,k}\Bigr)}\,.$ For $\varphi:\mathbb{R}^{d}\to\mathbb{R}$ with $\varphi(\cdot):=\log\det g(\cdot)$, its gradient and the directional derivative in $h\in\mathbb{R}^{d}$ are $\nabla\varphi(x)=\textup{Tr}{\bigl(g(x)^{-1}\mathrm{D} g(x)\bigr)}\,,\qquad\text{and}\qquad\nabla\varphi(x)\cdot h=\textup{Tr}{\bigl(g(x)^{-1}\mathrm{D} g(x)[h]\bigr)}\,.$ For the Hessian of $\varphi$, using the product rule and $\mathrm{D}(g^{-1})(x)=-g(x)^{-1}\mathrm{D} g(x)\,g(x)^{-1}\,,$ we obtain $\nabla^{2}\varphi(x)=\mathrm{D}\textup{Tr}{\bigl(g(x)^{-1}\mathrm{D} g(x)\bigr)}=-\textup{Tr}{\bigl(g(x)^{-1}\mathrm{D} g(x)\,g(x)^{-1}\mathrm{D} g(x)\bigr)}+\textup{Tr}{\bigl(g(x)^{-1}\mathrm{D}^{2}g(x)\bigr)}=\textup{Tr}{\bigl(g(x)^{-1}\mathrm{D}^{2}g(x)\bigr)}-{\|g(x)^{-\frac{1}{2}}\mathrm{D} g(x)\,g(x)^{-\frac{1}{2}}\|}_{F}^{2}\,,$ where $\mathrm{D}^{2}g(x)$ is the fourth-order tensor defined by $(\mathrm{D}^{2}g(x))_{ijkl}=\frac{\partial^{2}[g(x)]_{ij}}{\partial x_{k}\partial x_{l}}$. We now present formulas for the Hessian of $\phi(\cdot)=-\log\det(\cdot)$ on $\mathbb{S}_{++}^{d}$ and for its inverse.
By setting $g(X)=X$ and $\phi(X)=-\varphi(X)$ above, \ref{['eq:hessLogDet']} implies that for a symmetric matrix $H\in\mathbb{S}^{d}$, $\nabla^{2}\phi(X)[H,H]={\|X^{-\frac{1}{2}}HX^{-\frac{1}{2}}\|}_{F}^{2}=\textup{Tr}(X^{-1}HX^{-1}H)=\textup{vec}(H)^{\mathsf{T}}(X^{-1}\otimes X^{-1})\textup{vec}(H)=\textup{vec}(H)^{\mathsf{T}}(X\otimes X)^{-1}\textup{vec}(H)\,,$ where the last equality follows from Lemma \ref{['lem:Kronecker']}. When representing $X$ and $H$ in $\mathbb{R}^{d_{s}}$ with the notations $x:=\textup{svec}(X)$ and $h:=\textup{svec}(H)$, the definition of $M$ (see Definition \ref{['def:linearOperators']}) turns \ref{['eq:2ndDiffLogDet']} into $\nabla^{2}\phi(x)[h,h]=h^{\mathsf{T}}M^{\mathsf{T}}(X\otimes X)^{-1}Mh\,,$ and thus $g_{X}:=\nabla_{x}^{2}\phi(x)=\nabla_{X}^{2}\phi(X)$ equals $M^{\mathsf{T}}(X\otimes X)^{-1}M$. The formula for the inverse, $g_{X}^{-1}=M^{\dagger}(X\otimes X)(M^{\dagger})^{\mathsf{T}}$, is immediate from magnus1980elimination, and the other part follows from $M^{\dagger}=LN$ and $N^{\mathsf{T}}=N$ magnus1980elimination. We collect details on self-concordant barriers for linear constraints, $P=\{x\in\mathbb{R}^{d}:Ax\geq b\}$ with $A\in\mathbb{R}^{m\times d}$ and $b\in\mathbb{R}^{m}$: the logarithmic, volumetric, and Lewis-weight barrier/metric. Recall the notations used in the paper: $s_{x}=\textsf{diag}(Ax-b)\in\mathbb{R}^{m}$, $S_{x}=\textup{Diag}(s_{x})\in\mathbb{R}^{m\times m}$, and $A_{x}=S_{x}^{-1}A\in\mathbb{R}^{m\times d}$. Also, $s_{x,h}=A_{x}h\in\mathbb{R}^{m}$ and $S_{x,h}=\textup{Diag}(s_{x,h})\in\mathbb{R}^{m\times m}$. Let $h\in\mathbb{R}^{d}$. For $x\in P$, the logarithmic barrier (or log-barrier) and the Hessian metric are given by $\phi_{\log}(x):=-\sum_{i=1}^{m}\log(a_{i}^{\mathsf{T}}x-b_{i})\,,\qquad\text{and}\qquad g(x)=\nabla^{2}\phi_{\log}(x)=A_{x}^{\mathsf{T}}A_{x}\,.$ $\mathrm{D} S_{x}[h]=\textup{Diag}(Ah)$ and $\mathrm{D} S_{x}^{-1}[h]=-S_{x}^{-1}S_{x,h}$.
Also, $\mathrm{D} g(x)[h]=-2A_{x}^{\mathsf{T}}S_{x,h}A_{x}$ and $\mathrm{D}^{2}g(x)[h,h]=6A_{x}^{\mathsf{T}}S_{x,h}^{2}A_{x}\succeq0$. The first is obvious from differentiation of $S_{x}=\textup{Diag}(Ax-b)$ w.r.t. $x$. As for the second, \mathrm{D} S_{x}^{-1}[h]=-S_{x}^{-1}\mathrm{D} S_{x}[h]\,S_{x}^{-1}=-S_{x}^{-1}\textup{Diag}(Ah)S_{x}^{-1}=-S_{x}^{-1}\textup{Diag}(A_{x}h)=-S_{x}^{-1}S_{x,h}\,. As for the third and fourth, as $g(x)=A^{\mathsf{T}}S_{x}^{-2}A$, \mathrm{D} g(x)[h]=A^{\mathsf{T}}\mathrm{D} S_{x}^{-2}[h]\,A=-2A^{\mathsf{T}}S_{x}^{-3}\mathrm{D} S_{x}[h]A=-2A_{x}^{\mathsf{T}}S_{x}^{-1}\textup{Diag}(Ah)A_{x}=-2A_{x}^{\mathsf{T}}S_{x,h}A_{x}\,.\mathrm{D}^{2}g(x)[h,h]=-2A^{\mathsf{T}}\mathrm{D} S_{x}^{-3}[h]\,\textup{Diag}(Ah)A=6A^{\mathsf{T}}S_{x}^{-4}\mathrm{D} S_{x}[h]\,\textup{Diag}(Ah)A=6A_{x}^{\mathsf{T}}S_{x,h}^{2}A_{x}\,.\qedhere vaidya1996new introduced the volumetric barrier for $P$, defined by $\phi_{\textrm{vol}}(x)=\frac{1}{2}\,\log\det{\bigl(\nabla^{2}\phi_{\log}(x)\bigr)}=\frac{1}{2}\,\log\det(A_{x}^{\mathsf{T}}A_{x})\,.$ $\nabla\phi_{\textrm{vol}}(x)=-A_{x}^{\mathsf{T}}\sigma_{x}$ and $\nabla^{2}\phi_{\textrm{vol}}(x)=A_{x}^{\mathsf{T}}(3\Sigma_{x}-2P_{x}^{(2)})A_{x}$. 
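The derivative formula for the log-barrier metric and the gradient formula for the volumetric barrier stated above can be sanity-checked by central finite differences. The sketch below is only an illustration under stated assumptions: the polytope data `A`, `b`, the interior point `x`, and the direction `h` are random, and the helper names (`scaled_constraints`, `log_barrier_metric`, `vol_barrier`) are hypothetical. Here `sigma` denotes the leverage scores, i.e., the diagonal of $P_{x}=A_{x}(A_{x}^{\mathsf{T}}A_{x})^{-1}A_{x}^{\mathsf{T}}$.

```python
import numpy as np

def scaled_constraints(A, b, x):
    """A_x = S_x^{-1} A, where S_x = Diag(Ax - b)."""
    return A / (A @ x - b)[:, None]

def log_barrier_metric(A, b, x):
    """g(x) = A_x^T A_x, the Hessian of the log-barrier."""
    Ax = scaled_constraints(A, b, x)
    return Ax.T @ Ax

def vol_barrier(A, b, x):
    """phi_vol(x) = (1/2) log det(A_x^T A_x)."""
    return 0.5 * np.linalg.slogdet(log_barrier_metric(A, b, x))[1]

# Random instance; x is interior because every slack Ax - b is positive.
rng = np.random.default_rng(1)
m, d = 12, 4
A = rng.standard_normal((m, d))
x = rng.standard_normal(d)
b = A @ x - rng.uniform(0.5, 1.5, size=m)
h = rng.standard_normal(d)
t = 1e-6  # finite-difference step (assumption: small relative to the slacks)

Ax = scaled_constraints(A, b, x)
S_xh = np.diag(Ax @ h)

# Closed form D g(x)[h] = -2 A_x^T S_{x,h} A_x versus central differences.
Dg_formula = -2.0 * Ax.T @ S_xh @ Ax
Dg_fd = (log_barrier_metric(A, b, x + t * h)
         - log_barrier_metric(A, b, x - t * h)) / (2 * t)

# Closed form grad phi_vol(x) = -A_x^T sigma_x versus central differences.
P = Ax @ np.linalg.solve(Ax.T @ Ax, Ax.T)
sigma = np.diag(P)
grad_formula = -Ax.T @ sigma
dir_deriv_fd = (vol_barrier(A, b, x + t * h)
                - vol_barrier(A, b, x - t * h)) / (2 * t)
```

Agreement up to finite-difference error is consistent with the claimed formulas on this instance; it is a numerical check, not a substitute for the proofs.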
For $P_{x}:=P(A_{x})$, using \ref{['eq:gradLogDet']} with Claim \ref{['claim:diffLogBarrier']} and applying Lemma \ref{['lem:Hadamard']} in (i), $\nabla\phi_{\textrm{vol}}(x)[h]=-\textup{Tr}{\bigl((A_{x}^{\mathsf{T}}A_{x})^{-1}A_{x}^{\mathsf{T}}S_{x,h}A_{x}\bigr)}=-\textup{Tr}(P_{x}S_{x,h})\underset{\text{(i)}}{=}-1^{\mathsf{T}}(P_{x}\circ I_{m})s_{x,h}=-h^{\mathsf{T}}A_{x}^{\mathsf{T}}\sigma_{x}\,.$ For the Hessian of $\phi_{\textrm{vol}}$, let $g(x)=A_{x}^{\mathsf{T}}A_{x}$; then by \ref{['eq:hessLogDet']}, $\nabla^{2}\phi_{\textrm{vol}}(x)[h,h]=\frac{1}{2}\,{\bigl(\textup{Tr}(g^{-1}\mathrm{D}^{2}g[h,h])-\textup{Tr}(g^{-1}\mathrm{D} g[h]\,g^{-1}\mathrm{D} g[h])\bigr)}\,.$ As for the first term, Claim \ref{['claim:diffLogBarrier']} leads to $\textup{Tr}(g^{-1}\mathrm{D}^{2}g[h,h])=6\textup{Tr}(g^{-1}A_{x}^{\mathsf{T}}S_{x,h}^{2}A_{x})=6\textup{Tr}(P_{x}S_{x,h}IS_{x,h})=6h^{\mathsf{T}}A_{x}^{\mathsf{T}}(P_{x}\circ I)A_{x}h=6h^{\mathsf{T}}A_{x}^{\mathsf{T}}\Sigma_{x}A_{x}h\,.$ As for the second term, $\textup{Tr}(g^{-1}\mathrm{D} g[h]\,g^{-1}\mathrm{D} g[h])=4\textup{Tr}(P_{x}S_{x,h}P_{x}S_{x,h})=4(A_{x}h)^{\mathsf{T}}(P_{x}\circ P_{x})(A_{x}h)=4h^{\mathsf{T}}A_{x}^{\mathsf{T}}P_{x}^{(2)}A_{x}h\,.$ Hence, $\nabla^{2}\phi_{\textrm{vol}}(x)[h,h]=h^{\mathsf{T}}A_{x}^{\mathsf{T}}(3\Sigma_{x}-2P_{x}^{(2)})A_{x}h$, which completes the proof. $P_{x}^{(2)}\preceq\Sigma_{x}$, so $A_{x}^{\mathsf{T}}\Sigma_{x}A_{x}\preceq\nabla^{2}\phi_{\textrm{vol}}(x)\preceq3A_{x}^{\mathsf{T}}\Sigma_{x}A_{x}$. Due to $\Sigma_{x}=P_{x}\circ I$, it suffices to show $h^{\mathsf{T}}{\bigl(P_{x}\circ(I-P_{x})\bigr)}\,h\geq0$ for any $h\in\mathbb{R}^{m}$.
Since $P_{x}$ and $I-P_{x}$ are orthogonal projections, for $H=\textup{Diag}(h)$ and $C:=P_{x}H(I-P_{x})$, $h^{\mathsf{T}}{\bigl(P_{x}\circ(I-P_{x})\bigr)}\,h=\textup{Tr}{\bigl(HP_{x}H(I-P_{x})\bigr)}=\textup{Tr}{\bigl((I-P_{x})HP_{x}P_{x}H(I-P_{x})\bigr)}=\textup{Tr}(C^{\mathsf{T}}C)\geq0\,.$
We derive formulas for derivatives of leverage scores, orthogonal projections, and so on. For $x,h\in\mathbb{R}^{d}$, let $P_{x}=A_{x}(A_{x}^{\mathsf{T}}A_{x})^{-1}A_{x}^{\mathsf{T}}$, $\Sigma_{x}=\textup{Diag}(P_{x})$, and $\Lambda_{x}=\Sigma_{x}-P_{x}^{(2)}$. Denote $\theta(x):=A_{x}^{\mathsf{T}}\Sigma_{x}A_{x}$. (1) (lee2019solving) $\Sigma_{x,h}'=-2\textup{Diag}(\Lambda_{x}s_{x,h})=2{\bigl(\textup{Diag}(P_{x}S_{x,h}P_{x})-\Sigma_{x}S_{x,h}\bigr)}$. (2) (lee2019solving) $P_{x,h}'=-P_{x}S_{x,h}-S_{x,h}P_{x}+2P_{x}S_{x,h}P_{x}$. (3) $\Lambda_{x,h}'=-2\textup{Diag}(\Lambda_{x}s_{x,h})+2P_{x}\circ P_{x}S_{x,h}+2S_{x,h}P_{x}\circ P_{x}-2(P_{x}S_{x,h}P_{x})\circ P_{x}-2P_{x}\circ(P_{x}S_{x,h}P_{x})$. (4) $\Sigma_{x,h}''=6S_{x,h}\Sigma_{x}S_{x,h}+8\textup{Diag}(P_{x}S_{x,h}P_{x}S_{x,h}P_{x})-6\textup{Diag}(P_{x}S_{x,h}^{2}P_{x})-8\textup{Diag}(S_{x,h}P_{x}S_{x,h}P_{x})$. (5) $\mathrm{D}\theta(x)[h]=-2A_{x}^{\mathsf{T}}\Sigma_{x}S_{x,h}A_{x}+A_{x}^{\mathsf{T}}\Sigma_{x,h}'A_{x}$. (6) $\mathrm{D}^{2}\theta(x)[h,h]=6A_{x}^{\mathsf{T}}S_{x,h}\Sigma_{x}S_{x,h}A_{x}-4A_{x}^{\mathsf{T}}\Sigma_{x,h}'S_{x,h}A_{x}+A_{x}^{\mathsf{T}}\Sigma_{x,h}''A_{x}$. Equivalently, $\mathrm{D}^{2}\theta(x)[h,h]=20A_{x}^{\mathsf{T}}S_{x,h}\Sigma_{x}S_{x,h}A_{x}-16A_{x}^{\mathsf{T}}\textup{Diag}(S_{x,h}P_{x}S_{x,h}P_{x})A_{x}-6A_{x}^{\mathsf{T}}\textup{Diag}(P_{x}S_{x,h}^{2}P_{x})A_{x}+8A_{x}^{\mathsf{T}}\textup{Diag}(P_{x}S_{x,h}P_{x}S_{x,h}P_{x})A_{x}\,.$
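As a numerical companion to the first two derivative formulas above, the following sketch compares $P_{x,h}'$ and $\Sigma_{x,h}'$ against central finite differences of the projection matrix, and also checks that $\Lambda_{x}=\Sigma_{x}-P_{x}^{(2)}$ is positive semidefinite on a random instance. The data `A`, `b`, `x`, `h` and the helper name `projection` are assumptions made for illustration only.

```python
import numpy as np

def projection(A, b, x):
    """Return (P_x, A_x) with A_x = S_x^{-1} A and P_x the projection onto range(A_x)."""
    Ax = A / (A @ x - b)[:, None]
    return Ax @ np.linalg.solve(Ax.T @ Ax, Ax.T), Ax

# Random instance with x strictly inside {Ax >= b}.
rng = np.random.default_rng(2)
m, d = 10, 3
A = rng.standard_normal((m, d))
x = rng.standard_normal(d)
b = A @ x - rng.uniform(0.5, 1.5, size=m)
h = rng.standard_normal(d)
t = 1e-6  # finite-difference step (assumption)

P, Ax = projection(A, b, x)
s_xh = Ax @ h
S = np.diag(s_xh)
Sigma = np.diag(np.diag(P))
Lam = Sigma - P * P                      # Lambda_x = Sigma_x - P_x^{(2)}

# Closed forms from the lemma.
P_prime = -P @ S - S @ P + 2.0 * P @ S @ P
Sigma_prime = 2.0 * (np.diag(np.diag(P @ S @ P)) - Sigma @ S)
Sigma_prime_alt = -2.0 * np.diag(Lam @ s_xh)

# Central finite difference of P_x in direction h; its diagonal gives Sigma'.
P_fd = (projection(A, b, x + t * h)[0] - projection(A, b, x - t * h)[0]) / (2 * t)

# Smallest eigenvalue of Lambda_x, expected to be nonnegative up to rounding.
min_eig_Lam = np.linalg.eigvalsh(Lam).min()
```

The two expressions for $\Sigma_{x,h}'$ agree exactly (not only up to finite-difference error), since $P_{x}^{(2)}s_{x,h}=\textsf{diag}(P_{x}S_{x,h}P_{x})$ for symmetric $P_{x}$.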
Proof. As for the third item,
$$\begin{aligned}
\Lambda_{x,h}' & =\Sigma_{x,h}'-P_{x,h}'\circ P_{x}-P_{x}\circ P_{x,h}'\\
 & =-2\textup{Diag}(\Lambda_{x}s_{x,h})-(-P_{x}S_{x,h}-S_{x,h}P_{x}+2P_{x}S_{x,h}P_{x})\circ P_{x}\\
 & \qquad-P_{x}\circ(-P_{x}S_{x,h}-S_{x,h}P_{x}+2P_{x}S_{x,h}P_{x})\\
 & \underset{\text{(i)}}{=}-2\textup{Diag}(\Lambda_{x}s_{x,h})+2P_{x}\circ P_{x}S_{x,h}+2S_{x,h}P_{x}\circ P_{x}-2(P_{x}S_{x,h}P_{x})\circ P_{x}-2P_{x}\circ(P_{x}S_{x,h}P_{x})\,,
\end{aligned}$$
where in (i) we used $D(A\circ B)=(DA)\circ B=A\circ(DB)$ and $(A\circ B)D=(AD)\circ B=A\circ(BD)$ for a diagonal matrix $D\in\mathbb{R}^{d\times d}$ (Lemma \ref{lem:Hadamard}).

As for the fourth item,
$$\begin{aligned}
\Sigma_{x,h}'' & =-2\mathrm{D}{\bigl(\textup{Diag}(\Lambda_{x}s_{x,h})\bigr)}[h]=-2\textup{Diag}(\Lambda_{x,h}'s_{x,h})+2\textup{Diag}(\Lambda_{x}S_{x,h}s_{x,h})\\
 & =-2\textup{Diag}{\bigl({\bigl[-2\textup{Diag}(\Lambda_{x}s_{x,h})+2P_{x}\circ P_{x}S_{x,h}+2S_{x,h}P_{x}\circ P_{x}-2(P_{x}S_{x,h}P_{x})\circ P_{x}-2P_{x}\circ(P_{x}S_{x,h}P_{x})\bigr]}s_{x,h}\bigr)}\\
 & \qquad+2\textup{Diag}(\Lambda_{x}S_{x,h}s_{x,h})\\
 & =4\textup{Diag}(\textcolor{red}{\Lambda_{x}}s_{x,h})\textcolor{blue}{S_{x,h}}-4\textup{Diag}(P_{x}\circ P_{x}S_{x,h}s_{x,h})-4\textup{Diag}(S_{x,h}P_{x}\circ P_{x}s_{x,h})\\
 & \qquad+4\textup{Diag}{\bigl((P_{x}S_{x,h}P_{x})\circ P_{x}s_{x,h}\bigr)}+4\textup{Diag}{\bigl(P_{x}\circ(P_{x}S_{x,h}P_{x})s_{x,h}\bigr)}+2\textup{Diag}(\textcolor{red}{\Lambda_{x}}S_{x,h}s_{x,h})\\
 & =4\textup{Diag}{\bigl(\textcolor{blue}{S_{x,h}}\textcolor{red}{(\Sigma_{x}-P_{x}\circ P_{x})}s_{x,h}\bigr)}-4\textup{Diag}(P_{x}\circ P_{x}S_{x,h}s_{x,h})-4\textup{Diag}(S_{x,h}P_{x}\circ P_{x}s_{x,h})\\
 & \qquad+4\textup{Diag}{\bigl((P_{x}S_{x,h}P_{x})\circ P_{x}s_{x,h}\bigr)}+4\textup{Diag}{\bigl(P_{x}\circ(P_{x}S_{x,h}P_{x})s_{x,h}\bigr)}+2\textup{Diag}{\bigl(\textcolor{red}{(\Sigma_{x}-P_{x}\circ P_{x})}S_{x,h}s_{x,h}\bigr)}\\
 & =\textcolor{cyan}{4\textup{Diag}(S_{x,h}\Sigma_{x}s_{x,h})}-6\textup{Diag}(P_{x}\circ P_{x}S_{x,h}s_{x,h})-8\textup{Diag}(S_{x,h}P_{x}\circ P_{x}s_{x,h})\\
 & \qquad+4\textup{Diag}{\bigl((P_{x}S_{x,h}P_{x})\circ P_{x}s_{x,h}\bigr)}+4\textup{Diag}{\bigl(P_{x}\circ(P_{x}S_{x,h}P_{x})s_{x,h}\bigr)}+\textcolor{cyan}{2\textup{Diag}(\Sigma_{x}S_{x,h}s_{x,h})}\\
 & =\textcolor{cyan}{6\textup{Diag}(S_{x,h}\Sigma_{x}s_{x,h})}-6\textup{Diag}(\textcolor{blue}{P_{x}\circ P_{x}S_{x,h}s_{x,h}})-8\textup{Diag}(\textcolor{blue}{S_{x,h}P_{x}\circ P_{x}s_{x,h}})\\
 & \qquad+4\textup{Diag}{\bigl(\textcolor{blue}{(P_{x}S_{x,h}P_{x})\circ P_{x}s_{x,h}}\bigr)}+4\textup{Diag}{\bigl(\textcolor{blue}{P_{x}\circ(P_{x}S_{x,h}P_{x})s_{x,h}}\bigr)}\\
 & \underset{\text{(i)}}{=}6S_{x,h}\Sigma_{x}\textup{Diag}(s_{x,h})-6\textup{Diag}{\Bigl(\textsf{diag}{\bigl(P_{x}S_{x,h}(P_{x}S_{x,h})^{\mathsf{T}}\bigr)}\Bigr)}-8\textup{Diag}{\Bigl(\textsf{diag}(S_{x,h}P_{x}S_{x,h}P_{x}^{\mathsf{T}})\Bigr)}\\
 & \qquad+4\textup{Diag}(P_{x}S_{x,h}P_{x}S_{x,h}P_{x})+4\textup{Diag}{\bigl(P_{x}S_{x,h}(P_{x}S_{x,h}P_{x})^{\mathsf{T}}\bigr)}\\
 & =6S_{x,h}\Sigma_{x}S_{x,h}-6\textup{Diag}(P_{x}S_{x,h}^{2}P_{x})-8\textup{Diag}(S_{x,h}P_{x}S_{x,h}P_{x})+8\textup{Diag}(P_{x}S_{x,h}P_{x}S_{x,h}P_{x})\,,
\end{aligned}$$
where in (i) we applied Lemma \ref{lem:Hadamard}-1 to the terms in blue.

Applying the product rule to $\theta(x)=A_{x}^{\mathsf{T}}\Sigma_{x}A_{x}=A^{\mathsf{T}}S_{x}^{-2}\Sigma_{x}A$,
$$\begin{aligned}
\mathrm{D}\theta[h] & =-2A^{\mathsf{T}}S_{x}^{-3}\Sigma_{x}\textup{Diag}(Ah)A+A^{\mathsf{T}}S_{x}^{-2}\Sigma_{x,h}'A=-2A_{x}^{\mathsf{T}}\Sigma_{x}S_{x,h}A_{x}+A_{x}^{\mathsf{T}}\Sigma_{x,h}'A_{x}\,,\\
\mathrm{D}^{2}\theta[h,h] & =6A_{x}^{\mathsf{T}}S_{x,h}\Sigma_{x}S_{x,h}A_{x}-2A_{x}^{\mathsf{T}}\Sigma_{x,h}'S_{x,h}A_{x}-2A_{x}^{\mathsf{T}}S_{x,h}\Sigma_{x,h}'A_{x}+A_{x}^{\mathsf{T}}\Sigma_{x,h}''A_{x}\\
 & =6A_{x}^{\mathsf{T}}S_{x,h}\Sigma_{x}S_{x,h}A_{x}-4A_{x}^{\mathsf{T}}\Sigma_{x,h}'S_{x,h}A_{x}+A_{x}^{\mathsf{T}}\Sigma_{x,h}''A_{x}\,.
\end{aligned}$$
Substituting the formulas above for $\Sigma_{x,h}'$ and $\Sigma_{x,h}''$,
$$\begin{aligned}
\mathrm{D}^{2}\theta[h,h] & =6A_{x}^{\mathsf{T}}S_{x,h}\Sigma_{x}S_{x,h}A_{x}-4A_{x}^{\mathsf{T}}\Sigma_{x,h}'S_{x,h}A_{x}+A_{x}^{\mathsf{T}}\Sigma_{x,h}''A_{x}\\
 & =6A_{x}^{\mathsf{T}}S_{x,h}\Sigma_{x}S_{x,h}A_{x}+8A_{x}^{\mathsf{T}}{\bigl(\Sigma_{x}S_{x,h}-\textup{Diag}(P_{x}S_{x,h}P_{x})\bigr)}S_{x,h}A_{x}\\
 & \qquad+A_{x}^{\mathsf{T}}{\Bigl(6S_{x,h}\Sigma_{x}S_{x,h}-6\textup{Diag}(P_{x}S_{x,h}^{2}P_{x})-8\textup{Diag}(S_{x,h}P_{x}S_{x,h}P_{x})+8\textup{Diag}(P_{x}S_{x,h}P_{x}S_{x,h}P_{x})\Bigr)}A_{x}\\
 & =20A_{x}^{\mathsf{T}}S_{x,h}\Sigma_{x}S_{x,h}A_{x}-16A_{x}^{\mathsf{T}}\textup{Diag}(S_{x,h}P_{x}S_{x,h}P_{x})A_{x}-6A_{x}^{\mathsf{T}}\textup{Diag}(P_{x}S_{x,h}^{2}P_{x})A_{x}\\
 & \qquad+8A_{x}^{\mathsf{T}}\textup{Diag}(P_{x}S_{x,h}P_{x}S_{x,h}P_{x})A_{x}\,.
\end{aligned}$$

We recall preliminaries on Lewis weights. In particular, the leverage scores are exactly the $\ell_{2}$-Lewis weights.

Let $W_{x}=\textup{Diag}(w_{x}(A_{x}))\in\mathbb{S}_{++}^{m}$ be the $\ell_{p}$-Lewis weights, $g(x)=A_{x}^{\mathsf{T}}W_{x}A_{x}$ the Lewis-weights metric, and $h\in\mathbb{R}^{d}$.

1. (Lemma 26) $\max_{i\in[m]}\frac{[\sigma(W_{x}^{1/2}A_{x})]_{i}}{(w_{x})_{i}}\leq2m^{\frac{2}{p+2}}$.
2. (Lemma 33) ${\|A_{x}h\|}_{W_{x}}={\|h\|}_{g(x)}$ and ${\|A_{x}h\|}_{\infty}\leq\sqrt{2}m^{\frac{1}{p+2}}{\|h\|}_{g(x)}$.
3. (Lemma 34) ${\|W_{x}^{-1}w_{x,h}'\|}_{W_{x}}\leq p\,{\|h\|}_{g(x)}$.

Next is the directional derivative of the $\ell_{p}$-Lewis weights of $A_{x}$.
The directional derivative of the $\ell_{p}$-Lewis weight $W_{x}$ in direction $h\in\mathbb{R}^{d}$ is
$$W_{x,h}':=\mathrm{D} W_{x}[h]=-2\,\textup{Diag}(\Lambda_{x}G_{x}^{-1}W_{x}s_{x,h})=-\textup{Diag}(W_{x}^{\frac{1}{2}}N_{x}W_{x}^{\frac{1}{2}}s_{x,h})\,,$$
where $\Lambda_{x}\stackrel{\mathrm{def}}{=}W_{x}-P_{x}^{(2)}$, $\bar{\Lambda}_{x}\stackrel{\mathrm{def}}{=}W_{x}^{-\frac{1}{2}}\Lambda_{x}W_{x}^{-\frac{1}{2}}$, $G_{x}\stackrel{\mathrm{def}}{=}W_{x}-{\bigl(1-\frac{2}{p}\bigr)}\Lambda_{x}$, and $N_{x}\stackrel{\mathrm{def}}{=}2\bar{\Lambda}_{x}(I-c_{p}\bar{\Lambda}_{x})^{-1}$ with $c_{p}=1-2/p$.

It is known that these matrices satisfy
$$P_{x}^{(2)}\preceq W_{x}\preceq I\,,\qquad\Lambda_{x}\preceq W_{x}\,,\qquad\frac{2}{p}W_{x}\preceq G_{x}\preceq W_{x}\,,$$
which implies
$$W_{x}^{-1}\preceq G_{x}^{-1}\preceq\frac{p}{2}W_{x}^{-1}\qquad\text{and}\qquad I\preceq W_{x}^{\frac{1}{2}}G_{x}^{-1}W_{x}^{\frac{1}{2}}\preceq\frac{p}{2}I\,.$$

We can also compute the second-order directional derivative of $W_{x}$ in direction $h\in\mathbb{R}^{d}$.

Let $w_{x}\in\mathbb{R}^{m}$ be the $\ell_{p}$-Lewis weight, $\Gamma\in\mathbb{R}_{\geq0}^{m\times m}$ a diagonal matrix, and $h\in\mathbb{R}^{d}$.
Then,
$$\begin{aligned}
W_{x,h}'' & =-\textup{Diag}{\bigl(\frac{1}{2}W_{x}^{-\frac{1}{2}}W_{x,h}'N_{x}W_{x}^{\frac{1}{2}}s_{x,h}+W_{x}^{\frac{1}{2}}N_{x,h}'W_{x}^{\frac{1}{2}}s_{x,h}+\frac{1}{2}W_{x}^{\frac{1}{2}}N_{x}W_{x}^{-\frac{1}{2}}W_{x,h}'s_{x,h}+2\Lambda_{x}G_{x}^{-1}W_{x}S_{x,h}s_{x,h}\bigr)}\,,\\
\textup{Tr}(\Gamma W_{x,h}'') & =-\frac{1}{2}\,\textup{Tr}{\bigl(\Gamma\,\textup{Diag}(\underbrace{W_{x}^{-\frac{1}{2}}W_{x,h}'N_{x}W_{x}^{\frac{1}{2}}s_{x,h}}_{\textup{I}})\bigr)}-\textup{Tr}{\bigl(\Gamma\,\textup{Diag}(\underbrace{W_{x}^{\frac{1}{2}}N_{x,h}'W_{x}^{\frac{1}{2}}s_{x,h}}_{\textup{II}})\bigr)}\\
 & \qquad-\frac{1}{2}\,\textup{Tr}{\bigl(\Gamma\,\textup{Diag}(\underbrace{W_{x}^{\frac{1}{2}}N_{x}W_{x}^{-\frac{1}{2}}W_{x,h}'s_{x,h}}_{\textup{III}})\bigr)}-2\,\textup{Tr}{\bigl(\Gamma\,\textup{Diag}(\underbrace{\Lambda_{x}G_{x}^{-1}W_{x}S_{x,h}s_{x,h}}_{\textup{IV}})\bigr)}\,,\\
\mathrm{D}^{2}(A_{x}^{\mathsf{T}}W_{x}A_{x})[h,h] & =6A_{x}^{\mathsf{T}}S_{x,h}W_{x}S_{x,h}A_{x}-4A_{x}^{\mathsf{T}}W_{x,h}'S_{x,h}A_{x}+A_{x}^{\mathsf{T}}W_{x,h}''A_{x}\,,
\end{aligned}$$
where ${\|\textup{I}\|}_{W_{x}^{-1}}\lesssim p^{3}m^{\frac{1}{p+2}}{\|h\|}_{\theta}^{2}$, ${\|\textup{II}\|}_{W_{x}^{-1}}\lesssim p^{3.5}{\|h\|}_{\theta}^{2}$, ${\|\textup{III}\|}_{W_{x}^{-1}}\lesssim p^{3}m^{\frac{1}{p+2}}\,{\|h\|}_{\theta}^{2}$, and ${\|\textup{IV}\|}_{W_{x}^{-1}}\lesssim pm^{\frac{1}{p+2}}{\|h\|}_{\theta}^{2}$. Here, $\lesssim$ hides universal constants and poly-logarithmic factors in $m$.

Proof. The formula for $W_{x,h}''$ follows from differentiating the formula for $W_{x,h}'$ (Lemma \ref{lem:DWh}).
The dual local norms of $\textup{I}$ through $\textup{IV}$ can be bounded as follows:
$$\begin{aligned}
{\|\textup{I}\|}_{W_{x}^{-1}} & ={\|W_{x}^{-1}W_{x,h}'N_{x}W_{x}^{\frac{1}{2}}s_{x,h}\|}_{2}\leq\underbrace{{\|W_{x}^{-1}W_{x,h}'\|}_{2}}_{\text{Lemma }\ref{lem:LS-comp-tool}\text{-2}}\underbrace{{\|N_{x}\|}_{2}}_{\text{Lemma }\ref{lem:LS-comp-tool}\text{-1}}{\|W_{x}^{\frac{1}{2}}s_{x,h}\|}_{2}\lesssim p^{3}m^{\frac{1}{p+2}}{\|h\|}_{\theta}^{2}\,,\\
{\|\textup{II}\|}_{W_{x}^{-1}} & ={\|N_{x,h}'W_{x}^{\frac{1}{2}}s_{x,h}\|}_{2}\leq\underbrace{{\|I+N_{x}\|}_{2}}_{\text{Lemma }\ref{lem:LS-comp-tool}\text{-1}}\underbrace{{\|(I+N_{x})^{-\frac{1}{2}}N_{x,h}'(I+N_{x})^{-\frac{1}{2}}\|}_{2}}_{\text{Lemma }\ref{lem:LS-comp-tool}\text{-3}}{\|W_{x}^{\frac{1}{2}}s_{x,h}\|}_{2}\lesssim p^{3.5}{\|h\|}_{\theta}^{2}\,,\\
{\|\textup{III}\|}_{W_{x}^{-1}} & ={\|N_{x}W_{x}^{-\frac{1}{2}}W_{x,h}'s_{x,h}\|}_{2}\leq\underbrace{{\|N_{x}\|}_{2}}_{\text{Lemma }\ref{lem:LS-comp-tool}\text{-1}}\underbrace{{\|W_{x}^{-1}W_{x,h}'\|}_{2}}_{\text{Lemma }\ref{lem:LS-comp-tool}\text{-2}}{\|W_{x}^{\frac{1}{2}}s_{x,h}\|}_{2}\lesssim p^{3}m^{\frac{1}{p+2}}\,{\|h\|}_{\theta}^{2}\,,\\
{\|\textup{IV}\|}_{W_{x}^{-1}}^{2} & =s_{x,h}^{\mathsf{T}}S_{x,h}W_{x}G_{x}^{-1}\underbrace{\Lambda_{x}W_{x}^{-1}\Lambda_{x}}_{\preceq W_{x}\ (\ref{eq:lewisBasic-LW})}G_{x}^{-1}W_{x}S_{x,h}s_{x,h}\\
 & \leq s_{x,h}^{\mathsf{T}}S_{x,h}W_{x}\underbrace{G_{x}^{-1}W_{x}G_{x}^{-1}}_{\preceq\frac{p^{2}}{4}W_{x}^{-1}\ (\ref{eq:lewisBasic-WGW})}W_{x}S_{x,h}s_{x,h}\\
 & \leq p^{2}s_{x,h}^{\mathsf{T}}W_{x}^{\frac{1}{2}}S_{x,h}^{2}W_{x}^{\frac{1}{2}}s_{x,h}\leq p^{2}{\|s_{x,h}\|}_{\infty}^{2}{\|h\|}_{\theta}^{2}\leq p^{2}m^{\frac{2}{p+2}}{\|h\|}_{\theta}^{4}\,,
\end{aligned}$$
where we used Lemma \ref{lem:usefulFactLewis}-2 in the last inequality. In the bound for $\textup{III}$ we used that diagonal matrices commute, so $W_{x}^{-\frac{1}{2}}W_{x,h}'s_{x,h}=W_{x}^{-1}W_{x,h}'W_{x}^{\frac{1}{2}}s_{x,h}$.

Next, we recall bounds on the derivatives of matrices relevant to Lewis weights.

Let $Ax\geq b$ and $h\in\mathbb{R}^{d}$.
For $c_{p}=1-2/p$ with $p>2$, let $\bar{\Lambda}_{x}:=W_{x}^{-\frac{1}{2}}\Lambda_{x}W_{x}^{-\frac{1}{2}}=I-W_{x}^{-\frac{1}{2}}P_{x}^{(2)}W_{x}^{-\frac{1}{2}}$, $N_{x}\stackrel{\mathrm{def}}{=}2\bar{\Lambda}_{x}(I-c_{p}\bar{\Lambda}_{x})^{-1}$, and $\theta_{x}=A_{x}^{\mathsf{T}}W_{x}A_{x}$.

1. (Lemma 31) $N_{x}$ is symmetric and $0\preceq N_{x}\preceq pI$.
2. (Lemma 34) ${\|W_{x}^{-1}w_{x,h}'\|}_{\infty}\leq p(\sqrt{2}m^{\frac{1}{p+2}}+p/2)\,{\|h\|}_{\theta_{x}}$.
3. (Lemma 37) ${\|(I+N_{x})^{-\frac{1}{2}}\mathrm{D} N_{x}[h]\,(I+N_{x})^{-\frac{1}{2}}\|}_{2}\leq4p^{5/2}{\|h\|}_{\theta_{x}}$.

Lastly, we recall a result on the closeness of the Lewis weights at nearby points.

In the same setting as above, let $x_{t}=x+th$, $s_{t}=s_{x_{t}}$, $w_{t}=w_{x_{t}}$, and let $z_{t,\alpha}\in\mathbb{R}^{m}$ be the vector defined by $[z_{t,\alpha}]_{i}:=\frac{\mathrm{d}}{\mathrm{d} t}\log{\Bigl(\frac{[w_{t,i}]^{\alpha}}{s_{t,i}}\Bigr)}$. Then,
$${\|z_{t,\alpha}\|}_{\infty}\leq{\bigl(\sqrt{2}(1+|\alpha|p)m^{\frac{1}{p+2}}+p\,|\alpha|\,\max(1,p/2)\bigr)}\,{\|h\|}_{A_{t}^{\mathsf{T}}W_{t}A_{t}}\,.$$

Now we present an auxiliary result showing HSC of the Lewis-weight metric.

The metric $g(x)=cA_{x}^{\mathsf{T}}W_{x}A_{x}$ is HSC for $c=c_{1}(\log m)^{c_{2}}d^{1/2}$ with some constants $c_{1},c_{2}>0$.

Proof. Let $\theta(x)=A_{x}^{\mathsf{T}}W_{x}A_{x}$ and $h\in\mathbb{R}^{d}$. From (\ref{eq:LW-second-derv}),
$$\begin{aligned}
\mathrm{D}^{2}\theta[h,h,h,h] & =6s_{x,h}^{\mathsf{T}}S_{x,h}W_{x}S_{x,h}s_{x,h}-4s_{x,h}^{\mathsf{T}}W_{x,h}'S_{x,h}s_{x,h}+s_{x,h}^{\mathsf{T}}W_{x,h}''s_{x,h}\\
 & =\textup{Tr}(6S_{x,h}^{4}W_{x}-4S_{x,h}^{3}W_{x,h}'+S_{x,h}^{2}W_{x,h}'')\,.
\end{aligned}$$
As for the first term, $|\textup{Tr}(S_{x,h}^{4}W_{x})|\leq{\|s_{x,h}\|}_{\infty}^{2}\,\textup{Tr}(S_{x,h}^{2}W_{x})={\|s_{x,h}\|}_{\infty}^{2}{\|h\|}_{\theta}^{2}$.
As for the second term,
$$\begin{aligned}
|\textup{Tr}(S_{x,h}^{3}W_{x,h}')| & \leq{\|s_{x,h}\|}_{\infty}^{2}\textup{Tr}{\bigl(\sqrt{S_{x,h}W_{x,h}'^{2}S_{x,h}}\bigr)}={\|s_{x,h}\|}_{\infty}^{2}\textup{Tr}{\bigl(\sqrt{W_{x,h}'W_{x}^{-1}W_{x,h}'}\sqrt{S_{x,h}W_{x}S_{x,h}}\bigr)}\\
 & \underset{\text{(i)}}{\leq}{\|s_{x,h}\|}_{\infty}^{2}\sqrt{\textup{Tr}(W_{x,h}'W_{x}^{-1}W_{x,h}')}\sqrt{\textup{Tr}(S_{x,h}W_{x}S_{x,h})}={\|s_{x,h}\|}_{\infty}^{2}{\|W_{x}^{-1}w_{x,h}'\|}_{W_{x}}{\|h\|}_{\theta}\\
 & \underset{\text{(ii)}}{\leq}p\,{\|s_{x,h}\|}_{\infty}^{2}{\|h\|}_{\theta}^{2}\,,
\end{aligned}$$
where we used the Cauchy-Schwarz inequality in (i) and Lemma \ref{lem:usefulFactLewis}-3 in (ii).

As for the last term, we first use the formula for $\textup{Tr}(S_{x,h}^{2}W_{x,h}'')$ with $\Gamma=S_{x,h}^{2}$ in Lemma \ref{lem:second-deriv-Lewis}. Each term there is of the form $\textup{Tr}(S_{x,h}^{2}\textup{Diag}(v))$ for $v\in\{\textup{I},\textup{II},\textup{III},\textup{IV}\}$, which can be bounded as follows:
$$\begin{aligned}
|\textup{Tr}{\bigl(S_{x,h}^{2}\textup{Diag}(v)\bigr)}| & =|\textup{Tr}{\bigl(S_{x,h}^{2}W_{x}^{\frac{1}{2}}W_{x}^{-\frac{1}{2}}\textup{Diag}(v)\bigr)}|\leq\sqrt{\textup{Tr}(W_{x}^{\frac{1}{2}}S_{x,h}^{4}W_{x}^{\frac{1}{2}})}\sqrt{\textup{Tr}{\bigl(\textup{Diag}(v)W_{x}^{-1}\textup{Diag}(v)\bigr)}}\\
 & \leq{\|s_{x,h}\|}_{\infty}\,{\|h\|}_{\theta}\,{\|v\|}_{W_{x}^{-1}}\,.
\end{aligned}$$
Using the norm bounds in Lemma \ref{lem:second-deriv-Lewis}, it follows that $|\textup{Tr}(S_{x,h}^{2}W_{x,h}'')|\lesssim{\|h\|}_{\theta}^{4}$ for $p=\mathcal{O}(\log m)$.
Putting everything together with ${\|s_{x,h}\|}_{\infty}\leq\sqrt{2}m^{\frac{1}{p+2}}{\|h\|}_{\theta}\lesssim{\|h\|}_{\theta}$ (Lemma \ref{lem:usefulFactLewis}-2),
$$|\mathrm{D}^{2}\theta[h,h,h,h]|\lesssim{\|s_{x,h}\|}_{\infty}^{2}{\|h\|}_{\theta}^{2}+{\|s_{x,h}\|}_{\infty}{\|h\|}_{\theta}^{3}\lesssim{\|h\|}_{\theta}^{4}\,.$$

For a matrix $M\in\mathbb{R}^{m\times d}$ and $E\in\mathbb{R}^{d\times d}$ such that $E+M^{\mathsf{T}}M\succ0$, it holds that
$$M(E+M^{\mathsf{T}}M)^{-1}M^{\mathsf{T}}\preceq P(M)=M(M^{\mathsf{T}}M)^{\dagger}M^{\mathsf{T}}\,.$$

Proof. Let us denote the LHS by $P'$ and the RHS by $P$. We show $I-P'\succeq I-P$ instead. First, $(P')^{2}\preceq P'$ and $(I-P')^{2}\preceq I-P'$ follow from
$$\begin{aligned}
P'P' & =M(E+M^{\mathsf{T}}M)^{-1}\underbrace{M^{\mathsf{T}}M}_{\preceq E+M^{\mathsf{T}}M}\,(E+M^{\mathsf{T}}M)^{-1}M^{\mathsf{T}}\preceq M(E+M^{\mathsf{T}}M)^{-1}M^{\mathsf{T}}=P'\,,\\
(I-P')^{2} & =I+P'P'-2P'\preceq I-P'\,.
\end{aligned}$$
It follows from $(I-P')^{2}\preceq I-P'$ that for any $v\in\mathbb{R}^{m}$,
$$v^{\mathsf{T}}(I-P')v\geq{\|(I-P')v\|}_{2}^{2}\geq{\|(I-P)v\|}_{2}^{2}=v^{\mathsf{T}}(I-P)v\,,$$
where the second inequality holds because $P'v,Pv\in\text{range}(M)$ and $Pv=\arg\min_{w\in\,\text{range}(M)}{\|v-w\|}_{2}^{2}$.

Let $v,w,p,q,r,s\in\mathbb{R}^{d}$ and $h\sim\mathcal{N}(0,I_{d})$.

1. $\mathbb{E}[(v\cdot h)(w\cdot h)^{3}]=3{\|w\|}^{2}(v\cdot w)$.
2. $\mathbb{E}[(v\cdot h)^{2}(w\cdot h)^{2}]={\|v\|}^{2}{\|w\|}^{2}+2(v\cdot w)^{2}$.
3. $\mathbb{E}[(p\cdot h)^{2}(r\cdot h)(s\cdot h)]={\|p\|}^{2}(r\cdot s)+2(p\cdot s)(p\cdot r)$.
Proof. Using Stein's lemma (Lemma \ref{lem:stein}),
$$\begin{aligned}
\mathbb{E}[(v\cdot h)(w\cdot h)^{3}] & =\sum_{i}w_{i}\mathbb{E}[h_{i}(v\cdot h)(w\cdot h)^{2}]\underset{\text{Stein}}{=}\sum_{i}w_{i}{\bigl(v_{i}\mathbb{E}[(w\cdot h)^{2}]+2w_{i}\mathbb{E}[(v\cdot h)(w\cdot h)]\bigr)}\\
 & =(v\cdot w){\|w\|}^{2}+2{\|w\|}^{2}(v\cdot w)=3{\|w\|}^{2}(v\cdot w)\,,\\
\mathbb{E}[(v\cdot h)^{2}(w\cdot h)^{2}] & =\sum_{i}v_{i}\mathbb{E}[h_{i}(v\cdot h)(w\cdot h)^{2}]\underset{\text{Stein}}{=}\sum_{i}v_{i}{\bigl(v_{i}\mathbb{E}[(w\cdot h)^{2}]+2w_{i}\mathbb{E}[(v\cdot h)(w\cdot h)]\bigr)}\\
 & ={\|v\|}^{2}{\|w\|}^{2}+2(v\cdot w)^{2}\,,\\
\mathbb{E}[(p\cdot h)^{2}(r\cdot h)(s\cdot h)] & =\sum_{i}p_{i}\mathbb{E}[h_{i}(p\cdot h)(r\cdot h)(s\cdot h)]\\
 & \underset{\text{Stein}}{=}\sum_{i}p_{i}{\bigl(p_{i}\mathbb{E}[(r\cdot h)(s\cdot h)]+r_{i}\mathbb{E}[(p\cdot h)(s\cdot h)]+s_{i}\mathbb{E}[(p\cdot h)(r\cdot h)]\bigr)}\\
 & ={\|p\|}^{2}(r\cdot s)+(p\cdot r)(p\cdot s)+(p\cdot s)(p\cdot r)={\|p\|}^{2}(r\cdot s)+2(p\cdot s)(p\cdot r)\,.
\end{aligned}$$

These computations yield a useful lemma for establishing SASC of barriers for linear constraints.

For $v,w\in\mathbb{R}^{d}$ and $h\sim\mathcal{N}(0,I_{d})$, $\mathbb{E}[(v\cdot h)^{3}(w\cdot h)^{3}]=9{\|v\|}^{2}{\|w\|}^{2}(v\cdot w)+6(v\cdot w)^{3}$.

Proof. Using Stein's lemma,
$$\begin{aligned}
\mathbb{E}[(v\cdot h)^{3}(w\cdot h)^{3}] & =\sum_{i}v_{i}\mathbb{E}[h_{i}(v\cdot h)^{2}(w\cdot h)^{3}]=\sum_{i}v_{i}{\bigl(2v_{i}\mathbb{E}[(v\cdot h)(w\cdot h)^{3}]+3w_{i}\mathbb{E}[(v\cdot h)^{2}(w\cdot h)^{2}]\bigr)}\\
 & \underset{\text{(i)}}{=}2{\|v\|}^{2}\cdot3{\|w\|}^{2}(v\cdot w)+3(v\cdot w){\bigl({\|v\|}^{2}{\|w\|}^{2}+2(v\cdot w)^{2}\bigr)}\\
 & =9{\|v\|}^{2}{\|w\|}^{2}(v\cdot w)+6(v\cdot w)^{3}\,,
\end{aligned}$$
where in (i) we used Proposition \ref{prop:stein-comp}-1 and 2.
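The moment identities above can be cross-checked exactly on integer vectors. The sketch below does not use Stein's lemma as in the proofs; it swaps in Isserlis' (Wick's) theorem, which writes a Gaussian moment as a sum over perfect pairings of the factors, with $\mathrm{Cov}(a\cdot h,b\cdot h)=a\cdot b$ for $h\sim\mathcal{N}(0,I_{d})$. The helper names `dot`, `gaussian_moment`, and `check_identities` and the test vectors are illustrative assumptions, not part of the paper.

```python
def dot(a, b):
    """Euclidean inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def gaussian_moment(vectors):
    """Return E[prod_k (u_k . h)] for h ~ N(0, I) and u_k in `vectors`.

    Isserlis' (Wick's) theorem: the moment is the sum, over all perfect
    pairings of the factors, of the products of pairwise covariances,
    where Cov(a . h, b . h) = a . b. Odd-order moments vanish.
    """
    if len(vectors) % 2 == 1:
        return 0
    if not vectors:
        return 1
    first, rest = vectors[0], vectors[1:]
    total = 0
    # Pair the first factor with each remaining factor and recurse.
    for j, partner in enumerate(rest):
        remaining = rest[:j] + rest[j + 1:]
        total += dot(first, partner) * gaussian_moment(remaining)
    return total


def check_identities(v, w, p, r, s):
    """Compare both sides of the stated identities; returns a dict of booleans."""
    vv, ww, vw = dot(v, v), dot(w, w), dot(v, w)
    return {
        "E[(v.h)(w.h)^3]": gaussian_moment([v, w, w, w]) == 3 * ww * vw,
        "E[(v.h)^2(w.h)^2]": gaussian_moment([v, v, w, w]) == vv * ww + 2 * vw ** 2,
        "E[(p.h)^2(r.h)(s.h)]": gaussian_moment([p, p, r, s])
        == dot(p, p) * dot(r, s) + 2 * dot(p, s) * dot(p, r),
        "E[(v.h)^3(w.h)^3]": gaussian_moment([v, v, v, w, w, w])
        == 9 * vv * ww * vw + 6 * vw ** 3,
    }


if __name__ == "__main__":
    # Arbitrary integer test vectors, chosen only for illustration.
    v, w = (1, 2, -1), (3, 0, 2)
    p, r, s = (2, -1, 0), (1, 1, 1), (0, 4, -3)
    for name, ok in check_identities(v, w, p, r, s).items():
        print(name, "holds" if ok else "FAILS")
```

Because the inputs are integers, every comparison is exact, so a mismatch would indicate a wrong coefficient rather than floating-point noise. The pairing recursion visits $(2k-1)!!$ pairings for $2k$ factors, which is only 15 for the sixth moment.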
For $p,q,r,s\in\mathbb{R}^{d}$ and $h\sim\mathcal{N}(0,I_{d})$,
$$\begin{aligned}
\mathbb{E}[(p\cdot h)^{2}(q\cdot h)(r\cdot h)^{2}(s\cdot h)] & =(q\cdot s){\|p\|}^{2}{\|r\|}^{2}+4(p\cdot r)(p\cdot q)(r\cdot s)+2{\|p\|}^{2}(r\cdot q)(r\cdot s)\\
 & \qquad+2{\|r\|}^{2}(p\cdot q)(p\cdot s)+2(p\cdot r)^{2}(q\cdot s)+4(p\cdot s)(p\cdot r)(r\cdot q)\,.
\end{aligned}$$

Proof. Using Stein's lemma,
$$\begin{aligned}
\mathbb{E}[(p\cdot h)^{2}(q\cdot h)(r\cdot h)^{2}(s\cdot h)] & =\sum_{i}q_{i}\mathbb{E}[h_{i}(p\cdot h)^{2}(r\cdot h)^{2}(s\cdot h)]\\
 & =\sum_{i}q_{i}{\bigl(2p_{i}\mathbb{E}[(p\cdot h)(r\cdot h)^{2}(s\cdot h)]+2r_{i}\mathbb{E}[(p\cdot h)^{2}(r\cdot h)(s\cdot h)]+s_{i}\mathbb{E}[(p\cdot h)^{2}(r\cdot h)^{2}]\bigr)}\\
 & \underset{\text{(i)}}{=}2(p\cdot q){\bigl({\|r\|}^{2}(p\cdot s)+2(p\cdot r)(r\cdot s)\bigr)}+2(r\cdot q){\bigl({\|p\|}^{2}(r\cdot s)+2(p\cdot s)(p\cdot r)\bigr)}\\
 & \qquad+(q\cdot s){\bigl({\|p\|}^{2}{\|r\|}^{2}+2(p\cdot r)^{2}\bigr)}\\
 & =(q\cdot s){\|p\|}^{2}{\|r\|}^{2}+4(p\cdot r)(p\cdot q)(r\cdot s)+2{\|p\|}^{2}(r\cdot q)(r\cdot s)+2{\|r\|}^{2}(p\cdot q)(p\cdot s)\\
 & \qquad+2(p\cdot r)^{2}(q\cdot s)+4(p\cdot s)(p\cdot r)(r\cdot q)\,.
\end{aligned}$$
In (i), we applied Proposition \ref{prop:stein-comp}-3 to the first two terms and Proposition \ref{prop:stein-comp}-2 to the third term.
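As a consistency check between the last two identities (a sketch that only substitutes into the formula just proved), set $p=q=v$ and $r=s=w$. The left-hand side becomes $\mathbb{E}[(v\cdot h)^{3}(w\cdot h)^{3}]$, and the six terms on the right-hand side, taken in order, collapse to the earlier sixth-moment formula:
$$\begin{aligned}
 & (v\cdot w){\|v\|}^{2}{\|w\|}^{2}+4(v\cdot w){\|v\|}^{2}{\|w\|}^{2}+2{\|v\|}^{2}(v\cdot w){\|w\|}^{2}+2{\|w\|}^{2}{\|v\|}^{2}(v\cdot w)+2(v\cdot w)^{3}+4(v\cdot w)^{3}\\
 & \qquad=9{\|v\|}^{2}{\|w\|}^{2}(v\cdot w)+6(v\cdot w)^{3}\,.
\end{aligned}$$
Taking further $v=w=e_{1}$ gives $9+6=15=\mathbb{E}[h_{1}^{6}]$, the sixth moment of a standard Gaussian.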
