We develop likelihood-based bias reduction for nonlinear panel models with additive individual and time effects. In two-way panels, integrated-likelihood corrections are attractive but challenging because the required integration is high dimensional and standard Laplace approximations may fail when the parameter dimension grows with the sample size. We propose a target-centered full-exponential Laplace--cumulant expansion that exploits the sparse higher-order derivative structure implied by additive effects, delivering a tractable approximation with a negligible remainder under large-$N,T$ asymptotics. The expansion motivates robust priors that yield bias reduction for both common parameters and fixed effects. We provide implementations for binary, ordered, and multinomial response models with two-way effects. For average partial effects, we show that the remaining first-order bias has a simple variance form and can be removed by a closed-form adjustment. Monte Carlo experiments and an empirical illustration show substantial bias reduction with accurate inference.
Sampled network data are common in empirical research because collecting full network information is costly, but using sampled networks can lead to biased estimates. We propose a nonparametric imputation method for sampled networks and show that empirical analysis based on imputed networks yields consistent parameter estimates. Our approach imputes missing network links by combining a projection onto covariates with a local two-way fixed-effects regression, which avoids parametric assumptions, does not rely on low-rank restrictions, and flexibly accommodates both observed covariates and unobserved heterogeneity. We establish entrywise convergence rates for the imputed matrix and prove the consistency of GMM estimators based on the imputed network. We further derive the convergence rate of the corresponding estimator in the linear-in-means peer-effects model. Simulations show strong performance of our method both in terms of imputation accuracy and in downstream empirical analysis. We illustrate our method with an application to the microfinance network data of Banerjee et al. (2013).
We analyse the UK income distribution from 2000 to 2023 using HMRC annual percentile data for both pre-tax and post-tax income. We fit a prefactor-adjusted $\kappa$-generalised specification to the data by weighted non-linear least squares and use inverse transform sampling to generate simulated income populations. The results suggest a redistribution of income shares over the period: the bottom 40\% appears to have increased its share, the middle-upper part of the distribution (50th--90th percentiles) lost share, the top 10\% remained broadly stable, and the top 1\% increased its share of pre-tax income. Because the modified specification is defined only above a positive threshold, conclusions concerning the lower tail should be interpreted with some caution. Using simulated 2023 pre-tax incomes to examine tax reform scenarios, we find that tax increases on high-income earners must be more than four times as large as increases on lower-income earners to raise equivalent revenue. This suggests that, despite increased concentration at the top, the UK tax base remains driven primarily by the large number of taxpayers outside the very top of the distribution.
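As an illustration of the simulation step described above, the sketch below draws incomes by inverse transform sampling from the standard $\kappa$-generalised distribution of Kaniadakis, with survival function $\exp_\kappa(-\beta x^\alpha)$; it omits the paper's prefactor adjustment, and the parameter values and function names are purely illustrative.
\begin{verbatim}
import numpy as np

def kappa_ln(y, kappa):
    # Kaniadakis kappa-logarithm, the inverse of exp_kappa:
    # ln_k(y) = (y**k - y**(-k)) / (2k)
    return (y**kappa - y**(-kappa)) / (2.0 * kappa)

def sample_kappa_generalised(n, alpha, beta, kappa, rng=None):
    # Inverse transform sampling from the survival function
    # S(x) = exp_kappa(-beta * x**alpha): solve S(x) = u for u ~ U(0,1),
    # giving x = (-ln_kappa(u) / beta)**(1/alpha).
    rng = np.random.default_rng() if rng is None else rng
    u = rng.uniform(low=1e-12, size=n)   # avoid u = 0 exactly
    return (-kappa_ln(u, kappa) / beta) ** (1.0 / alpha)

# Illustrative parameters, not estimates from the paper.
incomes = sample_kappa_generalised(100_000, alpha=2.0, beta=1e-9, kappa=0.6)
shares = np.cumsum(np.sort(incomes)) / incomes.sum()
print("bottom 40% share:", round(shares[int(0.4 * len(incomes)) - 1], 3))
\end{verbatim}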
This paper proposes Covariate-Balanced Weighted Stacked Difference-in-Differences (CBWSDID), a design-based extension of weighted stacked DID for settings in which untreated trends may be conditionally rather than unconditionally parallel. The estimator separates within-subexperiment design adjustment from across-subexperiment aggregation: matching or weighting improves treated-control comparability within each stacked subexperiment, while the corrective stacked weights of Wing et al. recover the target aggregate ATT. I show that the same logic extends from absorbing treatment to repeated $0 \to 1$ and $1 \to 0$ episodes under a finite-memory assumption. The paper develops the identifying framework, discusses inference, presents simulation evidence, and illustrates the estimator in applications based on Trounstine (2020) and Acemoglu et al. (2019). Across these examples, CBWSDID serves as a bridge between weighted stacked DID and design-based panel matching. The accompanying R package cbwsdid is available on GitHub.
This paper establishes the theoretical and practical foundations for using Large Language Models (LLMs) as measurement instruments for latent economic variables -- specifically variables that describe the cognitive content of occupational tasks at a level of granularity not achievable with existing survey instruments. I formalize four conditions under which LLM-generated scores constitute valid instruments: semantic exogeneity, construct relevance, monotonicity, and model invariance. I then apply this framework to the Augmented Human Capital Index (AHC$_o$), constructed from 18,796 O*NET task statements scored by Claude Haiku 4.5, and validated against six existing AI exposure indices. The index shows strong convergent validity (r = 0.85 with Eloundou GPT-gamma, r = 0.79 with Felten AIOE) and discriminant validity. Principal component analysis confirms that AI-related occupational measures span two distinct dimensions -- augmentation and substitution. Inter-rater reliability across two LLM models (n = 3,666 paired scores) yields Pearson r = 0.76 and Krippendorff's alpha = 0.71. Prompt sensitivity analysis across four alternative framings shows that task-level rankings are robust. Obviously Related Instrumental Variables (ORIV) estimation recovers coefficients 25% larger than OLS, consistent with classical measurement error attenuation. The methodology generalizes beyond labor economics to any domain where semantic content must be quantified at scale.
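As a sketch of the ORIV logic invoked above: Gillen, Snowberg and Yariv's estimator stacks two noisy measures of the same latent variable and instruments each with the other; the simplified version below just averages the two just-identified IV slopes under a simulated classical measurement-error design. The data-generating process, variable names, and parameters are illustrative, not the paper's.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
n = 5_000
x_star = rng.normal(size=n)                  # latent construct (e.g. task cognitive content)
y = 1.0 + 0.5 * x_star + rng.normal(size=n)  # outcome; true slope is 0.5
x1 = x_star + rng.normal(size=n)             # two independent noisy measures,
x2 = x_star + rng.normal(size=n)             # e.g. scores from two LLM runs

def iv_slope(y, x, z):
    # Just-identified IV slope: cov(z, y) / cov(z, x).
    return np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

b_ols = np.cov(x1, y)[0, 1] / np.var(x1, ddof=1)          # attenuated towards zero
b_iv = 0.5 * (iv_slope(y, x1, x2) + iv_slope(y, x2, x1))   # robust to classical error
print(f"OLS {b_ols:.3f}   averaged IV {b_iv:.3f}   true 0.500")
\end{verbatim}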
It is common when using cross-section or panel data to assign each observation to a cluster and allow for arbitrary patterns of heteroskedasticity and correlation within clusters. For regression models, there are many ways to make cluster-robust inferences. A number of different variance matrix estimators can be used. Hypothesis tests and confidence intervals can then be based on several alternative analytic or bootstrap distributions. Some methods typically perform much better than others, but no method yields reliable inferences in every case. Thus it can be hard to know which $P$ values and confidence intervals to trust. Nevertheless, by using a number of procedures to assess the reliability of various inferential methods for a specific model and dataset, we can often obtain results in which we may be reasonably confident.
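For concreteness, one widely used cluster-robust variance matrix estimator (often labelled CV$_1$) is the sandwich form
\[
\widehat{V}_{\mathrm{CV}_1}
= \frac{G}{G-1}\,\frac{N-1}{N-k}\,
\left(X^{\top}X\right)^{-1}
\left(\sum_{g=1}^{G} X_g^{\top}\hat{u}_g\hat{u}_g^{\top}X_g\right)
\left(X^{\top}X\right)^{-1},
\]
where $G$ is the number of clusters, $X_g$ and $\hat{u}_g$ collect the regressors and OLS residuals of cluster $g$, and $k$ is the number of coefficients. The alternatives discussed in the abstract either modify this estimator or pair it with different reference distributions, such as $t(G-1)$ critical values or wild cluster bootstrap distributions.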
Traditional econometric analyses represent observations as vectors despite the inherent complexity of empirical data structures. When data are organized along dual classification dimensions, a matrix representation provides a more natural and interpretable framework. Building on recent advances in matrix autoregressive (MAR) modeling, this study introduces a novel error correction representation tailored for matrix-structured data. Through comparative analysis with existing methodologies, we demonstrate two key advances. First, the proposed model preserves the interpretative foundations of conventional cointegration analysis, with coefficients that explicitly capture dynamics rooted in adjustment toward steady-state positions. Second, in contrast to previous formulations, our error correction framework admits an equivalent matrix autoregressive representation, preserving the fundamental structure of the data in both specifications. This ensures that the matrix representation reflects an intrinsic characteristic of the data.
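For background, the first-order matrix autoregression that this line of work builds on can be written bilinearly as
\[
X_t = A\,X_{t-1}\,B^{\top} + E_t,
\]
where $X_t$ is an $m \times n$ matrix observed along two classification dimensions, the $m \times m$ matrix $A$ and the $n \times n$ matrix $B$ govern row-wise and column-wise dynamics, and $E_t$ is a matrix-valued innovation. The abstract's error correction representation is a reparameterization within this class; its exact form is not reproduced here.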
This paper develops a penalized GMM (PGMM) framework for automatic debiased inference on functionals of nonparametric instrumental variable estimators. We derive convergence rates for the PGMM estimator and provide conditions for root-$n$ consistency and asymptotic normality of debiased functional estimates, covering both linear and nonlinear functionals. Monte Carlo experiments on the average derivative show that the PGMM-based debiased estimator performs on par with the analytical debiased estimator that uses the known closed-form Riesz representer, achieving 90--96% coverage while the plug-in estimator falls below 5%. We apply our procedure to estimate mean own-price elasticities in a semiparametric demand model for differentiated products. Simulations confirm near-nominal coverage for the debiased estimator, while the plug-in severely undercovers. Applied to IRI scanner data on carbonated beverages, debiased semiparametric estimates are approximately 20% more elastic than the logit benchmark, and debiasing corrections are heterogeneous across products, ranging from negligible to several times the standard error.
We propose a novel framework for conducting causal inference based on counterfactual densities. While the current paradigm of causal inference is mostly focused on estimating average treatment effects (ATEs), which restricts the analysis to the first moment of the outcome variable, our density-based approach is able to detect causal effects based on general distributional characteristics. Following the Oaxaca-Blinder decomposition approach, we consider two types of counterfactual density effects that together explain observed discrepancies between the densities of the treated and control groups. First, the distribution effect is the counterfactual effect of changing the conditional density of the control group to that of the treatment group, while keeping the covariates fixed at the treatment group distribution. Second, the covariate effect represents the effect of a hypothetical change in the covariate distribution. Both effects have a causal interpretation under the classical unconfoundedness and overlap assumptions. Methodologically, our approach is based on analyzing the conditional densities as elements of a Bayes Hilbert space, which preserves the non-negativity and integration-to-one constraints. We specify a flexible functional additive regression model to estimate the conditional densities. We apply our method to analyze the German East--West income gap, i.e., the observed differences in wages between East Germans and West Germans. While most existing studies focus on average differences and neglect other distributional characteristics, our density-based approach is suited to detecting all nuances of the counterfactual distributions, including differences in probability masses at zero.
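In compact notation (mine, not necessarily the paper's), the two effects decompose the observed density gap through a single counterfactual density that pairs the control group's conditional density with the treated group's covariate distribution:
\[
f_1(y) - f_0(y)
= \underbrace{\int \bigl[f_1(y \mid x) - f_0(y \mid x)\bigr]\,\mathrm{d}F_1(x)}_{\text{distribution effect}}
+ \underbrace{\int f_0(y \mid x)\,\bigl[\mathrm{d}F_1(x) - \mathrm{d}F_0(x)\bigr]}_{\text{covariate effect}},
\]
where $f_g(y \mid x)$ and $F_g(x)$ denote the conditional outcome density and the covariate distribution of group $g \in \{0,1\}$ (control, treated).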
This paper proposes a specification test for the conventional distributional assumptions on the error terms in binary choice models, focusing on their tail properties. Based on extreme value theory, we first establish that the tail index of the unobserved error can be recovered from that of the observed covariates. The null hypothesis that the index is zero essentially covers the widely used probit and logit models. We then construct a simple and powerful statistical test for both cross-sectional and panel data, requiring no model estimation and no parametric assumptions. Monte Carlo simulations demonstrate that our test performs well in terms of size and power, and applications to three empirical examples on firm export and innovation decisions and female labor force participation illustrate its general applicability.
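The test itself requires no estimation, but for readers unfamiliar with tail indices, the sketch below applies the classical Hill estimator to a simulated heavy-tailed covariate; it is purely illustrative and is not the authors' statistic.
\begin{verbatim}
import numpy as np

def hill_estimator(x, k):
    # Hill estimator of the tail index based on the k largest order
    # statistics (appropriate for a heavy right tail).
    x = np.sort(np.asarray(x, dtype=float))
    top = x[-k:]             # k largest observations
    threshold = x[-k - 1]    # (k+1)-th largest, used as the threshold
    return 1.0 / np.mean(np.log(top / threshold))

rng = np.random.default_rng(1)
covariate = rng.pareto(a=3.0, size=50_000) + 1.0    # Pareto with tail index 3
print(round(hill_estimator(covariate, k=1_000), 2))  # approximately 3
\end{verbatim}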
Normalization is ubiquitous in economics, and a growing literature shows that ``normalizations'' can matter for interpretation, counterfactual analysis, misspecification, and inference. This paper provides a general framework for these issues, based on a formalized notion of modeling equivalence that partitions the space of unknowns into equivalence classes, and defines normalization as a without-loss-of-generality selection of one representative from each class. A counterfactual parameter is normalization-free if and only if it is constant on equivalence classes; otherwise any point identification is created by the normalization rather than by the model. Applications to discrete choice, demand estimation, and network formation illustrate the insights made explicit by this criterion. We then study two further sources of fragility: an extension trilemma establishes that fidelity, invariance, and regularity cannot simultaneously hold at a boundary singularity, while a normalization can itself introduce a coordinate singularity that distorts the topological and metric structures of the parameter space, with consequences for estimation and inference.
We propose algorithms for conducting Bayesian inference in structural vector autoregressions identified using sign restrictions. The key feature of our approach is a sampling step based on 'soft' sign restrictions. This step draws from a target density that smoothly penalises parameter values that violate the restrictions, facilitating the use of computationally efficient Markov chain Monte Carlo sampling algorithms. An importance-sampling step yields draws conditional on the 'hard' sign restrictions. Relative to standard accept-reject sampling, the method substantially speeds up sampling when identification is tight. It also facilitates implementing prior-robust Bayesian methods. We illustrate the broad applicability of the approach in an oil-market model identified using a rich set of sign, elasticity and narrative restrictions.
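One way to make the two steps concrete is to sample from a smooth target and then reweight; the specific smoothing function below is illustrative, and the paper's penalty may differ:
\[
\tilde{p}(\theta \mid y) \propto p(\theta \mid y)\prod_{j=1}^{J}\Phi\!\left(\frac{s_j(\theta)}{\lambda}\right),
\qquad
w(\theta) = \frac{\mathbf{1}\{s_j(\theta) \ge 0 \text{ for all } j\}}{\prod_{j=1}^{J}\Phi\!\left(s_j(\theta)/\lambda\right)},
\]
where $s_j(\theta)$ is positive when the $j$-th sign restriction is satisfied, $\lambda > 0$ controls how sharply violations are penalised, MCMC draws are taken from the soft target $\tilde{p}$, and the importance weights $w(\theta)$ convert them into draws satisfying the hard restrictions.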
This paper studies how to estimate an individual's taste for forming a connection with another individual in a network. It compares the difficulty of estimation with and without the assumption that utility is transferable between individuals, and with and without the assumption that regressors are symmetric across individuals in the pair. I show that when pair-specific regressors are symmetric, the sufficient conditions for consistency and asymptotic normality of the maximum likelihood estimator that assumes transferable utility (TU-MLE) are also sufficient for the maximum likelihood estimator that does not assume transferable utility (NTU-MLE). When regressors are asymmetric, I provide sufficient conditions for the consistency and asymptotic normality of the NTU-MLE. I also provide a specification test to assess the validity of the transferable utility assumption. Two applications from different fields of economics demonstrate the value of my results. I find evidence of researchers using the TU-MLE when the transferable utility assumption is violated, and evidence of researchers using NTU-model-based estimators when the validity of the transferable utility assumption cannot be rejected.
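One common way to write the distinction at issue is that a non-transferable-utility (NTU) link requires both sides to gain, whereas a transferable-utility (TU) link requires only positive joint surplus:
\[
\text{NTU:}\ \ g_{ij} = \mathbf{1}\{u_{ij} \ge 0\}\,\mathbf{1}\{u_{ji} \ge 0\},
\qquad
\text{TU:}\ \ g_{ij} = \mathbf{1}\{u_{ij} + u_{ji} \ge 0\},
\]
where $u_{ij}$ is $i$'s latent payoff from linking with $j$ and $g_{ij}$ indicates that the link forms; the specification test in the abstract assesses which link rule is consistent with the data.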
In this paper I develop a breakdown frontier approach to assess the sensitivity of Local Average Treatment Effects (LATE) estimates to violations of monotonicity and independence of the instrument. I parametrize violations of independence using the concept of $c$-dependence from Masten & Poirier (2018) and allow for the share of defiers to be greater than zero but smaller than the share of compliers. I derive identified sets for the LATE and the Average Treatment Effect (ATE) in which the bounds are functions of these two sensitivity parameters. Using these bounds, I derive the breakdown frontier for the LATE, which is the weakest set of assumptions such that a conclusion regarding the LATE holds. I derive consistent sample analogue estimators for the breakdown frontiers and provide a valid bootstrap procedure for inference. Monte Carlo simulations show the desirable finite-sample properties of the estimators and an empirical application shows that the conclusions regarding the effect of family size on unemployment from Angrist & Evans (1998) are highly sensitive to violations of independence and monotonicity.
We propose a method for constructing distribution-free prediction intervals in nonparametric instrumental variable regression (NPIV), with finite-sample coverage guarantees. Building on the conditional-guarantee framework in conformal inference, we reformulate conditional coverage as marginal coverage over a class of IV shifts $\mathcal{F}$. Our method can be combined with any NPIV estimator, including sieve 2SLS and machine-learning-based NPIV methods such as neural-network minimax approaches. Our theoretical analysis establishes distribution-free, finite-sample coverage over a practitioner-chosen class of IV shifts.
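In generic form, the reformulation mentioned above requires coverage to hold on average under every nonnegative reweighting in the chosen class (the paper's IV-specific class and exact formulation may differ):
\[
\mathbb{E}\Bigl[f(W)\,\bigl(\mathbf{1}\{Y \in \widehat{C}\} - (1-\alpha)\bigr)\Bigr] \ge 0
\quad \text{for all } f \in \mathcal{F},\ f \ge 0,
\]
which reduces to marginal $1-\alpha$ coverage when $\mathcal{F}$ contains only constants and approaches conditional coverage as $\mathcal{F}$ becomes richer; here $W$ stands for the conditioning variables and $\widehat{C}$ for the constructed prediction set.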
This article studies randomization inference for treatment effects in randomized controlled trials with attrition, where outcomes are observed for only a subset of units. We assume monotonicity in reporting behavior as in \cite{lee2009training} and focus on the average treatment effect for always-reporters (AR-ATE), that is, units whose outcomes are observed under both treatment and control. Because always-reporter status is only partially revealed by observed assignment and response patterns, we propose a worst-case randomization test that maximizes the randomization p-value over all always-reporter configurations consistent with the data, with an optional pretest to prune implausible configurations. Using studentized Hajek- and chi-square-type statistics, we show the resulting procedure is finite-sample valid for the sharp null and asymptotically valid for the weak null. We also discuss computational implementations for discrete outcomes and integer-programming-based bounds for continuous outcomes.
It has become standard for empirical studies to conduct inference robust to cluster dependence and heterogeneity. With a small number of clusters, the normal approximation for the $t$-statistics of regression coefficients may be poor. This paper tackles this problem using a critical value based on the conditional Cramér-Edgeworth expansion for the $t$-statistics. Our approach guarantees third-order refinement, regardless of whether a regressor is discrete or not, and, unlike the cluster pairs bootstrap, avoids resampling data. Simulations show that our proposal can make a difference in size control with as few as 10 clusters.
Modern pretrained time-series foundation models can forecast without task-specific training, but they do not fully incorporate economic behavior. We show that teaching them basic economic logic improves how they predict demand in an experimental panel. We fine-tune Amazon Chronos-2, a transformer-based probabilistic time-series model, on synthetic data generated from utility-maximizing agents. We exploit Afriat's theorem, which guarantees that demand satisfies the Generalized Axiom of Revealed Preference (GARP) if and only if it can be generated by maximizing some utility function subject to a budget constraint. Because GARP is simple to check, it lets us efficiently generate time series consistent with a large class of utility functions. The fine-tuned model serves as a rationality-constrained forecasting prior: it learns price-quantity relations from GARP-consistent synthetic histories and then uses those relations to predict the choices of real consumers. We find that fine-tuning on GARP-consistent synthetic data substantially improves prediction relative to zero-shot Chronos-2 at all forecast horizons we study. Our results show that economic theory can be used to generate structured synthetic data that improves foundation-model predictions when the theory implies observable patterns in the data.
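A minimal sketch of the GARP check used to screen synthetic price-quantity histories is given below; the implementation and example data are illustrative, not the paper's code.
\begin{verbatim}
import numpy as np

def satisfies_garp(prices, quantities):
    # prices, quantities: (T, n) arrays of observed bundles.
    # Bundle t is directly revealed preferred to s if p_t.x_t >= p_t.x_s;
    # GARP requires that whenever t is (transitively) revealed preferred
    # to s, s is not strictly directly revealed preferred to t.
    expenditure = prices @ quantities.T       # [t, s] = p_t . x_s
    own = np.diag(expenditure)
    direct = own[:, None] >= expenditure      # direct revealed preference
    strict = own[:, None] > expenditure       # strict direct preference
    reach = direct.copy()
    for k in range(len(own)):                 # Warshall transitive closure
        reach |= reach[:, [k]] & reach[[k], :]
    return not np.any(reach & strict.T)       # no revealed-preference cycle

prices = np.array([[1.0, 2.0], [2.0, 1.0]])
quantities = np.array([[4.0, 1.0], [1.0, 4.0]])
print(satisfies_garp(prices, quantities))     # True: consistent with GARP
\end{verbatim}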
A model-free measure of Granger causality in expectiles is proposed, generalizing the traditional mean-based measure to arbitrary positions of the conditional distribution. Expectiles are the only law-invariant risk measures that are both coherent and elicitable, making them particularly well suited to studying distributional Granger causality where risk quantification and forecast evaluation are both relevant. Based on this measure, a test is developed using M-vine copula models that accommodates multivariate Granger causality among $d+1$ series under non-linear and non-Gaussian dependence, without imposing parametric assumptions on the joint distribution. Strong consistency of the test statistic is established under regularity conditions. In finite samples, simulations show accurate size control and power increasing with the sample size. A key advantage is the joint testing capability: causal relationships invisible to pairwise tests can be detected, as demonstrated both theoretically and empirically. Two applications to international stock market indices at the global and Asian regional levels illustrate the practical relevance of the proposed framework.
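For reference, the $\tau$-expectile underlying the proposed measure is the asymmetric least-squares analogue of a quantile (notation is generic rather than the paper's):
\[
e_{\tau}(Y \mid \mathcal{I})
= \arg\min_{m}\ \mathbb{E}\Bigl[\bigl|\tau - \mathbf{1}\{Y \le m\}\bigr|\,(Y - m)^{2} \,\Big|\ \mathcal{I}\Bigr],
\qquad \tau \in (0,1),
\]
so that $\tau = 1/2$ returns the conditional mean; Granger non-causality in expectiles at level $\tau$ requires the conditional $\tau$-expectile of one series to be unchanged when the past of the other series is dropped from the information set $\mathcal{I}$.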
This paper examines how regulatory interventions in high-frequency financial markets affect price discovery. We focus on breaking-news episodes, in which dynamic circuit breakers trigger trading halts immediately after the release of macroeconomic fundamentals. Within a high-frequency signal-in-noise model, we show that triggering rules complicate statistical inference for the price impact of news, rendering conventional non-parametric jump estimators inconsistent. Building on this insight, we develop a regression-based test for fundamental pricing that accounts for non-vanishing transition times. The test compares transition price changes to efficient jumps implied by observable factors. Our empirical analysis of CME E-mini S\&P 500 futures shows that breaking-news episodes are associated with systematic deviations from fundamental pricing, predominantly in the form of overshooting. Our findings highlight a regulatory trade-off: the appeal of simple and transparent circuit breaker rules must be weighed against the cost of preventing fundamentals from being priced contemporaneously, thereby creating adverse incentives and introducing distortions.