nDNA -- the Semantic Helix of Artificial Cognition
Amitava Das
TL;DR
This work introduces Neural Genomics and the nDNA framework to diagnose the latent cognition of foundation models beyond surface outputs. It formalizes three per-layer latent-geometric measures—spectral curvature $\kappa_\ell$, thermodynamic length $\mathcal{L}_\ell$, and belief vector field $\mathbf{v}_\ell^{(c)}$—and combines them into a unified nDNA score that captures inheritance, drift, and adaptation across training histories. The Cartograph and its subsections lay out why trajectories, not weights, are fundamental, and present diagnostics (nHD, nGDI, nTEDS, nTDS, nKaryotyping, nDIV, nEPI, nCCL) to quantify heritable transformations, cultural priors, and corpus dependence. By framing latent cognition as a lineage with measurable geometry, the framework aims to enable safer governance, transparency, and auditing of AI systems as they evolve through pretraining, fine-tuning, alignment, distillation, and merging.
Abstract
As AI foundation models grow in capability, a deeper question emerges: what shapes their internal cognitive identity, beyond fluency and output? Benchmarks measure behavior, but the soul of a model resides in its latent geometry. In this work, we propose Neural DNA (nDNA), a semantic-genotypic representation that captures this latent identity through the intrinsic geometry of belief. nDNA is synthesized from three principled and indispensable dimensions of latent geometry: spectral curvature, which reveals how conceptual flow bends across layers; thermodynamic length, which quantifies the semantic effort required to traverse representational transitions from layer to layer; and the belief vector field, which delineates the semantic torsion fields that guide the directional orientation of a model's beliefs. Like biological DNA, nDNA encodes ancestry, mutation, and semantic inheritance, visible in fine-tuning and alignment scars, cultural imprints, and architectural drift. In naming it, we open a new field, Neural Genomics, in which models are not just tools but digital semantic organisms with traceable inner cognition.

Modeling statement. We read AI foundation models as semantic fluid dynamics: meaning is transported through layers like fluid in a shaped conduit. nDNA is the physics-grade readout of that flow -- a geometry-first measure of how meaning is bent, paid for, and pushed -- yielding a stable, coordinate-free fingerprint tied to on-input behavior. With this fingerprint we cross into biology: tracing lineages across pretraining, fine-tuning, alignment, pruning, distillation, and merges; measuring inheritance between checkpoints; detecting drift as traits shift under new data or objectives; and, ultimately, studying the evolution of artificial cognition to compare models, diagnose risks, and govern change over time.
