A Geometric Approach to Problems in Optimization and Data Science

Naren Sarayu Manoj

TL;DR

This thesis develops a geometric framework for core optimization and data-science problems by embedding them in high-dimensional convex geometry. It introduces streaming ellipsoidal rounding and coreset techniques to approximate convex polytopes and hulls with near-optimal distortion, while using block Lewis weights to sparsify block-norm objectives and accelerate MSN-type regression. It also analyzes robustness to adversarial data through backdoor models and monotone adversaries in both optimization (dueling) and clustering (spectral) tasks, deriving both algorithmic guarantees and fundamental limits. Collectively, these results yield memory-efficient, scalable algorithms with provable guarantees for core ML tasks under streaming, distributed, and adversarial settings, and provide a principled link between geometric approximations and statistical robustness. The work has practical impact in fast, robust optimization and data-analysis pipelines, including multidistributional regression, sparsification, and robust spectral methods.

Abstract

We give new results for problems in computational and statistical machine learning using tools from high-dimensional geometry and probability. We break up our treatment into two parts. In Part I, we focus on computational considerations in optimization. Specifically, we give new algorithms for approximating convex polytopes in a stream, sparsification and robust least squares regression, and dueling optimization. In Part II, we give new statistical guarantees for data science problems. In particular, we formulate a new model in which we analyze statistical properties of backdoor data poisoning attacks, and we study the robustness of graph clustering algorithms to "helpful" misspecification.
