A Tutorial on Regression Analysis: From Linear Models to Deep Learning -- Lecture Notes on Artificial Intelligence

Jingyuan Wang, Jiahao Ji

TL;DR

These lecture notes unify regression analysis from linear models to deep learning, with a self-contained treatment for students with basic math. They define regression as learning a function mapping features to responses and detail a three-part construction (regression function, loss, estimation), then cover linear, logistic, and softmax regression, nonlinear basis-function approaches, kernel methods, and deep neural networks with backpropagation. The notes emphasize regularization (Ridge, LASSO) to control overfitting and discuss gradient-based optimization across settings, including closed-form solutions where available. By linking classical statistical modeling with modern machine-learning practice, the work aims to build a solid conceptual and technical foundation for advanced AI models.

Abstract

This article serves as the regression analysis lecture notes in the Intelligent Computing course cluster (including the courses of Artificial Intelligence, Data Mining, Machine Learning, and Pattern Recognition). It aims to provide students -- who are assumed to possess only basic university-level mathematics (i.e., with prerequisite courses in calculus, linear algebra, and probability theory) -- with a comprehensive and self-contained understanding of regression analysis without requiring any additional references. The lecture notes systematically introduce the fundamental concepts, modeling components, and theoretical foundations of regression analysis, covering linear regression, logistic regression, multinomial logistic regression, polynomial regression, basis-function models, kernel-based methods, and neural-network-based nonlinear regression. Core methodological topics include loss-function design, parameter-estimation principles, ordinary least squares, gradient-based optimization algorithms and their variants, as well as regularization techniques such as Ridge and LASSO regression. Through detailed mathematical derivations, illustrative examples, and intuitive visual explanations, the materials help students understand not only how regression models are constructed and optimized, but also how they reveal the underlying relationships between features and response variables. By bridging classical statistical modeling and modern machine-learning practice, these lecture notes aim to equip students with a solid conceptual and technical foundation for further study in advanced artificial intelligence models.
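As a minimal sketch of the ingredients named above (a regression function, a loss, and an estimation procedure), the following NumPy example fits a linear model by ordinary least squares, by gradient descent, and by Ridge regression. The synthetic data, the function names, the learning rate, and the regularization strength are all illustrative assumptions, not taken from the lecture notes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: y = 1 + 2*x1 - 3*x2 + small Gaussian noise.
n = 200
X_raw = rng.normal(size=(n, 2))
X = np.hstack([np.ones((n, 1)), X_raw])   # prepend a column of ones for the bias
true_w = np.array([1.0, 2.0, -3.0])       # [bias, w1, w2]
y = X @ true_w + 0.1 * rng.normal(size=n)


def mse_loss(w, X, y):
    """Loss: half the mean squared residual of the linear model X @ w."""
    r = X @ w - y
    return 0.5 * np.mean(r ** 2)


def ols_closed_form(X, y):
    """Estimation (closed form): solve the normal equations X^T X w = X^T y."""
    return np.linalg.solve(X.T @ X, X.T @ y)


def ridge_closed_form(X, y, lam):
    """Ridge: solve (X^T X + lam * I) w = X^T y.

    For brevity this sketch also penalizes the bias term.
    """
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def gradient_descent(X, y, lr=0.1, steps=2000):
    """Estimation (iterative): plain gradient descent on mse_loss."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)  # gradient of mse_loss w.r.t. w
        w -= lr * grad
    return w


w_ols = ols_closed_form(X, y)
w_gd = gradient_descent(X, y)
w_ridge = ridge_closed_form(X, y, lam=10.0)

print("OLS   :", w_ols)
print("GD    :", w_gd)
print("Ridge :", w_ridge)
```

In this setup the closed-form and gradient-descent estimates should agree closely, while the Ridge estimate is shrunk toward zero relative to OLS.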

Paper Structure

This paper contains 30 sections, 3 theorems, 176 equations, 23 figures, 9 tables, 2 algorithms.

Key Result

Theorem 1

Let $f:\mathbb{R}^n \to \mathbb{R}$ be differentiable at a point $\bm{x}\in\mathbb{R}^n$, and assume that $\nabla f(\bm{x}) \neq \bm{0}$. For any direction $\bm{d}$ with unit norm $\|\bm{d}\|_2 = 1$, the directional derivative of $f$ at $\bm{x}$ in the direction $\bm{d}$ is defined as $$D_{\bm{d}} f(\bm{x}) = \lim_{t \to 0} \frac{f(\bm{x} + t\bm{d}) - f(\bm{x})}{t} = \nabla f(\bm{x})^\top \bm{d},$$ which is the rate of change of the function $f$ at the point $\bm{x}$ in the direction $\bm{d}$. The directional
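The statement that this theorem is named for (Steepest Descent Direction) can be checked numerically. The sketch below uses a hypothetical quadratic test function `f` and its hand-derived gradient `grad_f`, neither of which comes from the lecture notes. It scans unit directions in the plane and compares the one with the smallest directional derivative against the normalized negative gradient.

```python
import numpy as np


def f(x):
    """Hypothetical smooth test function on R^2 (an assumption for illustration)."""
    return x[0] ** 2 + 3.0 * x[1] ** 2 + x[0] * x[1]


def grad_f(x):
    """Analytic gradient of f, derived by hand."""
    return np.array([2.0 * x[0] + x[1], 6.0 * x[1] + x[0]])


x = np.array([1.0, 0.5])
g = grad_f(x)                      # nonzero at this point
d_star = -g / np.linalg.norm(g)    # candidate steepest descent direction

# Scan unit directions d = (cos a, sin a) on a fine grid of angles.
angles = np.linspace(0.0, 2.0 * np.pi, 3600, endpoint=False)
dirs = np.stack([np.cos(angles), np.sin(angles)], axis=1)

# For differentiable f, the directional derivative along unit d is grad . d.
dir_derivs = dirs @ g
best = dirs[np.argmin(dir_derivs)]

# Finite-difference estimate of the directional derivative along d_star.
h = 1e-6
fd_rate = (f(x + h * d_star) - f(x)) / h

print("normalized negative gradient:", d_star)
print("best direction on the grid  :", best)
print("minimum directional derivative on the grid:", dir_derivs.min())
print("-||grad f(x)||_2            :", -np.linalg.norm(g))
print("finite-difference rate along d_star:", fd_rate)
```

Up to the resolution of the angle grid, the best scanned direction should coincide with the normalized negative gradient, and the smallest directional derivative should be close to $-\|\nabla f(\bm{x})\|_2$.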

Figures (23)

  • Figure 1: An example of regression analysis for rental prices near a university.
  • Figure 2: Plate IX from Galton (1886): Rate of regression in hereditary stature. Mid-parent height (in inches) is plotted on the horizontal axis, and the median child height is plotted for each parental height band. The identity line (AB) represents no change between generations, while the empirical line (CD) shows that children of extremely tall parents tend to be shorter, and children of extremely short parents tend to be taller--illustrating regression toward the mean.
  • Figure 3: Laozi and the passage from the Tao Te Ching: "The Way of Heaven reduces excess and replenishes deficiency; the way of humankind is the opposite -- it takes from the poor to serve the rich."
  • Figure 4: Geometric interpretation of extrema for multivariate functions: a positive definite Hessian corresponds to a bowl-shaped surface (local minimum), a negative definite Hessian corresponds to a dome-shaped surface (local maximum), and an indefinite Hessian produces a saddle-shaped surface (saddle point).
  • Figure 5: An illustrative example of gradient descent.
  • ...and 18 more figures

Theorems & Definitions (7)

  • Definition 1: Regression Analysis
  • Definition 2: Linear Regression
  • Theorem 1: Steepest Descent Direction
  • Proof 1
  • Theorem 2: Weierstrass Approximation Theorem
  • Definition 3: Kernel Function
  • Theorem 3: Universal Approximation Theorem