An extended asymmetric sigmoid with Perceptron (SIGTRON) for imbalanced linear classification

Hyenkyun Woo

TL;DR

The paper tackles imbalanced and scale-imbalanced classification by introducing SIGTRON, an extended asymmetric sigmoid with Perceptron, and the SIC model, which uses a virtual SIGTRON-induced convex loss with internal parameters $(\alpha_+,\alpha_-)$. It develops a quasi-Newton (L-BFGS) optimization framework with an interval-based line search to minimize the virtual convex losses efficiently. Across 118 diverse datasets, SIC often delivers superior or competitive test accuracy compared with $\pi$-weighted convex focal losses and LIBLINEAR, with binary tasks showing notable gains and multi-class tasks remaining competitive with kernel methods. The work provides insight into skewed hyperplanes and dataset imbalance structure, offering an internally parameterized alternative to external cost-sensitive weighting for imbalanced learning.
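
For orientation, the external cost-sensitive weighting that SIC is contrasted with can be sketched as follows. This is a minimal illustration using the plain logistic loss, not the convex focal loss or the SIGTRON-induced loss from the paper. The function names and the $(\pi, 1-\pi)$ weighting convention are assumptions made for this sketch.

```python
import math

def log1pexp(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def pi_weighted_logistic_loss(w, b, xs, ys, pi=0.5):
    """Logistic loss with an external class weight (hypothetical helper).

    xs: list of feature lists, ys: labels in {+1, -1}.
    Positive samples are scaled by pi and negative samples by (1 - pi);
    this split is one common convention and is assumed here.
    """
    total = 0.0
    for x, y in zip(xs, ys):
        h = sum(wi * xi for wi, xi in zip(w, x)) + b  # linear score h(x)
        weight = pi if y > 0 else 1.0 - pi
        total += weight * log1pexp(-y * h)
    return total
```

With `pi=0.5` both classes are treated alike; moving `pi` toward 1 makes errors on the positive class more expensive. The weight sits outside the loss, which is the design the SIC model replaces with parameters inside the loss.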

Abstract

This article presents a new polynomial parameterized sigmoid called SIGTRON, which is an extended asymmetric sigmoid with Perceptron, and its companion convex model called SIGTRON-imbalanced classification (SIC) model that employs a virtual SIGTRON-induced convex loss function. In contrast to the conventional $\pi$-weighted cost-sensitive learning model, the SIC model does not have an external $\pi$-weight on the loss function but has internal parameters in the virtual SIGTRON-induced loss function. As a consequence, when the given training dataset is close to the well-balanced condition considering the (scale-)class-imbalance ratio, we show that the proposed SIC model is more adaptive to variations of the dataset, such as the inconsistency of the (scale-)class-imbalance ratio between the training and test datasets. This adaptation is justified by a skewed hyperplane equation, created via linearization of the gradient satisfying the $\varepsilon$-optimal condition. Additionally, we present a quasi-Newton optimization (L-BFGS) framework for the virtual convex loss by developing an interval-based bisection line search. Empirically, we have observed that the proposed approach outperforms (or is comparable to) the $\pi$-weighted convex focal loss and the balanced classifier LIBLINEAR (logistic regression, SVM, and L2SVM) in terms of test classification accuracy with $51$ two-class and $67$ multi-class datasets. In binary classification problems, where the scale-class-imbalance ratio of the training dataset is not significant but the inconsistency exists, a group of SIC models with the best test accuracy for each dataset (TOP$1$) outperforms LIBSVM (C-SVC with RBF kernel), a well-known kernel-based classifier.
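
The idea of a bisection line search that works on an interval can be illustrated with a short sketch. It is not the paper's algorithm. It assumes a convex one-dimensional function $\phi(t)=f(x+td)$ whose directional derivative is available, and it stops once the strong Wolfe curvature condition $|\phi'(t)| \le c_{II}\,|\phi'(0)|$ holds. The function name, the doubling phase, and the defaults are assumptions made for this example.

```python
def bisection_line_search(dphi, c2=0.5, t_init=1.0, max_iter=60):
    """Find t > 0 with |dphi(t)| <= c2 * |dphi(0)| for a convex phi.

    dphi(t) is the directional derivative of phi(t) = f(x + t*d).
    Only derivatives are evaluated, never function values.
    """
    g0 = dphi(0.0)
    if g0 >= 0.0:
        raise ValueError("d is not a descent direction")
    tol = c2 * abs(g0)

    lo, hi = 0.0, t_init
    # Phase 1: grow the interval until the slope at hi is acceptable
    # or turns positive, which brackets the minimizer in [lo, hi].
    for _ in range(max_iter):
        g = dphi(hi)
        if abs(g) <= tol:
            return hi
        if g > 0.0:
            break
        lo, hi = hi, 2.0 * hi
    else:
        return lo  # no bracket found; fall back to the last evaluated step

    # Phase 2: bisect the bracket, keeping a negative slope at lo
    # and a positive slope at hi.
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        g = dphi(mid)
        if abs(g) <= tol:
            return mid
        if g < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For example, with $\phi(t)=(t-3)^2$ one would call `bisection_line_search(lambda t: 2.0 * (t - 3.0), c2=0.1)`. A smaller `c2` demands a step closer to the exact one-dimensional minimizer at the cost of more derivative evaluations.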

Paper Structure (12 sections, 6 theorems, 48 equations, 9 figures, 6 tables, 2 algorithms)

Key Result

Theorem 1.1

For $\{ x_i \;|\; i \in {\cal N}_+ \}$ and $\{ x_j \;|\; j \in {\cal N}_- \}$, let us consider mean zero normalization, $\sum_{i \in {\cal N}_+} x_i + \sum_{j \in {\cal N}_-} x_j=0$. Here $x_p^c \not= 0$ and $x_n^c \not= 0$. Then, if $r_{sc} =1$, we have $r_c=1$. On the other hand, if $r_{sc} \not=1$ ...
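
The mean zero normalization already ties the two classes together. As a sketch, assume that $x_p^c$ and $x_n^c$ denote the centroids of the positive and negative classes (this excerpt does not define them) and write $n_+ = |{\cal N}_+|$ and $n_- = |{\cal N}_-|$, a notation introduced only for this illustration. Then

$$
x_p^c = \frac{1}{n_+}\sum_{i \in {\cal N}_+} x_i, \qquad
x_n^c = \frac{1}{n_-}\sum_{j \in {\cal N}_-} x_j, \qquad
n_+ x_p^c + n_- x_n^c = 0
\;\Longrightarrow\;
x_p^c = -\frac{n_-}{n_+}\, x_n^c, \qquad
\frac{\| x_p^c \|}{\| x_n^c \|} = \frac{n_-}{n_+}.
$$

Under this assumption the two centroids point in opposite directions, and their norms are inversely proportional to the class sizes. The definitions of $r_c$ and $r_{sc}$ are not reproduced in this excerpt, so the sketch stops at the centroid relation.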

Figures (9)

  • Figure 1: (a) SIGTRON $s_{\alpha,c}(x)$ with $\alpha = \frac{k-1}{k}<1$ ($k=1,2,3,4,6,10$) and $c_{\alpha}=-1$. Note that $s_{\alpha,c}(x) = 1$ if $x \ge -c_{\alpha}$. (b) SIGTRON $s_{\alpha,c}(x)$ with $\alpha = \frac{k+1}{k}>1$ ($k=1,2,3,4,6,10$) and $c_{\alpha}=1$. Note that $s_{\alpha,c}(x) = 0$ if $x \le -c_{\alpha}$. (c) $\nabla s_{\alpha,c}(x)$ with $\alpha = \frac{k-1}{k}<1$ ($k=2,3,4,6,10$) and $c_{\alpha}=-1$. Note that $\nabla s_{\alpha,c}(x) = 0$ if $x \ge -c_{\alpha}$. The inflection point $x_{ip}$ approaches $-c_{\alpha} = 1$ as $\alpha \rightarrow 0$. (d) $\nabla s_{\alpha,c}(x)$ with $\alpha = \frac{k+1}{k}>1$ ($k=2,3,4,6,10$) and $c_{\alpha}=1$. Note that $\nabla s_{\alpha,c}(x) = 0$ if $x \le -c_{\alpha}$. The inflection point $x_{ip}$ approaches $-c_{\alpha} = -1$ as $\alpha \rightarrow 2$.
  • Figure 2: Graphs of the virtual SIGTRON-induced loss function $L^S_{\alpha,c}$ for (a) $\alpha= \frac{k-1}{k}$ with $c_{\alpha}=-1$ and (b) $\alpha = \frac{k+1}{k}$ with $c_{\alpha}=1$. Here $k=1,2,4,6$, and $10$. In the case of $k=1,2,4,6$, $L_{\alpha,c}^S$ has a closed-form expression. See Example \ref{F-remark}.
  • Figure 3: Classification results with the spectf dataset in Table \ref{2classimb}. The test dataset has $r_{sc} = 0.26$ ($r_c=0.09$), whereas the training dataset is well-balanced, i.e., $r_{sc}=1$. We obtain $20\times20$ hyperplanes $h^*_{(\alpha_+,\alpha_-)}(x)=0$ by solving $20\times20$ SIC models \ref{linmin} with the well-balanced training dataset. (a) The pattern of the test classification accuracy. (b) The pattern of the signed distance of the centroid of the positive test dataset to the hyperplane. (c) The pattern of the signed distance of the centroid of the negative test dataset to the hyperplane. (d) The pattern of $\eta$ in \ref{simple-eta}. The best test accuracy is achieved at $(\alpha_+,\alpha_-) = (\frac{11}{10},2)$. At this point, the distance from the centroid of the positive test dataset to the hyperplane is smallest. The point also belongs to the group $\left\{ (\frac{11}{10},2), (\frac{9}{10},2), (\frac{11}{10},0), (\frac{9}{10},0) \right\}$ having the smallest $\eta = \frac{1}{11}$. See Example \ref{example2} for more details.
  • Figure 4: A comparison of performance between quasi-Newton (L-BFGS) for virtual convex loss, which uses the strong Wolfe condition \ref{strongwolfeD} with $c_{II} \in [0.1,0.9]$, and the classic L-BFGS(*), which uses the strong Wolfe condition \ref{strongwolfeD} with $c_{II}=0.9$ and the Armijo condition \ref{armijoC} with $c_{I}=10^{-4}$. Note that L-BFGS(*) uses the cubic-interpolation-based line search \cite{nocedal06,schmidt05}. For our experiments, we use $12\times12$ SIC models with ${\left|\,c_{\alpha}\,\right|}=1$ and ${\left|\,\alpha_{\pm}-1\,\right|}=\frac{1}{k}$, where $k=1,2,3,4,5,6$. (a) Mean test accuracy of $12 \times 12$ SIC models. (b) Maximum test accuracy and (c) minimum test accuracy obtained by an SIC model with fixed $\alpha_{\pm}$ for each $c_{II}$. (d) Test accuracy of TOP$1$ for each $c_{II}$. (e) Total computation time of $12 \times 12$ SIC models. Here, we report the average values over five repeated experiments with all datasets in Tables \ref{2classimb} and \ref{mclassimb}. For $0.1 \le c_{II} \le 0.5$, in terms of mean test classification accuracy in (a), quasi-Newton (L-BFGS) for virtual convex loss outperforms L-BFGS(*) for each $m$ of the two-loop iterations.
  • Figure 5: Graphs of classification performance of the SIC model \ref{linmin} with $\alpha=\alpha_-=\alpha_+ \in [0,2]$. (a) Test accuracy ($\%$) vs. ${\left|\,c_{\alpha}\,\right|}$. The test accuracy is the average of all results obtained with $\alpha$ in \ref{setofalpha}. When ${\left|\,c_{\alpha}\,\right|} = 2$, the best performance is achieved. (b) Test accuracy ($\%$) vs. $\alpha$. When $\frac{1}{5} \le {\left|\,\alpha-1\,\right|} \le 1$, the best performance is achieved with ${\left|\,c_{\alpha}\,\right|} = 1$. However, when ${\left|\,\alpha-1\,\right|}< \frac{1}{5}$, the SIC model with ${\left|\,c_{\alpha}\,\right|} \ge 2$ shows better performance.
  • ...and 4 more figures
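
The $12\times12$ model grid described in the Figure 4 caption follows directly from the constraint ${\left|\,\alpha_{\pm}-1\,\right|}=\frac{1}{k}$ with $k=1,\dots,6$. The snippet below enumerates it with exact fractions. The helper name `alpha_grid` is made up for this illustration.

```python
from fractions import Fraction
from itertools import product

def alpha_grid(ks=range(1, 7)):
    """All alpha with |alpha - 1| = 1/k for k in ks, in increasing order."""
    values = {1 + sign * Fraction(1, k) for k in ks for sign in (-1, 1)}
    return sorted(values)

alphas = alpha_grid()                   # 12 values between 0 and 2
pairs = list(product(alphas, alphas))   # 144 (alpha_plus, alpha_minus) pairs
```

Calling `alpha_grid(range(1, 11))` gives 20 values, which matches the $20\times20$ count in the Figure 3 caption and includes $\frac{9}{10}$ and $\frac{11}{10}$. Whether that is the exact grid used for Figure 3 is an assumption.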

Theorems & Definitions (21)

  • Theorem 1.1
  • proof
  • Definition 2.1: SIGTRON
  • Theorem 2.2
  • Corollary 2.3
  • proof
  • Remark 2.4
  • Example 2.5
  • Definition 3.1
  • Lemma 3.2
  • ...and 11 more