An extended asymmetric sigmoid with Perceptron (SIGTRON) for imbalanced linear classification

Hyenkyun Woo

TL;DR

The paper tackles imbalanced and scale-imbalanced classification by introducing SIGTRON, an extended asymmetric sigmoid with Perceptron, and the SIC model, which uses a virtual SIGTRON-induced convex loss with internal parameters $(\alpha_+,\alpha_-)$. It develops a quasi-Newton (L-BFGS) optimization framework with an interval-based line search to efficiently minimize the virtual convex losses. Across 118 diverse datasets, SIC often delivers superior or competitive test accuracy compared with $\pi$-weighted convex focal losses and LIBLINEAR, with binary tasks showing notable gains and multiclass tasks remaining competitive with kernel methods. The work provides insight into skewed hyperplanes and dataset imbalance structure, offering an internally parameterized alternative to external cost-sensitive weighting for imbalanced learning.

Abstract

This article presents a new polynomial parameterized sigmoid called SIGTRON, which is an extended asymmetric sigmoid with Perceptron, and its companion convex model, the SIGTRON-imbalanced classification (SIC) model, which employs a virtual SIGTRON-induced convex loss function. In contrast to the conventional $\pi$-weighted cost-sensitive learning model, the SIC model does not have an external $\pi$-weight on the loss function but has internal parameters in the virtual SIGTRON-induced loss function. As a consequence, when the given training dataset is close to the well-balanced condition with respect to the (scale-)class-imbalance ratio, we show that the proposed SIC model is more adaptive to variations of the dataset, such as an inconsistency of the (scale-)class-imbalance ratio between the training and test datasets. This adaptation is justified by a skewed hyperplane equation, created via linearization of the gradient satisfying the $\varepsilon$-optimal condition. Additionally, we present a quasi-Newton optimization (L-BFGS) framework for the virtual convex loss by developing an interval-based bisection line search. Empirically, we have observed that the proposed approach outperforms (or is comparable to) the $\pi$-weighted convex focal loss and the balanced classifier LIBLINEAR (logistic regression, SVM, and L2SVM) in terms of test classification accuracy on $51$ two-class and $67$ multi-class datasets. In binary classification problems, where the scale-class-imbalance ratio of the training dataset is not significant but the inconsistency exists, a group of SIC models with the best test accuracy for each dataset (TOP$1$) outperforms LIBSVM (C-SVC with RBF kernel), a well-known kernel-based classifier.
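For orientation, the external $\pi$-weighting that the abstract contrasts with the SIC model can be sketched as below. This is a minimal illustration of a $\pi$-weighted logistic loss as it is commonly written, not the paper's virtual SIGTRON-induced loss. The function name `pi_weighted_logistic_loss`, the label convention $y \in \{-1,+1\}$, and the choice of weighting positive samples by $\pi$ and negative samples by $1-\pi$ are assumptions made for this example.

```python
import numpy as np


def pi_weighted_logistic_loss(w, b, X, y, pi=0.5):
    """Mean pi-weighted logistic loss for a linear classifier h(x) = w.x + b.

    Assumed convention (not taken from the paper): labels y are in {-1, +1},
    positive samples are weighted by pi and negative samples by 1 - pi.
    """
    margins = y * (X @ w + b)
    # log(1 + exp(-margin)), evaluated in a numerically stable way
    per_sample = np.logaddexp(0.0, -margins)
    weights = np.where(y > 0, pi, 1.0 - pi)
    return float(np.mean(weights * per_sample))


if __name__ == "__main__":
    # Toy data: one positive and two negative samples (illustrative only)
    X = np.array([[1.0, 2.0], [0.5, -1.0], [-2.0, 0.3]])
    y = np.array([1.0, -1.0, -1.0])
    w = np.array([0.3, -0.2])
    for pi in (0.25, 0.5, 0.75):
        print(pi, pi_weighted_logistic_loss(w, 0.1, X, y, pi=pi))
```

In this baseline the imbalance correction sits outside the loss as a per-class multiplier, whereas the SIC model described above moves it into the loss through the internal parameters $(\alpha_+,\alpha_-)$.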

Paper Structure (12 sections, 6 theorems, 48 equations, 9 figures, 6 tables, 2 algorithms)


Introduction
Notation
Cost-sensitive Learning framework and Overview
SIGTRON: extended asymmetric sigmoid with Perceptron
Virtual SIGTRON-induced loss function, SIC(SIGTRON-imbalanced classification) model, and skewed hyperplane equation
Learning a hyperplane with SIC model
Quasi-Newton optimization(L-BFGS) for virtual convex loss
Numerical experiments with the $20 \times 20$ SIC models
Performance evaluation of $20 \times 20$ SIC models
Conclusion
Proof of Theorem \ref{th:derivative-bernoulli-like}
The structure of the dataset and classification results of all-class

Key Result

Theorem 1.1

For $\{ x_i \;|\; i \in {\cal N}_+ \}$ and $\{ x_j \;|\; j \in {\cal N}_- \}$, let us consider mean-zero normalization, $\sum_{i \in {\cal N}_+} x_i + \sum_{j \in {\cal N}_-} x_j=0$. Here $x_p^c \not= 0$ and $x_n^c \not= 0$. Then, if $r_{sc} =1$, we have $r_c=1$. On the other hand, if $r_{sc} \not=1$, …

Figures (9)

Figure 1: (a) SIGTRON $s_{\alpha,c}(x)$ with $\alpha = \frac{k-1}{k}<1$ ($k=1,2,3,4,6,10$) and $c_{\alpha}=-1$. Note that $s_{\alpha,c}(x) = 1$, if $x \ge -c_{\alpha}$. (b) SIGTRON $s_{\alpha,c}(x)$ with $\alpha = \frac{k+1}{k}>1$ ($k=1,2,3,4,6,10$) and $c_{\alpha}=1$. Note that $s_{\alpha,c}(x) = 0$, if $x \le -c_{\alpha}$. (c) $\nabla s_{\alpha,c}(x)$ with $\alpha = \frac{k-1}{k}<1$ ($k=2,3,4,6,10$) and $c_{\alpha}=-1$. Note that $\nabla s_{\alpha,c}(x) = 0$, if $x \ge -c_{\alpha}$. The inflection point $x_{ip}$ is getting close to $-c_{\alpha} = 1$ as $\alpha \rightarrow 0$. (d) $\nabla s_{\alpha,c}(x)$ with $\alpha = \frac{k+1}{k}>1$ ($k=2,3,4,6,10$) and $c_{\alpha}=1$. Note that $\nabla s_{\alpha,c}(x) = 0$, if $x \le -c_{\alpha}$. The inflection point $x_{ip}$ is getting close to $-c_{\alpha} = -1$ as $\alpha \rightarrow 2$.
Figure 2: Graphs of the virtual SIGTRON-induced loss function $L^S_{\alpha,c}$ for (a) $\alpha= \frac{k-1}{k}$ with $c_{\alpha}=-1$ and (b) $\alpha = \frac{k+1}{k}$ with $c_{\alpha}=1$. Here $k=1,2,4,6$, and $10$. In the case of $k=1,2,4,6$, $L_{\alpha,c}^S$ has a closed-form expression. See Example \ref{F-remark}.
Figure 3: Classification results with the spectf dataset in Table \ref{2classimb}. The test dataset has $r_{sc} = 0.26$ ($r_c=0.09$). However, the training dataset is well-balanced, i.e., $r_{sc}=1$. We have $20\times20$ hyperplanes $h^*_{(\alpha_+,\alpha_-)}(x)=0$ by solving $20\times20$ SIC models \ref{linmin} with the well-balanced training dataset. (a) The pattern of the test classification accuracy. (b) The pattern of the signed distance of the centroid of the positive test dataset to the hyperplane. (c) The pattern of the signed distance of the centroid of the negative test dataset to the hyperplane. (d) The pattern of $\eta$ in \ref{simple-eta}. The best test accuracy is achieved at $(\alpha_+,\alpha_-) = (\frac{11}{10},2)$. This point attains the smallest distance from the centroid of the positive test dataset to the hyperplane. It also belongs to the group $\left\{ (\frac{11}{10},2), (\frac{9}{10},2), (\frac{11}{10},0), (\frac{9}{10},0) \right\}$ having the smallest $\eta = \frac{1}{11}$. See Example \ref{example2} for more details.
Figure 4: A comparison of performance between quasi-Newton (L-BFGS) for virtual convex loss, which uses the strong Wolfe condition \ref{strongwolfeD} with $c_{II} \in [0.1,0.9]$, and the classic L-BFGS(*), which uses the strong Wolfe condition \ref{strongwolfeD} with $c_{II}=0.9$ and the Armijo condition \ref{armijoC} with $c_{I}=10^{-4}$. Note that L-BFGS(*) uses the cubic-interpolation-based line search \cite{nocedal06,schmidt05}. For our experiments, we use $12\times12$ SIC models with ${\left|\,c_{\alpha}\,\right|}=1$ and ${\left|\,\alpha_{\pm}-1\,\right|}=\frac{1}{k}$, where $k=1,2,3,4,5,6$. (a) Mean test accuracy of $12 \times 12$ SIC models. (b) Maximum test accuracy and (c) minimum test accuracy obtained by an SIC model with fixed $\alpha_{\pm}$ for each $c_{II}$. (d) Test accuracy of TOP$1$ for each $c_{II}$. (e) Total computation time of $12 \times 12$ SIC models. Here, we report the average values over five repeated experiments with all datasets in Tables \ref{2classimb} and \ref{mclassimb}. For $0.1 \le c_{II} \le 0.5$, in terms of mean test classification accuracy in (a), quasi-Newton (L-BFGS) for virtual convex loss outperforms L-BFGS(*) for each value of $m$ in the two-loop recursion.
Figure 5: Graphs of classification performance of the SIC model \ref{linmin} with $\alpha=\alpha_-=\alpha_+ \in [0,2]$. (a) Test accuracy ($\%$) vs. ${\left|\,c_{\alpha}\,\right|}$. The test accuracy is the average of all results obtained with $\alpha$ in \ref{setofalpha}. When ${\left|\,c_{\alpha}\,\right|} = 2$, the best performance is achieved. (b) Test accuracy ($\%$) vs. $\alpha$. When $\frac{1}{5} \le {\left|\,\alpha-1\,\right|} \le 1$, the best performance is achieved with ${\left|\,c_{\alpha}\,\right|} = 1$. However, when ${\left|\,\alpha-1\,\right|}< \frac{1}{5}$, the SIC model with ${\left|\,c_{\alpha}\,\right|} \ge 2$ shows better performance.
...and 4 more figures
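
The line-search comparison in Figure 4 turns on the strong Wolfe curvature condition with parameter $c_{II}$. The sketch below shows one generic way to satisfy that condition on a convex one-dimensional slice by bisecting on the sign of the directional derivative. It is an illustration under stated assumptions, not the paper's interval-based algorithm: the function names, the doubling step used to find a bracket, and the choice to check only the curvature condition are all made up for this example.

```python
import math


def satisfies_strong_wolfe_curvature(dphi_t, dphi_0, c2):
    """Strong Wolfe curvature condition: |phi'(t)| <= c2 * |phi'(0)|."""
    return abs(dphi_t) <= c2 * abs(dphi_0)


def bisection_line_search(dphi, c2=0.5, t_init=1.0, max_iter=60):
    """Find a step t > 0 meeting the curvature condition for a convex slice.

    dphi(t) is the directional derivative phi'(t) of phi(t) = f(x + t * p),
    where p is a descent direction, so phi'(0) < 0 is required.
    Because phi is convex, phi' is nondecreasing, and a sign change of phi'
    brackets the one-dimensional minimizer.
    """
    dphi_0 = dphi(0.0)
    if dphi_0 >= 0.0:
        raise ValueError("p must be a descent direction: phi'(0) < 0")

    lo, hi = 0.0, t_init
    # Phase 1 (assumed strategy): double hi until the slope turns positive
    for _ in range(max_iter):
        d_hi = dphi(hi)
        if satisfies_strong_wolfe_curvature(d_hi, dphi_0, c2):
            return hi
        if d_hi > 0.0:
            break  # minimizer is bracketed in [lo, hi]
        lo, hi = hi, 2.0 * hi
    else:
        return hi  # no bracket found within max_iter doublings

    # Phase 2: bisect the bracket on the sign of phi'
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        d_mid = dphi(mid)
        if satisfies_strong_wolfe_curvature(d_mid, dphi_0, c2):
            return mid
        if d_mid < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Convex slice phi(t) = exp(t) - 2 t, so phi'(t) = exp(t) - 2
    dphi = lambda t: math.exp(t) - 2.0
    for c2 in (0.1, 0.5, 0.9):
        print(c2, bisection_line_search(dphi, c2=c2))
```

A smaller $c_{II}$ forces the accepted step closer to the exact one-dimensional minimizer at the cost of more derivative evaluations, which is the trade-off the accuracy and timing panels of Figure 4 examine.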

Theorems & Definitions (21)

Theorem 1.1
proof
Definition 2.1: SIGTRON
Theorem 2.2
Corollary 2.3
proof
Remark 2.4
Example 2.5
Definition 3.1
Lemma 3.2
...and 11 more
