Table of Contents
Fetching ...

Hybrid Machine Learning techniques in the management of harmful algal blooms impact

Andres Molares-Ulloa, Daniel Rivero, Jesus Gil Ruiz, Enrique Fernandez-Blanco, Luis de-la-Fuente-Valentín

TL;DR

A comparison between hybrid machine learning models like Neural-Network-Adding Bootstrap (BAGNET) and Discriminative Nearest Neighbor Classification (SVM-KNN) when estimating the state of production areas is made and BAGNET outperforms the other models both in terms of results and robustness.

Abstract

Harmful algal blooms (HABs) are episodes of high concentrations of algae that are potentially toxic for human consumption. Mollusc farming can be affected by HABs because, as filter feeders, they can accumulate high concentrations of marine biotoxins in their tissues. To avoid the risk to human consumption, harvesting is prohibited when toxicity is detected. At present, the closure of production areas is based on expert knowledge and the existence of a predictive model would help when conditions are complex and sampling is not possible. Although the concentration of toxin in meat is the method most commonly used by experts in the control of shellfish production areas, it is rarely used as a target by automatic prediction models. This is largely due to the irregularity of the data due to the established sampling programs. As an alternative, the activity status of production areas has been proposed as a target variable based on whether mollusc meat has a toxicity level below or above the legal limit. This new option is the most similar to the actual functioning of the control of shellfish production areas. For this purpose, we have made a comparison between hybrid machine learning models like Neural-Network-Adding Bootstrap (BAGNET) and Discriminative Nearest Neighbor Classification (SVM-KNN) when estimating the state of production areas. The study has been carried out in several estuaries with different levels of complexity in the episodes of algal blooms to demonstrate the generalization capacity of the models in bloom detection. As a result, we could observe that, with an average recall value of 93.41% and without dropping below 90% in any of the estuaries, BAGNET outperforms the other models both in terms of results and robustness.

Hybrid Machine Learning techniques in the management of harmful algal blooms impact

TL;DR

A comparison between hybrid machine learning models like Neural-Network-Adding Bootstrap (BAGNET) and Discriminative Nearest Neighbor Classification (SVM-KNN) when estimating the state of production areas is made and BAGNET outperforms the other models both in terms of results and robustness.

Abstract

Harmful algal blooms (HABs) are episodes of high concentrations of algae that are potentially toxic for human consumption. Mollusc farming can be affected by HABs because, as filter feeders, they can accumulate high concentrations of marine biotoxins in their tissues. To avoid the risk to human consumption, harvesting is prohibited when toxicity is detected. At present, the closure of production areas is based on expert knowledge and the existence of a predictive model would help when conditions are complex and sampling is not possible. Although the concentration of toxin in meat is the method most commonly used by experts in the control of shellfish production areas, it is rarely used as a target by automatic prediction models. This is largely due to the irregularity of the data due to the established sampling programs. As an alternative, the activity status of production areas has been proposed as a target variable based on whether mollusc meat has a toxicity level below or above the legal limit. This new option is the most similar to the actual functioning of the control of shellfish production areas. For this purpose, we have made a comparison between hybrid machine learning models like Neural-Network-Adding Bootstrap (BAGNET) and Discriminative Nearest Neighbor Classification (SVM-KNN) when estimating the state of production areas. The study has been carried out in several estuaries with different levels of complexity in the episodes of algal blooms to demonstrate the generalization capacity of the models in bloom detection. As a result, we could observe that, with an average recall value of 93.41% and without dropping below 90% in any of the estuaries, BAGNET outperforms the other models both in terms of results and robustness.
Paper Structure (11 sections, 6 equations, 4 figures, 7 tables)

This paper contains 11 sections, 6 equations, 4 figures, 7 tables.

Figures (4)

  • Figure 1: Map of Galicia and the location of the production areas studied marked in yellow.
  • Figure 2: A bootstrap aggregated neural network. This method is based on resampling with replacement of the available dataset and training an individual network on each resampled instance of the original dataset.
  • Figure 3: A discriminative nearest nighbor classification. To classify an instance, the kNN algorithm is applied to select the $k$ instances from the training set of closest resemblance. Once these are available, a linear SVM is trained with them, and the instance to be classified is applied to this SVM. Once this is done, the SVM is discarded, since it is of local use and is only valid for that instance.
  • Figure 4: Comparison of accuracy, recall, f1-score and kappa between benchmark models and the best results of SVM-KNN and BAGNET for the prediction of HAB episodes by DSP.