
TLDR

How data are divided into training, testing, and validation sets can significantly affect ANN performance, yet no systematic approach exists for optimal partitioning. This study introduces two data-division strategies, a genetic algorithm (GA) and a self-organizing map (SOM), to create representative subsets for ANN modeling. The methods are benchmarked against the conventional arbitrary split by training ANNs to forecast salinity in the River Murray at Murray Bridge (South Australia) 14 days in advance. On a validation set spanning July 1992 to March 1998, GA and SOM division reduced RMS error by 24.2% and 9.9%, respectively, relative to the conventional split. The study also found that a SOM can help diagnose poor ANN performance when that performance stems from the data themselves rather than from the choice of model parameters or architecture.
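The summary does not say how the GA builds its subsets. The minimal Python sketch below assumes that the GA searches for a split whose test and validation subsets match the training subset's mean and standard deviation as closely as possible. The 60/20/20 proportions, the permutation encoding, order crossover, swap mutation, tournament selection, and every function name (`ga_divide`, `fitness`, and so on) are illustrative assumptions, not details taken from the study.

```python
import math
import random

# Hypothetical (train, test, validation) proportions; the source gives no split sizes.
PROPORTIONS = (0.6, 0.2, 0.2)


def split(perm):
    """Cut a permutation of record indices into train/test/validation index lists."""
    n_train = round(PROPORTIONS[0] * len(perm))
    n_test = round(PROPORTIONS[1] * len(perm))
    return perm[:n_train], perm[n_train:n_train + n_test], perm[n_train + n_test:]


def mean_std(values):
    """Mean and population standard deviation of a sequence of numbers."""
    m = sum(values) / len(values)
    return m, math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def fitness(perm, data):
    """Lower is better: summed |mean| and |std| gaps of test/validation vs training."""
    train, test, valid = split(perm)
    reference = [mean_std(col) for col in zip(*(data[i] for i in train))]
    cost = 0.0
    for subset in (test, valid):
        stats = [mean_std(col) for col in zip(*(data[i] for i in subset))]
        for (m, s), (m_ref, s_ref) in zip(stats, reference):
            cost += abs(m - m_ref) + abs(s - s_ref)
    return cost


def order_crossover(a, b, rng):
    """OX: copy a random slice of parent a, fill the other slots in parent b's order."""
    i, j = sorted(rng.sample(range(len(a)), 2))
    kept = set(a[i:j])
    filler = iter([g for g in b if g not in kept])
    return [a[k] if i <= k < j else next(filler) for k in range(len(a))]


def swap_mutation(perm, rng, rate=0.2):
    """With probability `rate`, swap two records (possibly moving them between subsets)."""
    perm = list(perm)
    if rng.random() < rate:
        i, j = rng.sample(range(len(perm)), 2)
        perm[i], perm[j] = perm[j], perm[i]
    return perm


def ga_divide(data, pop_size=40, generations=200, seed=0):
    """Evolve a permutation whose train/test/validation split has matching statistics."""
    rng = random.Random(seed)
    n = len(data)
    pop = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(p, data) for p in pop]
        ranked = sorted(range(pop_size), key=scores.__getitem__)
        next_pop = [pop[ranked[0]], pop[ranked[1]]]  # elitism: keep the two best

        def tournament():
            # Draw 3 random individuals and return the one with the lowest cost.
            return pop[min(rng.sample(range(pop_size), 3), key=scores.__getitem__)]

        while len(next_pop) < pop_size:
            child = order_crossover(tournament(), tournament(), rng)
            next_pop.append(swap_mutation(child, rng))
        pop = next_pop
    return split(min(pop, key=lambda p: fitness(p, data)))


# Usage on synthetic two-variable records (not the Murray Bridge data).
rng = random.Random(1)
records = [(rng.gauss(0, 1), rng.gauss(5, 2)) for _ in range(60)]
train, test, valid = ga_divide(records)
print(len(train), len(test), len(valid))  # 36 12 12
```

Because elitism carries the best split forward unchanged, the returned division is never worse, by this fitness measure, than the best split in the initial random population.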

Abstract

The way that available data are divided into training, testing, and validation subsets can have a significant influence on the performance of an artificial neural network (ANN). Despite numerous studies, no systematic approach has been developed for the optimal division of data for ANN models. This paper presents two methodologies for dividing data into representative subsets, namely, a genetic algorithm (GA) and a self‐organizing map (SOM). These two methods are compared with the conventional approach commonly used in the literature, which involves an arbitrary division of the data. A case study is presented in which ANN models developed using each data division technique are used to forecast salinity in the River Murray at Murray Bridge (South Australia) 14 days in advance. When tested on a validation data set from July 1992 to March 1998, the models developed using the GA and SOM data division techniques reduced RMS error by 24.2% and 9.9%, respectively, relative to the conventional data division method. It was also found that a SOM could be used to diagnose why an ANN model has performed poorly, provided that the poor performance is primarily related to the data themselves and not to the choice of the ANN's parameters or architecture.
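A SOM-based division can be sketched as follows: cluster the records on a small SOM, then sample every cluster into the three subsets so that each subset covers the same regions of the input space. This is a minimal Python sketch under stated assumptions; the 3×3 grid, the training schedule, the 60/20/20 within-cluster sampling, and the names `train_som`, `som_divide`, and `uncovered` are hypothetical choices for illustration, not the study's settings. The `uncovered` helper is one assumed way to apply the diagnostic idea from the abstract: it flags validation records that fall on SOM nodes holding no training records, i.e. inputs outside the region the ANN was trained on.

```python
import math
import random


def bmu(weights, x):
    """Best-matching unit: the grid node whose weight vector is closest to x."""
    return min(weights, key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)))


def train_som(data, rows=3, cols=3, epochs=50, seed=0):
    """Train a small rectangular SOM with a Gaussian neighbourhood."""
    rng = random.Random(seed)
    # Initialise every node's weights from a randomly chosen record.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    radius0 = max(rows, cols) / 2
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = 0.5 * (1 - frac)                # learning rate decays linearly
        radius = radius0 * (1 - frac) + 0.5  # neighbourhood shrinks towards 0.5
        for x in rng.sample(data, len(data)):
            winner = bmu(weights, x)
            for node, w in weights.items():
                d2 = (node[0] - winner[0]) ** 2 + (node[1] - winner[1]) ** 2
                h = lr * math.exp(-d2 / (2 * radius ** 2))
                for k in range(len(w)):
                    w[k] += h * (x[k] - w[k])
    return weights


def som_divide(data, weights, frac_test=0.2, frac_valid=0.2, seed=0):
    """Stratified split: sample each SOM cluster into train/test/validation."""
    rng = random.Random(seed)
    clusters = {}
    for i, x in enumerate(data):
        clusters.setdefault(bmu(weights, x), []).append(i)
    train, test, valid = [], [], []
    for members in clusters.values():
        rng.shuffle(members)
        n_test = round(frac_test * len(members))
        n_valid = round(frac_valid * len(members))
        test += members[:n_test]
        valid += members[n_test:n_test + n_valid]
        # With the default fractions every cluster keeps >= 1 training record.
        train += members[n_test + n_valid:]
    return train, test, valid


def uncovered(weights, data, train, check):
    """Records in `check` whose SOM node holds no training record (extrapolation)."""
    trained_nodes = {bmu(weights, data[i]) for i in train}
    return [i for i in check if bmu(weights, data[i]) not in trained_nodes]


# Usage on synthetic two-variable records (not the Murray Bridge data).
rng = random.Random(2)
records = [(rng.gauss(0, 1), rng.gauss(5, 2)) for _ in range(60)]
weights = train_som(records)
train, test, valid = som_divide(records, weights)
print(uncovered(weights, records, train, valid))  # []
```

Because every occupied cluster keeps at least one training record with the default fractions, `uncovered` returns an empty list for a split made this way; an arbitrary split offers no such guarantee, which is where the check could become informative.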
