
Publication | Open Access

Variational autoencoder based synthetic data generation for imbalanced learning

Citations: 150

References: 21

Year: 2017

TLDR

Imbalanced data degrade learning algorithms, and while synthetic sampling methods exist, they struggle with high‑dimensional data such as images. The authors propose a variational autoencoder–based synthetic data generation approach to address imbalanced learning. The method uses a VAE to generate novel samples resembling the original data, and its performance is benchmarked against conventional synthetic sampling techniques across multiple datasets and five evaluation metrics. Experiments show that the VAE‑based approach outperforms traditional synthetic sampling methods.

Abstract

Discovering patterns from imbalanced data plays an important role in numerous applications, such as health services, cyber security, and financial engineering. However, imbalanced data greatly compromise the performance of most learning algorithms. Recently, various synthetic sampling methods have been proposed to balance such datasets. Although these methods have achieved great success on many datasets, they are less effective for high-dimensional data, such as images. In this paper, we propose a variational autoencoder (VAE) based synthetic data generation method for imbalanced learning. A VAE can produce new samples that are similar to those in the original dataset, but not exactly the same. We evaluate our proposed method and compare it with traditional synthetic sampling methods on various datasets under five evaluation metrics. The experimental results demonstrate the effectiveness of the proposed method.
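To make the idea of synthetic sampling concrete, the following is a minimal sketch of a conventional interpolation-based oversampler in the style of SMOTE. The abstract does not name the baseline methods it compares against, so the choice of SMOTE-style interpolation, the function names `smote_like_oversample` and `balance`, and the parameters `k` and `seed` are all assumptions made for illustration. It is not the authors' method: in the VAE-based approach, the interpolation step would presumably be replaced by decoding samples drawn from a VAE trained on the data.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style;
    an illustrative baseline, not the paper's VAE method)."""
    if n_new <= 0:
        return []
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbour.
        t = rng.random()
        synthetic.append([b + t * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(majority, minority, k=3, seed=0):
    """Return the minority class padded with synthetic points until it
    matches the majority class in size."""
    deficit = max(0, len(majority) - len(minority))
    return list(minority) + smote_like_oversample(minority, deficit, k=k, seed=seed)


# Toy data (hypothetical): 10 majority points, 4 minority points.
majority = [[float(i), 0.0] for i in range(10)]
minority = [[0.0, 5.0], [1.0, 5.0], [0.0, 6.0], [1.0, 6.0]]
balanced_minority = balance(majority, minority)
print(len(balanced_minority))  # 10
```

Because every synthetic point lies on a line segment between two existing samples, this kind of oversampler can only fill in the space between observed points. In pixel space, such a blend of two images is often not a realistic image, which is one plausible reading of why the abstract describes these methods as less effective for high-dimensional data.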
