On the application of GP to streaming data classification tasks with label budgets

Abstract

A framework is introduced for applying genetic programming (GP) to streaming data classification tasks under label budgets. This is a fundamental requirement if GP is to adapt to the challenges of streaming data environments. The framework comprises three elements: a sampling policy, a data subset, and a data archiving policy. The sampling policy establishes the basis on which data is sampled from the stream, and therefore when label information is requested. The data subset defines what GP individuals evolve against. It is composed of a mixture of data forwarded under the sampling policy and historical data identified through the data archiving policy. Together, the sampling policy and the data subset decouple the rate at which the stream passes from the rate at which evolution proceeds. Benchmarking is performed on two artificial data sets with specific forms of sudden shift and gradual drift, as well as a well-known real-world data set.
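As an illustration only, the interaction of the three elements can be sketched as follows. The abstract does not state which sampling or archiving policies are used, so this sketch assumes uniform random sampling at a fixed label budget and an oldest-first replacement archive. The names `run_stream`, `oracle` and `on_update` are hypothetical, and the GP step itself is left as a callback.

```python
import random
from collections import deque


def run_stream(stream, oracle, label_budget, subset_size, on_update=None, seed=0):
    """Pass once over `stream`, requesting labels for a fraction of records.

    Assumptions (not taken from the paper):
      - sampling policy: uniform random, with probability `label_budget`
      - archiving policy: keep the most recent `subset_size` labelled records
      - `oracle(i)` returns the true label of the i-th record
      - `on_update(subset)` stands in for one or more GP generations
    """
    rng = random.Random(seed)
    subset = deque(maxlen=subset_size)  # oldest record is dropped when full
    labels_requested = 0
    updates = 0

    for i, x in enumerate(stream):
        # Sampling policy: decide whether to spend label budget on this record.
        if rng.random() < label_budget:
            subset.append((x, oracle(i)))
            labels_requested += 1
            # Evolution is triggered by changes to the data subset,
            # not by every record that passes in the stream.
            if on_update is not None:
                on_update(list(subset))
            updates += 1

    return list(subset), labels_requested, updates


if __name__ == "__main__":
    data = [(k % 7, k % 3) for k in range(1000)]
    subset, used, updates = run_stream(
        data, oracle=lambda i: i % 2, label_budget=0.05, subset_size=20
    )
    print(len(subset), used, updates)
```

In this sketch the number of evolutionary updates tracks the number of label requests rather than the stream length, which is one simple way to realise the decoupling described above.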
