Concepedia

TLDR

For clean data with deterministic relationships, concise training sets that minimize the integrated squared bias (ISB) are desired. The study aims to reduce the data requirements of learning by deriving a method for selecting exemplars to train a multilayer feedforward network on clean, deterministic data. The authors sequentially choose the training example that maximizes the decrement in network squared error, using a selection criterion (the DISB) derived from the ISB that works well in conjunction with least-squares learning. Graphical illustrations and experimental results show that selecting exemplars with this criterion saves computation while maintaining performance during network training.

Abstract

The authors derive a method for selecting exemplars for training a multilayer feedforward network architecture to estimate an unknown (deterministic) mapping from clean data, i.e., data measured either without error or with negligible error. The objective is to minimize the data requirement of learning. The authors choose a criterion for selecting training examples that works well in conjunction with the criterion used for learning, here, least squares. They proceed sequentially, selecting an example that, when added to the previous set of training examples and learned, maximizes the decrement of network squared error over the input space. When dealing with clean data and deterministic relationships, concise training sets that minimize the integrated squared bias (ISB) are desired. The ISB is used to derive a selection criterion, the DISB, that evaluates individual training examples and is maximized to select new exemplars. They conclude with graphical illustrations of the method and demonstrate its use during network training. Experimental results indicate that training on exemplars selected in this fashion can save computation in general-purpose use as well.
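The sequential selection loop described above can be sketched in a few lines. This is a minimal illustration, not the paper's actual derivation: the target function, network size, and training schedule are all assumptions, and the current network's squared error at each candidate point is used as a simple stand-in for the DISB criterion (the paper derives its criterion from the integrated squared bias rather than from pointwise error).

```python
import numpy as np

rng = np.random.default_rng(0)

def target(x):
    # Clean deterministic mapping to be learned (assumed for illustration).
    return np.sin(np.pi * x)

def fit_network(X, y, hidden=8, epochs=2000, lr=0.05):
    # Train a one-hidden-layer feedforward net by least squares
    # using plain gradient descent.
    W1 = rng.normal(0, 0.5, (1, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.5, (hidden, 1)); b2 = np.zeros(1)
    X = X.reshape(-1, 1); y = y.reshape(-1, 1)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)           # hidden activations
        err = (H @ W2 + b2) - y            # residuals on training set
        gW2 = H.T @ err / len(X); gb2 = err.mean(0)
        dH = (err @ W2.T) * (1 - H ** 2)   # backprop through tanh
        gW1 = X.T @ dH / len(X); gb1 = dH.mean(0)
        W2 -= lr * gW2; b2 -= lr * gb2
        W1 -= lr * gW1; b1 -= lr * gb1
    return lambda xs: (np.tanh(xs.reshape(-1, 1) @ W1 + b1) @ W2 + b2).ravel()

candidates = np.linspace(-1, 1, 41)   # pool of available clean examples
train_x = np.array([0.0])             # start from a single exemplar
for _ in range(5):
    net = fit_network(train_x, target(train_x))
    # Pick the candidate with the largest squared error under the current
    # fit -- a crude proxy for the exemplar that most decreases the ISB.
    scores = (net(candidates) - target(candidates)) ** 2
    train_x = np.append(train_x, candidates[np.argmax(scores)])

net = fit_network(train_x, target(train_x))
grid = np.linspace(-1, 1, 101)
final_err = np.mean((net(grid) - target(grid)) ** 2)
print(len(train_x), final_err)
```

The key design point the sketch preserves is that selection is greedy and sequential: the network is retrained after each addition, so each new exemplar is scored against the current fit rather than a fixed one.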

