Publication | Open Access
Group $L_{1/2}$ Regularization for Pruning Hidden Layer Nodes of Feedforward Neural Networks
Citations: 27
References: 36
Year: 2019
Redundant Weights, Sparse Representation, Engineering, Machine Learning, Sparse Neural Network, Feedforward Neural Networks, Computer Science, Absolute Value Function, Neural Networks, Regularization (Mathematics)
A group $L_{1/2}$ regularization term is defined and introduced into the conventional error function for pruning the hidden layer nodes of feedforward neural networks. This group $L_{1/2}$ regularization method (GL$_{1/2}$) can prune not only the redundant hidden nodes but also the redundant weights of the surviving hidden nodes of the neural networks. As a comparison, the popular group lasso regularization (GL$_2$) can prune the redundant hidden nodes, but cannot prune any redundant weights of the surviving hidden nodes, of the neural networks. A disadvantage of the GL$_{1/2}$ is that it involves a non-smooth absolute value function, which causes oscillation in the numerical computation and difficulty in the convergence analysis. As a remedy, the absolute value function is approximated by a smooth function, resulting in a smooth group $L_{1/2}$ regularization method (SGL$_{1/2}$). Numerical simulations on a few benchmark data sets show that, compared with GL$_2$, SGL$_{1/2}$ can achieve better accuracy and remove more redundant nodes and weights of the surviving hidden nodes. A convergence theorem is also proved for SGL$_{1/2}$.
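To make the three penalties concrete, the following is a minimal NumPy sketch, not the paper's implementation. The abstract does not give the formulas, so several things here are assumptions: each row of `W` is treated as the weight group of one hidden node, GL$_{1/2}$ is taken as the sum over groups of the square root of the group's $L_1$ norm, and `smooth_abs` uses a piecewise polynomial that appears in the smoothing $L_{1/2}$ literature. The function names and the smoothing width `a` are hypothetical.

```python
import numpy as np

def smooth_abs(x, a=0.1):
    """Smooth surrogate for |x| (assumed form): equals |x| when |x| >= a,
    and a quartic polynomial on (-a, a) that matches value, slope and
    curvature at x = +/- a."""
    x = np.asarray(x, dtype=float)
    poly = -x**4 / (8 * a**3) + 3 * x**2 / (4 * a) + 3 * a / 8
    return np.where(np.abs(x) >= a, np.abs(x), poly)

def group_lasso(W):
    """GL_2: sum over hidden nodes of the Euclidean norm of each row."""
    return np.sqrt((W**2).sum(axis=1)).sum()

def group_l_half(W):
    """GL_1/2 (assumed form): sum over hidden nodes of sqrt(L1 norm of row)."""
    return np.sqrt(np.abs(W).sum(axis=1)).sum()

def smooth_group_l_half(W, a=0.1):
    """SGL_1/2 (assumed form): GL_1/2 with |.| replaced by smooth_abs."""
    return np.sqrt(smooth_abs(W, a).sum(axis=1)).sum()

# Hypothetical weights: 3 hidden nodes (rows), 3 weights per node.
# The second node is fully pruned; the others contain zero weights.
W = np.array([[0.5, -0.2, 0.0],
              [0.0,  0.0, 0.0],
              [1.0,  0.0, -1.0]])

print("GL_2    :", group_lasso(W))
print("GL_1/2  :", group_l_half(W))
print("SGL_1/2 :", smooth_group_l_half(W))
```

Under these assumptions the smooth surrogate is strictly positive at zero (`3a/8`), so the SGL$_{1/2}$ penalty stays differentiable where the square root of the plain absolute value would not. In a training loop, one of these terms would be scaled by a regularization coefficient and added to the error function.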