Bayesian Deep Convolutional Networks with Many Channels are Gaussian\n Processes

Abstract

There is a previously identified equivalence between wide fully connected\nneural networks (FCNs) and Gaussian processes (GPs). This equivalence enables,\nfor instance, test set predictions that would have resulted from a fully\nBayesian, infinitely wide trained FCN to be computed without ever instantiating\nthe FCN, but by instead evaluating the corresponding GP. In this work, we\nderive an analogous equivalence for multi-layer convolutional neural networks\n(CNNs) both with and without pooling layers, and achieve state of the art\nresults on CIFAR10 for GPs without trainable kernels. We also introduce a Monte\nCarlo method to estimate the GP corresponding to a given neural network\narchitecture, even in cases where the analytic form has too many terms to be\ncomputationally feasible.\n Surprisingly, in the absence of pooling layers, the GPs corresponding to CNNs\nwith and without weight sharing are identical. As a consequence, translation\nequivariance, beneficial in finite channel CNNs trained with stochastic\ngradient descent (SGD), is guaranteed to play no role in the Bayesian treatment\nof the infinite channel limit - a qualitative difference between the two\nregimes that is not present in the FCN case. We confirm experimentally, that\nwhile in some scenarios the performance of SGD-trained finite CNNs approaches\nthat of the corresponding GPs as the channel count increases, with careful\ntuning SGD-trained CNNs can significantly outperform their corresponding GPs,\nsuggesting advantages from SGD training compared to fully Bayesian parameter\nestimation.\n

References

Page 1

	Year	Citations

Page 1