The formation of a convolutional layer: feature detectors in computer vision

Convolutional layers of artificial neural networks are analogous to the feature detectors of animal vision in that they both search for pre-defined patterns in a visual field. Convolutional layers form during network training, while the question how animal feature detectors form is a matter of debate. I observed and manipulated developing convolutional layers during training in a simple convolutional network: the MNIST hand-written digit recognition example of Google’s TensorFlow.

Convolutional layers, usually many different in parallel, each use a kernel, which they move over the image like a stencil. For each position the degree of overlap is noted, and the end result is a map of pattern occurrences. Classifier network layers combine these maps to decide what the image represents.

The reason why convolutional layers can work as feature detectors is that discrete convolution, covariance and correlation are mathematically very similar. In one dimension:

Multiply two images pixel-by-pixel, then add up all products to convolve them: $$(f * g)[n] = \sum_{m=-\infty}^\infty f_m g_{(n – m)}$$

Subtract their means E(f) first to instead get their covariance: $$\mathrm{cov} (f,g)=\frac{1}{n}\sum_{m=1}^n (f_m-E(f))(g_m-E(g))$$

Then divide by the variances σ(f) to get their Pearson correlation: $$\mathrm{corr}(f,g)={\mathrm{cov}(f,g) \over \sigma_f \sigma_g} $$

Continue reading