Efficient Convolution Operation

Regular Convolution

Sliding window over the image and do multi-adds on patch level one by one.

Image to Column Convolution

To better utilize CPU cache and the locality of reference, we convert convolution operations into matrix multiplication.

The Locality of Reference (Principal of Locality)

  • Processor tendency to access the same set of memory locations repetitively

  • Temporal locality: reuse of specific data/resources within a short time duration

  • Spatial locality: use of data elements within relatively close storage locations

  • The locality is a predictable behavior, upon which built systems are great candidates for performance optimization. Techniques like caching, memory prefetching, and processor core branch prediction can contribute to the optimization.

Winograd Convolution

Operations with g0 ~ g2 can be precalculated. Then, the convolution operation is saved from 6 MUL operations to only 4, with 1.5 times faster. For 2-d matrix convolutions, let's say 4x4 matrix with 3x3 convolution kernel, it can be saved from 2 * 2 * 9 = 35 MULs to 4 * 4 = 16 MULs with 2.25 times faster.

Reference

Last updated