Regularizations

Common regularization techniques:

Overall, we want a simpler model in order to reduce overfitting.

  1. L2 regularization: $\sum_k \sum_l W_{k,l}^2$

    • It works in the following way: with L2 regularization, W = [0.25, 0.25, 0.25, 0.25] is preferred over W = [1, 0, 0, 0], so the decision relies on all 4 input features, and the final prediction looks at more features rather than only one (see the NumPy sketch after this list).

    • L2 regularization also corresponds to MAP inference with a Gaussian prior on W.

  2. L1 regularization: $\sum_k \sum_l |W_{k,l}|$. L1 forces the model to be more sparse.

    • Viewed the other way around, L1 regularization has roughly the opposite interpretation to L2: we would prefer W = [1, 0, 0, 0] over W = [1, 1, 1, 1].

  3. Elastic net (L1 + L2): $\sum_k \sum_l (\beta W_{k,l}^2 + |W_{k,l}|)$

  4. Dropout: set random activations to zero (for fully connected layers), or set random channels to zero (for convolutional layers); a sketch follows below.
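
To make the three penalty terms concrete, here is a minimal NumPy sketch that evaluates each one on the weight vectors used above. The helper names (`l2_penalty`, `l1_penalty`, `elastic_net_penalty`) and the `beta` default are illustrative, not from any particular library; in training, the chosen penalty would be scaled by a regularization strength and added to the data loss.

```python
import numpy as np

def l2_penalty(W):
    """L2 regularization: sum_k sum_l W_{k,l}^2."""
    return np.sum(W ** 2)

def l1_penalty(W):
    """L1 regularization: sum_k sum_l |W_{k,l}|."""
    return np.sum(np.abs(W))

def elastic_net_penalty(W, beta=0.5):
    """Elastic net: sum_k sum_l (beta * W_{k,l}^2 + |W_{k,l}|)."""
    return np.sum(beta * W ** 2 + np.abs(W))

# Weight vectors from the notes; all give the same score on the input x = [1, 1, 1, 1].
w_spread = np.array([0.25, 0.25, 0.25, 0.25])
w_sparse = np.array([1.0, 0.0, 0.0, 0.0])
w_dense  = np.array([1.0, 1.0, 1.0, 1.0])

print(l2_penalty(w_spread), l2_penalty(w_sparse))  # 0.25 vs 1.0 -> L2 prefers the spread-out weights
print(l1_penalty(w_sparse), l1_penalty(w_dense))   # 1.0 vs 4.0  -> L1 prefers the sparse weights
```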
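
Below is a minimal sketch of item 4, assuming the common "inverted dropout" formulation (surviving units are rescaled by 1/(1 - p) at training time so no rescaling is needed at test time). The function names and the `p_drop` parameter are illustrative assumptions, not a reference implementation.

```python
import numpy as np

def dropout_fc(activations, p_drop=0.5, train=True):
    """Dropout for a fully connected layer's activations, shape (batch, features):
    zero out individual activations at random."""
    if not train:
        return activations  # at test time, keep all units unchanged
    mask = (np.random.rand(*activations.shape) >= p_drop) / (1.0 - p_drop)
    return activations * mask

def dropout_conv(feature_maps, p_drop=0.5, train=True):
    """Dropout for a convolutional layer's output, shape (batch, channels, H, W):
    zero out whole channels at random rather than individual activations."""
    if not train:
        return feature_maps
    n, c = feature_maps.shape[:2]
    mask = (np.random.rand(n, c, 1, 1) >= p_drop) / (1.0 - p_drop)
    return feature_maps * mask
```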
