Batch Norm introduction
Similar to Input Normalization
The input normalization we saw earlier adjusts the range (scale) of each input feature so that the weights are updated on a similar scale. As a result, it made convergence relatively fast.
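As a quick reminder of what that looks like in practice, here is a minimal NumPy sketch of input normalization. The function name and shapes are my own illustration, not from the original post.

```python
import numpy as np

def normalize_inputs(X, eps=1e-8):
    """X has shape (m, n_features). Shift each feature to zero mean and
    scale it to unit variance so all features update the weights on a
    similar scale."""
    mu = X.mean(axis=0)
    var = X.var(axis=0)
    return (X - mu) / np.sqrt(var + eps)  # eps guards against zero variance
```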
Why use batch normalization?
Batch normalization applies the same idea to the inputs of the hidden layers (the pre-activations \(z\)). This speeds up convergence and makes hyperparameter search faster.
How to do Batch Norm
Let’s take an example.
First, compute the mean and variance of the pre-activations \(z^{(2)}_{i}\) of the second hidden layer over the mini-batch.
\[\mu = \frac{1}{m} \sum_{i} z^{(2)}_{i}\] \[\sigma^{2} = \frac{1}{m} \sum_{i} \left( z^{(2)}_{i} - \mu \right)^{2}\] \[z_{norm}^{(i)} = \frac{z^{(2)}_{i} - \mu}{\sigma}\]However, this breaks down if \(\sigma\) becomes zero (division by zero). So the calculation is done as follows instead.
\[z_{norm}^{(i)} = \frac{z^{(2)}_{i} - \mu}{\sqrt{\sigma^{2} + \epsilon}}\]
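The steps above map directly to code. Below is a minimal NumPy sketch, assuming the pre-activations of one layer are stacked into a matrix `Z` of shape (m, n_units); the function name and shapes are assumptions for illustration.

```python
import numpy as np

def batch_norm_normalize(Z, eps=1e-8):
    """Z has shape (m, n_units): one row of pre-activations per example."""
    mu = Z.mean(axis=0)                      # per-unit mean over the mini-batch
    var = Z.var(axis=0)                      # per-unit variance over the mini-batch
    Z_norm = (Z - mu) / np.sqrt(var + eps)   # eps keeps the division safe when var is ~0
    return Z_norm
```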
Practical Application of Batch Norm
If you use \(z_{norm}\) as is, its distribution will follow \(N(0,1)\). However, we do not want the \(z\) of every layer to follow \(N(0,1)\). So we develop the expression a bit further:
\[\widehat{z} = \alpha z_{norm} + \beta\]Note that if \(\alpha = \sqrt{\sigma^{2} + \epsilon}\) and \(\beta = \mu\), then \(\widehat{z}\) recovers the original \(z\). Instead of fixing them, we treat \(\alpha\) and \(\beta\) as learnable parameters, so the mean and variance of \(\widehat{z}\) can follow whatever distribution the network finds most useful.
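Putting the scale and shift together with the normalization step gives the full forward pass of batch norm. This is a sketch only; the parameter names `alpha` and `beta` follow the post's notation (often written gamma and beta elsewhere), and the initialization convention is an assumption.

```python
import numpy as np

def batch_norm_forward(Z, alpha, beta, eps=1e-8):
    """Normalize Z, then rescale with learnable alpha (scale) and beta (shift).

    Z:     (m, n_units) pre-activations of one layer for a mini-batch
    alpha: (n_units,)   learnable scale, typically initialized to ones
    beta:  (n_units,)   learnable shift, typically initialized to zeros
    """
    mu = Z.mean(axis=0)
    var = Z.var(axis=0)
    Z_norm = (Z - mu) / np.sqrt(var + eps)
    return alpha * Z_norm + beta  # z_hat = alpha * z_norm + beta
```

With `alpha = np.sqrt(var + eps)` and `beta = mu`, the output equals the original `Z`; during training, gradient descent learns `alpha` and `beta` so each unit gets the mean and variance that help the network most.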