Batch Norm introduction
Similar to Input Normalization
The input normalization we saw earlier adjusts the range (scale) of each input feature so that the weights are updated on a similar scale. As a result, it made convergence relatively fast.
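As a quick reminder of what that looks like in practice, here is a minimal NumPy sketch of input normalization. The function name and shapes are my own illustration, not from the original post.

```python
import numpy as np

def normalize_inputs(X, eps=1e-8):
    """X has shape (m, n_features). Shift each feature to zero mean and
    scale it to unit variance so all features update the weights on a
    similar scale."""
    mu = X.mean(axis=0)
    var = X.var(axis=0)
    return (X - mu) / np.sqrt(var + eps)  # eps guards against zero variance
```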
Why use batch normalization?
Batch normalization applies the same idea to the inputs of the hidden layers (the pre-activations \(z\)). This speeds up convergence and makes hyperparameter search faster.
How to do Batch Norm
Let’s take an example.
First, compute the mean and variance of the pre-activations \(z^{(2)}_{i}\) of the second hidden layer over the mini-batch.
\[\mu = \frac{1}{m} \sum_{i} z^{(2)}_{i}\] \[\sigma^{2} = \frac{1}{m} \sum_{i} \left( z^{(2)}_{i} - \mu \right)^{2}\] \[z_{norm}^{(i)} = \frac{z^{(2)}_{i} - \mu}{\sigma}\]However, this breaks down if \(\sigma\) becomes zero (division by zero). So the calculation is done as follows instead.
\[z_{norm}^{(i)} = \frac{z^{(2)}_{i} - \mu}{\sqrt{\sigma^{2} + \epsilon}}\]
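The steps above map directly to code. Below is a minimal NumPy sketch, assuming the pre-activations of one layer are stacked into a matrix `Z` of shape (m, n_units); the function name and shapes are assumptions for illustration.

```python
import numpy as np

def batch_norm_normalize(Z, eps=1e-8):
    """Z has shape (m, n_units): one row of pre-activations per example."""
    mu = Z.mean(axis=0)                      # per-unit mean over the mini-batch
    var = Z.var(axis=0)                      # per-unit variance over the mini-batch
    Z_norm = (Z - mu) / np.sqrt(var + eps)   # eps keeps the division safe when var is ~0
    return Z_norm
```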
Practical Application of Batch Norm
If you use \(z_{norm}\) as is, its distribution will follow \(N(0,1)\). However, we do not want the \(z\) of every layer to follow \(N(0,1)\). So we develop the expression a bit further:
\[\widehat{z} = \alpha z_{norm} + \beta\]Note that if \(\alpha = \sqrt{\sigma^{2} + \epsilon}\) and \(\beta = \mu\), then \(\widehat{z}\) recovers the original \(z\). Instead of fixing them, we treat \(\alpha\) and \(\beta\) as learnable parameters, so the mean and variance of \(\widehat{z}\) can follow whatever distribution the network finds most useful.
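Putting the scale and shift together with the normalization step gives the full forward pass of batch norm. This is a sketch only; the parameter names `alpha` and `beta` follow the post's notation (often written gamma and beta elsewhere), and the initialization convention is an assumption.

```python
import numpy as np

def batch_norm_forward(Z, alpha, beta, eps=1e-8):
    """Normalize Z, then rescale with learnable alpha (scale) and beta (shift).

    Z:     (m, n_units) pre-activations of one layer for a mini-batch
    alpha: (n_units,)   learnable scale, typically initialized to ones
    beta:  (n_units,)   learnable shift, typically initialized to zeros
    """
    mu = Z.mean(axis=0)
    var = Z.var(axis=0)
    Z_norm = (Z - mu) / np.sqrt(var + eps)
    return alpha * Z_norm + beta  # z_hat = alpha * z_norm + beta
```

With `alpha = np.sqrt(var + eps)` and `beta = mu`, the output equals the original `Z`; during training, gradient descent learns `alpha` and `beta` so each unit gets the mean and variance that help the network most.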