Tags

#deeplearning #initialization

Question

After the success of CNNs in ILSVRC 2012 (Krizhevsky et al., 2012), initialization with zero-mean Gaussian noise with a standard deviation of 0.01, plus a bias of one for some layers, became very popular.

Why is it not possible to train very deep networks from scratch with this initialization?

Answer
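A common explanation is about signal scale: if every layer multiplies the magnitude of its input by a factor k, then L layers multiply it by k^L, so k < 1 makes activations (and gradients) vanish with depth and k > 1 makes them blow up. The sketch below illustrates this numerically under simplifying assumptions that are not part of the question: a fully connected ReLU stack instead of a CNN, a fixed width of 256, zero biases, and standard normal inputs. The function name `activation_rms_per_layer` and all its parameters are hypothetical, chosen only for this illustration.

```python
import numpy as np


def activation_rms_per_layer(depth=20, width=256, std=0.01, batch=512, seed=0):
    """Return the RMS activation after each layer of a random ReLU stack.

    Weights are drawn from N(0, std^2); biases are omitted (an assumption
    made to keep the sketch minimal).
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((batch, width))      # input with RMS close to 1
    rms = [float(np.sqrt(np.mean(x ** 2)))]
    for _ in range(depth):
        w = rng.normal(0.0, std, size=(width, width))
        x = np.maximum(x @ w, 0.0)               # linear layer followed by ReLU
        rms.append(float(np.sqrt(np.mean(x ** 2))))
    return rms


if __name__ == "__main__":
    width, std = 256, 0.01
    rms = activation_rms_per_layer(width=width, std=std)
    # For a ReLU layer with fan-in n, the expected per-layer factor is
    # roughly k = std * sqrt(n / 2).
    k = std * np.sqrt(width / 2)
    print(f"approximate per-layer factor k: {k:.3f}")
    for layer in (0, 1, 5, 10, 20):
        print(f"layer {layer:2d}: RMS activation = {rms[layer]:.3e}")
```

With these settings k is well below 1, so the printed RMS values shrink geometrically with depth. In a real network the fan-in of each layer differs, so the factor varies per layer, but the same k^L compounding applies.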

