Batch Normalization (BN)
• Normalizing the input (LeCun et al., 1998, “Efficient Backprop”)
• BN: normalizes the inputs to each layer, using statistics of each mini-batch
• Greatly accelerates training
• Makes training less sensitive to initialization
• Acts as a regularizer
S. Ioffe & C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. ICML 2015
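The per-mini-batch normalization in the bullets above can be sketched in NumPy as follows. This is a minimal training-mode sketch, not a full implementation: the function name batch_norm_forward, the (batch, features) input layout, and the eps default are assumptions made for illustration, and the running statistics used at inference time are left out.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Training-mode batch normalization for a (batch, features) array.

    gamma and beta are the learned per-feature scale and shift;
    eps is an assumed small constant for numerical stability.
    """
    mean = x.mean(axis=0)                    # per-feature mean over the mini-batch
    var = x.var(axis=0)                      # per-feature (biased) variance over the mini-batch
    x_hat = (x - mean) / np.sqrt(var + eps)  # normalize to roughly zero mean, unit variance
    return gamma * x_hat + beta              # learned scale and shift

# Illustrative data: a mini-batch of 64 examples with 4 badly scaled features.
rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=5.0, size=(64, 4))
y = batch_norm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
```

With gamma set to ones and beta to zeros, each output feature has approximately zero mean and unit variance over the mini-batch, regardless of the scale of the input.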
K. He et al., 'Deep Residual Learning for Image Recognition' (CVPR 2016), apply batch normalization right after each convolution and before the activation function.
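The ordering described here (linear or convolutional output, then BN, then activation) might look like the sketch below. The helpers bn and layer_forward are hypothetical names, a dense layer stands in for a convolution, and the learned scale and shift are omitted for brevity. The layer has no bias term, since BN subtracts the batch mean and would cancel it.

```python
import numpy as np

def bn(z, eps=1e-5):
    # Normalize each feature over the mini-batch (scale and shift omitted in this sketch).
    return (z - z.mean(axis=0)) / np.sqrt(z.var(axis=0) + eps)

def layer_forward(x, W):
    z = x @ W                  # linear output (stand-in for a convolution)
    z = bn(z)                  # BN is applied before the nonlinearity
    return np.maximum(z, 0.0)  # ReLU activation

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 8))   # illustrative mini-batch: 32 examples, 8 features
W = rng.normal(size=(8, 16))   # illustrative weights: 8 inputs, 16 units
h = layer_forward(x, W)
```

Placing BN before the activation means the nonlinearity sees inputs with a stable distribution across mini-batches.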