I was listening to the Talking Machines interview with Ilya Sutskever, and he made an important point about the initialization scale for deep neural networks.
It seems that weights that are too small significantly decay the signal as it passes through the layers, while weights that are too large make it unstable. This also raises an important point about the stability of neural nets and its connection to eigenvalue problems and random matrix theory, as pointed out by Ryan Adams.
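
To make this concrete for myself, here is a minimal NumPy sketch. It is not from the interview: it assumes a purely linear stack of square layers with Gaussian weights of standard deviation `scale / sqrt(width)`, and the function names `final_signal_norm` and `spectral_radius` are made up for illustration. It pushes a unit-norm vector through the stack and reports how large the signal is at the end, alongside the largest eigenvalue magnitude of one such weight matrix, which random matrix theory (the circular law) puts at roughly `scale`.

```python
import numpy as np


def final_signal_norm(scale, depth=50, width=256, seed=0):
    """Norm of a unit input after `depth` random linear layers.

    Assumption: each layer is a width x width Gaussian matrix with
    entries of standard deviation scale / sqrt(width), no nonlinearity.
    """
    rng = np.random.default_rng(seed)
    h = rng.standard_normal(width)
    h /= np.linalg.norm(h)  # start from a unit-norm signal
    for _ in range(depth):
        W = rng.standard_normal((width, width)) * scale / np.sqrt(width)
        h = W @ h
    return np.linalg.norm(h)


def spectral_radius(scale, width=256, seed=0):
    """Largest eigenvalue magnitude of one random layer matrix."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((width, width)) * scale / np.sqrt(width)
    return np.abs(np.linalg.eigvals(W)).max()


if __name__ == "__main__":
    for scale in (0.5, 1.0, 2.0):
        print(
            f"scale={scale}: "
            f"final norm={final_signal_norm(scale):.3e}, "
            f"spectral radius={spectral_radius(scale):.2f}"
        )
```

Under these assumptions the signal norm shrinks or grows by roughly a factor of `scale` per layer, so over 50 layers a scale of 0.5 drives it towards zero, a scale of 2.0 blows it up, and a scale near 1.0 keeps it of order one. Real networks with nonlinearities behave differently in the details, but the same sensitivity to the initialization scale shows up.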