I was going through Hinton's lectures, found something interesting, and wanted to share it.
It is very common for the magnitude of the gradient to differ between layers. The fan-in of a unit determines the size of the “overshoot” effects caused by simultaneously changing many of the incoming weights of a unit to correct the same error. So we can use a local adaptive gain $g_{ij}$ for the gradient of each weight.
So the update rule becomes:
\[\Delta w_{ij} = - \epsilon g_{ij} \frac{\partial E}{\partial w_{ij}}\]
The gains are adjusted by additive increase and multiplicative decrease:
if $\left( \frac{\partial E}{\partial w_{ij}}(t-1) \cdot \frac{\partial E}{\partial w_{ij}}(t) \right) > 0$
then $g_{ij}(t) = g_{ij}(t-1) + 0.05$
else $g_{ij}(t) = g_{ij}(t-1) \times 0.95$
Other things to note are:
- $g_{ij}$ should be kept within some bounds, like [0.1, 10] or [0.01, 100].
- Use full-batch learning or large mini-batches, so that sign changes in the gradient are not mainly due to sampling error.
- Adaptive learning rates can be combined with momentum: use the agreement in sign between the current gradient and the current velocity for that weight.
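The gain rule above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names `update_gains`, `gain_step`, and `grad_fn`, the toy quadratic error, and the value of `epsilon` are all made up for the example and do not come from the lectures. The increment 0.05, the decay factor 0.95, and the bounds [0.1, 10] are the ones quoted above.

```python
def update_gains(gains, grad, prev_grad, inc=0.05, dec=0.95, lo=0.1, hi=10.0):
    """Per-weight gains: add `inc` if the gradient kept its sign, else multiply by `dec`."""
    new_gains = []
    for g, d, d_prev in zip(gains, grad, prev_grad):
        if d * d_prev > 0:
            g = g + inc      # additive increase
        else:
            g = g * dec      # multiplicative decrease
        new_gains.append(min(max(g, lo), hi))  # keep the gain within [lo, hi]
    return new_gains


def gain_step(weights, gains, grad, epsilon=0.1):
    """Apply delta_w = -epsilon * g * dE/dw to every weight."""
    return [w - epsilon * g * d for w, g, d in zip(weights, gains, grad)]


def grad_fn(w):
    """Gradient of a hypothetical toy error E(w) = 0.5 * (w0^2 + 25 * w1^2)."""
    return [w[0], 25.0 * w[1]]


weights = [1.0, 1.0]
gains = [1.0, 1.0]
prev_grad = [0.0, 0.0]  # no previous gradient yet, so the first step decays the gains
for _ in range(50):
    grad = grad_fn(weights)
    gains = update_gains(gains, grad, prev_grad)
    weights = gain_step(weights, gains, grad, epsilon=0.05)
    prev_grad = grad
```

On this toy error, the gain of the shallow direction `w0` should grow because its gradient keeps the same sign, while the gain of the steep direction `w1` should shrink because its gradient keeps flipping.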
The weight change is the current velocity:
$$ \Delta w_{ij}(t) = v(t) = \alpha v(t-1) - \epsilon \frac{\partial E}{\partial w_{ij}}(t) = \alpha \Delta w_{ij}(t-1) - \epsilon \frac{\partial E}{\partial w_{ij}}(t)$$
Here $v(t)$ is the velocity and the momentum coefficient $\alpha$ is slightly less than 1.
The momentum method builds up speed in directions with a gentle but consistent gradient. Use a small momentum at first, e.g. $\alpha = 0.5$, and raise it later to $\alpha = 0.9$.
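As a rough illustration of the momentum update, here is a minimal plain-Python sketch. The names `momentum_step`, `momentum_schedule`, and `toy_grad`, the step at which the momentum is switched (`switch_step=20`), the learning rate, and the toy quadratic error are all assumptions made for this example. Only the values $\alpha = 0.5$ and $\alpha = 0.9$ come from the text above, and a hard switch is just one simple way to raise the momentum.

```python
def momentum_step(weights, velocity, grad, epsilon=0.01, alpha=0.9):
    """v(t) = alpha * v(t-1) - epsilon * dE/dw(t); the weight change is v(t)."""
    new_velocity = [alpha * v - epsilon * d for v, d in zip(velocity, grad)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity


def momentum_schedule(step, switch_step=20, alpha_start=0.5, alpha_final=0.9):
    """Hypothetical schedule: small momentum early on, larger momentum afterwards."""
    return alpha_start if step < switch_step else alpha_final


def toy_grad(w):
    """Gradient of a hypothetical toy error E(w) = 0.5 * (w0^2 + 25 * w1^2)."""
    return [w[0], 25.0 * w[1]]


m_weights = [1.0, 1.0]
m_velocity = [0.0, 0.0]
for step in range(200):
    alpha = momentum_schedule(step)
    m_weights, m_velocity = momentum_step(
        m_weights, m_velocity, toy_grad(m_weights), epsilon=0.02, alpha=alpha
    )
```

Both weights should end up close to the minimum at zero. The shallow `w0` direction is where the accumulated velocity helps most, since its gradient is small but always points the same way.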