The convergence of a 22-layer large model. We use ReLU as the activation for both cases. Both our initialization (red) and “Xavier” (blue) [7] lead to convergence, but ours starts reducing error earlier.
The convergence of a 22-layer large model. We use ReLU as the activation for both cases. Both our initialization (red) and “Xavier” (blue) [7] lead to convergence, but ours starts reducing error earlier.
If you want to change selection, open document below and click on "Move attachment"
pdf
owner: ronaldokun - (no access) - Delving into Rectifiers.pdf, p5
Summary
status
not learned
measured difficulty
37% [default]
last interval [days]
repetition number in this series
0
memorised on
scheduled repetition
scheduled repetition interval
last repetition or drill
Details
No repetitions
Discussion
Do you want to join discussion? Click here to log in or create user.