Writing a Deep Learning Library by Hand
Gradient Descent, Multi-head Attention, Layer Normalization, Batch Normalization, Dropout, References
Written on May 4, 2022