MADGRAD: A high-performance deep learning optimizer
I've just open-sourced an implementation of the MADGRAD optimizer that I developed together with Samy Jelassi. It outperforms Adam on every problem I've tried, and its generalization performance is comparable to SGD, avoiding the overfitting problems of adaptive methods entirely!
Check it out here: https://github.com/facebookresearch/madgrad
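If you want to give it a spin, here's a minimal sketch of using it as a drop-in replacement for Adam or SGD in a PyTorch training loop. This assumes the package installs as `madgrad` and exposes a `MADGRAD` class following the standard PyTorch optimizer interface; the model, data, and hyperparameter values below are illustrative only.

```python
import torch

# Assumes `pip install madgrad` and that the package exposes a MADGRAD class
# with the usual PyTorch optimizer interface.
from madgrad import MADGRAD

# Toy model and loss, purely for illustration.
model = torch.nn.Linear(10, 2)
loss_fn = torch.nn.CrossEntropyLoss()

# Constructed like any other PyTorch optimizer: pass the parameters
# and a learning rate (hyperparameter values here are illustrative).
optimizer = MADGRAD(model.parameters(), lr=1e-2, momentum=0.9, weight_decay=0)

# Dummy batch standing in for real training data.
x = torch.randn(32, 10)
y = torch.randint(0, 2, (32,))

for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
```

One practical note: learning rates that work well for Adam or SGD won't necessarily carry over, so it's worth retuning the learning rate (and weight decay) when switching to MADGRAD.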