Agreed, this is better than the vast majority of machine learning papers that actually get published. The ablation section is particularly nice. It is really a major failing of the field that in most papers, it's entirely unclear which aspects of the model (or which particular hacks) are really carrying the weight.