> The improvements are usually in the 1% range, from old models to new models, and the models are complex. More often than not also lacking code, implementation / experiment procedures, etc. [...] no idea if the paper is reproducible, if the results are cherry picked from hundreds / thousands of runs, if the paper is just cleverly disguised BS with pumped up numbers to get grants, and so on.
Personally, I'm bearish about most deep learning papers for this reason.
I'm not driven by a particular task/problem, so when I'm reading ML papers, it is primarily for new insights and ideas. Correspondingly, I prefer papers that offer a new perspective on the problem, irrespective of whether they achieve SOTA performance. From what I've seen, most of the ideas I find interesting come from slightly adjacent fields. I care far more about interesting and elegant ideas, and I use benchmarks mainly as a sanity check that a nice idea can also be made to work in practice.
As for the obsession with benchmark numbers, I can only quote the old line often attributed to Mark Twain: “Most people use statistics like a drunk man uses a lamppost; more for support than illumination.”