
I doubt it was obvious that scaling up would magically work. I suspect the experiments were limited for analytic simplicity rather than for computational reasons.


The only ML I ever did was a single undergrad NN class around 2001. That was a long time ago, but I vaguely remember being taught that adding more nodes rarely helped: you would just overfit to your dataset and get worse results on items outside it, or worse, end up with a completely degenerate NN. Best practice, we were told, was to use the minimum number of nodes that would do the job.
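
For what that era's lesson looked like in practice, a minimal sketch, assuming scikit-learn and numpy (dataset, widths, and names are all illustrative, not from the class): on a small noisy dataset, widening the single hidden layer keeps improving the training fit while the held-out score can stagnate or get worse:

  # Era intuition: on small data, more hidden nodes => better train fit,
  # worse generalization. Illustrative only.
  import numpy as np
  from sklearn.neural_network import MLPRegressor
  from sklearn.model_selection import train_test_split

  rng = np.random.default_rng(0)
  X = rng.uniform(-3, 3, size=(60, 1))             # small dataset, as was typical
  y = np.sin(X).ravel() + rng.normal(0, 0.3, 60)   # noisy target

  X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

  for width in (2, 8, 64, 512):
      net = MLPRegressor(hidden_layer_sizes=(width,), max_iter=5000,
                         random_state=0)
      net.fit(X_tr, y_tr)
      print(f"width={width:4d}  train R2={net.score(X_tr, y_tr):.3f}  "
            f"test R2={net.score(X_te, y_te):.3f}")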



The modern slow-but-scales way of coding them also wasn't prevalent back then.
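
Presumably this means the vectorized, whole-layer-as-one-matrix-multiply style that today's frameworks are built on. If so, a minimal numpy sketch of the contrast (sizes and names illustrative):

  import numpy as np

  rng = np.random.default_rng(0)
  W = rng.standard_normal((256, 100))   # 256 hidden units, 100 inputs
  b = rng.standard_normal(256)
  x = rng.standard_normal(100)

  # Node-at-a-time, as a 2001 textbook might have coded it:
  h_loop = np.empty(256)
  for j in range(256):
      h_loop[j] = np.tanh(np.dot(W[j], x) + b[j])

  # Whole-layer-at-a-time: one matrix-vector product, which maps
  # directly onto BLAS/GPU hardware and scales with layer width.
  h_vec = np.tanh(W @ x + b)

  assert np.allclose(h_loop, h_vec)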


Why couldn't mathematical proofs or models have predicted or revealed this back then?


On the contrary, there was a mathematical proof that a one-hidden-layer neural network with a nonlinearity can approximate any continuous function (on a compact domain, to arbitrary accuracy). Using more than one hidden layer seemed like a waste.

https://en.wikipedia.org/wiki/Universal_approximation_theore...
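
As a concrete (if informal) illustration of what the theorem promises, not its proof: a minimal numpy sketch in which a single tanh hidden layer with random input weights, made wide enough, fits a smooth target closely once the output weights are solved by least squares. All names and constants are illustrative:

  import numpy as np

  rng = np.random.default_rng(0)
  x = np.linspace(-np.pi, np.pi, 200)[:, None]
  target = np.sin(3 * x).ravel()

  width = 200
  W = rng.standard_normal((1, width)) * 3.0    # random input weights
  b = rng.standard_normal(width) * 3.0
  H = np.tanh(x @ W + b)                       # one hidden layer

  c, *_ = np.linalg.lstsq(H, target, rcond=None)  # fit output weights
  print("max |error| =", np.abs(H @ c - target).max())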


How??



