
I doubt it was obvious that scaling up would magically work. I suspect the experiments were limited for analytic simplicity rather than for computational reasons.


The only ML I ever did was a single undergrad NN class around 2001. That was a long time ago, but I vaguely remember being taught that adding more nodes rarely helped: you would just overfit to your dataset and get worse results on items outside it, or worse, end up with a completely degenerate NN. Best practice, we were told, was to use the minimum number of nodes that would do the job.
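
For what that era's lesson looked like in practice, a minimal sketch, assuming scikit-learn and numpy (dataset, widths, and names are all illustrative, not from the class): on a small noisy dataset, widening the single hidden layer keeps improving the training fit while the held-out score can stagnate or get worse:

  # Era intuition: on small data, more hidden nodes => better train fit,
  # worse generalization. Illustrative only.
  import numpy as np
  from sklearn.neural_network import MLPRegressor
  from sklearn.model_selection import train_test_split

  rng = np.random.default_rng(0)
  X = rng.uniform(-3, 3, size=(60, 1))             # small dataset, as was typical
  y = np.sin(X).ravel() + rng.normal(0, 0.3, 60)   # noisy target

  X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

  for width in (2, 8, 64, 512):
      net = MLPRegressor(hidden_layer_sizes=(width,), max_iter=5000,
                         random_state=0)
      net.fit(X_tr, y_tr)
      print(f"width={width:4d}  train R2={net.score(X_tr, y_tr):.3f}  "
            f"test R2={net.score(X_te, y_te):.3f}")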



The modern slow-but-scales way of coding them also wasn't prevalent back then.
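
Presumably this means the vectorized, whole-layer-as-one-matrix-multiply style that today's frameworks are built on. If so, a minimal numpy sketch of the contrast (sizes and names illustrative):

  import numpy as np

  rng = np.random.default_rng(0)
  W = rng.standard_normal((256, 100))   # 256 hidden units, 100 inputs
  b = rng.standard_normal(256)
  x = rng.standard_normal(100)

  # Node-at-a-time, as a 2001 textbook might have coded it:
  h_loop = np.empty(256)
  for j in range(256):
      h_loop[j] = np.tanh(np.dot(W[j], x) + b[j])

  # Whole-layer-at-a-time: one matrix-vector product, which maps
  # directly onto BLAS/GPU hardware and scales with layer width.
  h_vec = np.tanh(W @ x + b)

  assert np.allclose(h_loop, h_vec)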


Why couldn't mathematical proofs or models have predicted or revealed this back then?


On the contrary, there was a mathematical proof that a one-hidden-layer neural network with a nonlinearity can approximate any continuous function (on a compact domain, to arbitrary accuracy). Using more than one hidden layer seemed like a waste.

https://en.wikipedia.org/wiki/Universal_approximation_theore...
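
As a concrete (if informal) illustration of what the theorem promises, not its proof: a minimal numpy sketch in which a single tanh hidden layer with random input weights, made wide enough, fits a smooth target closely once the output weights are solved by least squares. All names and constants are illustrative:

  import numpy as np

  rng = np.random.default_rng(0)
  x = np.linspace(-np.pi, np.pi, 200)[:, None]
  target = np.sin(3 * x).ravel()

  width = 200
  W = rng.standard_normal((1, width)) * 3.0    # random input weights
  b = rng.standard_normal(width) * 3.0
  H = np.tanh(x @ W + b)                       # one hidden layer

  c, *_ = np.linalg.lstsq(H, target, rcond=None)  # fit output weights
  print("max |error| =", np.abs(H @ c - target).max())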


How??



