Hacker News

Because training in this manner requires good data. And by good I mean manually curated.

The ML hype train relies on "garbage in, good out", popularized by an influential paper published at the turn of the century [1]. Anyone with a modicum of experience in experimental science knows that an experiment is only as successful as its ability to collect good data: data that is representative of the problem, has manageable noise from known and controlled sources, and covers enough of the problem domain to support meaningful analysis. But of course, if ML admitted that this was a necessary requirement, it would become just another optimization technique (admittedly a new class of them), and the promise that the machine learns something new would fall flat on its face.

[1] https://aclanthology.org/H01-1052.pdf



Sure, I get that part, but in this case it seems like curation (at least to a degree that would avoid the problems that arose) is fairly trivial - just train it on high-quality scientific journals/textbooks/etc., instead of letting it peruse the whole internet (or whatever they did) for scientific data.
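To make that concrete, the kind of curation being described could be as simple as an allow-list filter over document sources before training. This is just an illustrative sketch; the source names and corpus format are hypothetical, not anything a real pipeline necessarily uses:

```python
# Hypothetical allow-list of vetted scientific sources.
ALLOWED_SOURCES = {"nature.com", "arxiv.org", "sciencedirect.com"}

def curate(corpus):
    """Keep only (source, text) pairs whose source is on the allow-list."""
    return [(src, text) for src, text in corpus if src in ALLOWED_SOURCES]

# Toy corpus mixing vetted and unvetted sources.
corpus = [
    ("nature.com", "Peer-reviewed result ..."),
    ("randomforum.example", "Unverified claim ..."),
    ("arxiv.org", "Preprint ..."),
]

curated = curate(corpus)
print(curated)  # only the nature.com and arxiv.org entries survive
```

Of course, the hard part in practice is deciding what belongs on the allow-list and at what granularity (domain, journal, individual document), which is where the manual-curation cost the parent comment mentions comes back in.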




