Hacker News

Isn't ChatGPT probably regurgitating Stackoverflow answers? If Stackoverflow stopped existing, ChatGPT would probably get worse too, since it wouldn't have the data it needs to answer questions.


With full access to GitHub and GitHub issues, it does just fine with whatever knowledge it has.


You mean the data that Stackoverflow was given freely by millions of people? They have no more moral claim to it than your local park has to the content of conversations spoken within its limits.

Besides, GitHub, GitLab, Bitbucket, etc. have far more public data to train on; Stackoverflow only really provides the schema one would want for fine-tuning, to make sure answers come out in the right general question-code-answer-code format.

Once that is trained properly (i.e. to the level of compression that reflects understanding, not just memorization), you only need to update the model on new libraries and languages, which can be trained from repos and docs directly.


They provided the platform and the medium by which questions get answered. That's not free, and expecting to get it for free isn't realistic.


Plus, they already provide the data powering the site (and the site code itself, I think?) under a lenient CC license. What more can we ask for? I don't get why people are so hell-bent on screwing over a good citizen.


Getting things for free isn't just realistic these days; it has genuinely become the norm. Competition is infinite, and there's always somebody willing to bleed VC money to offer things for free until they finally attempt proper monetization and everyone jumps ship to the next one that's earlier in the cycle. The one unicorn that does successfully monetize then subsidises all the rest through further VC funding, a roundabout startup UBI of sorts.

Besides, Stackoverflow runs ads and profits directly from users' posts. You don't get to own things just by virtue of being a platform that lets people host them. Does GitHub own all the code people host there? Does a cloud service own the software people put in their containers? Of course not. Stackoverflow themselves state that each answer is the property of the poster, licensed as CC BY-SA. Even Reddit's TOS states that "You retain the rights to your copyrighted content or information that you submit to Reddit".

Now whether there should be a class action lawsuit including everyone who's ever posted on Reddit or Stackoverflow against OpenAI is another story.


People answered for free because the CC license ensured that if the site ever tried something evil, the community would be free to fork it.

They ran ads to cover server and development costs. A fork would have to find its own way to make money.


It could interpolate from the original sources: the full code of open-source codebases and their docs.

It'd only get worse in cases where only a human had access to the knowledge or the means of attaining that knowledge.


Remove Stackoverflow from the training data and revise your assumptions. ChatGPT is not connecting the dots from source code and docs.
