
I agree. Knowledge should belong to all of humanity.

But then also don’t be angry at big corporations when they scrape the entire internet.



There's no contradiction in wanting an abolition (or at least substantial curtailment) of copyright while also being upset that mass violations of copyright magically become legal if you've got enough money.

Enforcement being unjustly balanced in favor of the rich & powerful is a separate issue from whether there should be enforcement in the first place—"if we must do this, it should at least be fair, and if it's not going to be fair, it at least shouldn't be unfair in favor of the already-powerful" is a totally valid position to hold, while also believing, "however, ideally, we should just not do this in the first place".


> There's no contradiction in wanting an abolition (or at least substantial curtailment) of copyright while also being upset that mass violations of copyright magically become legal if you've got enough money.

Why can't you just be happy for those few who are lucky enough to be able to violate copyright with no consequences? Yes, I know you'd want everyone to be able to violate copyright, but we're not there yet.


"Why can't we just be happy" that individuals and smaller companies get sued into oblivion over copyright violations, while large AI companies can scrape everyone's data, use it for training, and completely ignore copyright while generating code, images, text and music based on all of it that displaces the demand for the originals? Is that what you're asking?


Because we’d like the powerful to feel the crunch from bad law rather than get a backdoor, so they have to use their power to change things for everyone instead of just getting it changed for themselves.


More often than not the rich just codify the "backdoor" for themselves in such cases. A rich man can buy the $30,000 registered machine gun, pay the $200 NFA stamp and be 100% legal, while the poor man who 3D prints $0.50 of plastic to do the same thing goes to jail for 15 years.


The entities training AI are not anti-copyright, or anti-intellectual-property. If I were to steal their AI models they would sue me into the ground and probably win. Furthermore, even if you are anti-copyright, you probably still don't want your shit scraped by AI trainers since the bots are extremely aggressive, almost like a bona fide DDoS attack.

AI is not an attack on copyright, it is an attempt to replace it with something worse.


You're assuming way too much with "not there yet". The point is the corpos will violate copyright with impunity today, and then in a few years sign a bunch of settlement agreements and pull the ladder up behind them.

I'd love to see copyright slowly become irrelevant, but even with that goal we should expect to see large corpos being the last to stop respecting it.


He might be happy for them and also sad because there is no rule of law.


It's the rule of bribery, quite frankly. Call it lobbying and nobody bats an eye.


It's not so simple.

There are violations of copyright which are ethically fine, i.e. pirating an old movie to watch.

Then there are violations of copyright which are ethically problematic, i.e. pirating an old movie to sell.

When a big company violates copyright the nature of the violation is always much closer to the latter.


Pirating an old movie to sell is not considered ethically problematic everywhere. In many, many countries on earth pirated DVDs were sold at the marketplace, and no one – buyer or seller – had qualms about it. When the authorities shut down such sales, it was almost entirely because they were being pressured by the USA and a handful of other Western governments, not because the local ethical perspective on this changed.


This genre of comment is so tedious. We aren't talking about everywhere, the FBI is a US agency, the big companies we're discussing have won in US court. This thread is about the US.


The FBI and courts are enforcing the law that exists solely because the Founding Fathers enshrined it, but that says nothing about the ethical views that might exist among Americans. There are plenty of Americans who don't find selling pirated media ethically problematic and would like to see the kind of marketplace sales and wide use of BitTorrent boxes that people in other countries have enjoyed.


While it's true people are upset at AI companies profiting off of artists' creations with no compensation, I know a lot of people are also reacting to how the recent AI companies have been scraping the web. The reason folks are using Anubis and other methods is that, unlike Google, which archived sites for a long time (actually a great service), these new companies do not respect robots.txt, do not crawl at a reasonable rate (for us, thousands of hits a minute from their botnets, usually Baidu/Tencent but also plenty of US IPs), hit the same resource repeatedly while ignoring headers intended to give cache hints, and stupidly hit thousands of variations of a page when crawling search results, with no detection that they are getting basically the same thing... And when you ban them, they switch to residential IP ranges. It really is malicious.
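For contrast, here is a minimal sketch of what a well-behaved crawler is expected to do before fetching: check robots.txt, honor any Crawl-delay, and send conditional-request headers so unchanged pages can come back as 304 Not Modified instead of being re-downloaded. It uses Python's standard `urllib.robotparser`; the robots.txt contents, the `ExampleBot` agent name, and the helpers `polite_plan` and `conditional_headers` are hypothetical, and the 1-second fallback delay is an assumption, not part of any standard.

```python
from __future__ import annotations

from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site: keep bots out of search results
# and ask for 10 seconds between requests.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Crawl-delay: 10
"""


def polite_plan(robots_txt: str, agent: str, urls: list[str]) -> tuple[list[str], float]:
    """Return the URLs `agent` may fetch and the delay (seconds) between requests."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())  # parse in-memory text instead of fetching
    allowed = [u for u in urls if rp.can_fetch(agent, u)]
    # Assumed fallback of 1 s when the site gives no Crawl-delay.
    delay = rp.crawl_delay(agent) or 1.0
    return allowed, float(delay)


def conditional_headers(etag: str | None, last_modified: str | None) -> dict[str, str]:
    """Build revalidation headers from a previously cached response."""
    headers: dict[str, str] = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


if __name__ == "__main__":
    urls = ["https://example.com/about", "https://example.com/search?q=cats"]
    print(polite_plan(ROBOTS_TXT, "ExampleBot", urls))
    print(conditional_headers('"abc123"', None))
```

Here the search-results URL is skipped, the crawler waits 10 seconds between requests, and a page it has already seen is revalidated rather than fetched again, which is roughly the opposite of the behavior described above.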


> AI companies profiting

Are they?


If you boil it down to the AI companies are making money (subscriptions, etc.) based on content they did not pay to produce, then they are profiting from someone else's hard work.


If you boil it down, anyone on this planet can access ChatGPT (and other for-profit LLMs) for free to answer any question they might have.

Knowledge is shared among humanity at a rapid speed. Everyone benefits.

It’s mind boggling how anyone could be opposed to that.


Do you use google search for work? Then you are profiting from someone else's hard work!


The difference is that when you search on Google - at least before AI overviews - you end up at the source site.

Also Google respects robots.txt. Every site that Google surfaces chose to be in the index.


That's not entirely true. Google may or may not keep your pages out of the index, but they're definitely going to scrape them anyway. They also display summarized info from your page (the famous "what is scraping" joke showing Wikipedia's summary). Finally, you might just get your answer without visiting, just by skimming the result description.


Revenue is not profit.


I didn't say it was. I understand that profit = revenue - cost.

I said they're profiting from other people's hard work, a separate concept.


'stealing is fine if you lose money when reselling'


I don't believe I wrote anything of the sort.


profiting != profit


Well, don't we have enough Acme Corporations in the world that were unprofitable and existed purely on VC life support before they killed off all the competition by dumping the prices, and then made them skyrocket to recoup investments and become profitable after becoming monopolists?


People at these companies are receiving a salary to do these things that the person you're responding to is opposed to.

While not all of the companies in question may be profiting from these things, some of them are, and most if not all of their employees certainly are as well.


I don't care that they scrape my website.

I DO care that nearly 2M different IPs are used to try to pull 42k commits out of a git repo one by one when they could just git clone it ...


I wish the companies would just pay a few technically-competent companies to do the scraping. Pay two so you can check their work, maybe, but let's get past the point in time when dozens (or more?) of companies are all simultaneously hammering the web.


There are perfectly good LLMs built specifically to shit out swaths of mediocre code to do that, so why would you pay anyone?


My pie-in-the-sky pitch is that the US Government (and others) should solve the legality and the compensation problems in a single swoop. Make submission of your work to a federal model data set a requirement for obtaining copyright protection. License the data set (and heck, maybe even charge for making custom models) for nominal fees to anyone who wants it, with indemnification against copyright lawsuits for works deriving from the licensed model. Pay copyright owners a limited-time royalty from these licensing fees. Everyone wins, and we can stop needing a billion bots scraping a billion sites a billion times a day.


While I would like to see it abolished entirely (including patents) I do have to compliment how you've described a formula that is actually possible to implement.

To deny people access to things is one thing; wanting to do it by impossible means is quite something else. Who even has the time to scour the universe looking for possible infringement of their works, and also the money to deal with it?


A lot of the outrage isn't at scraping, it's at the disruptive techniques used to do it, like crawling whole websites page by page when they already provide convenient downloadable images of their content.


Feels like now we're just redefining our rules so that the people we don't like are out and the people we like are in. Does the content creator have the right to determine how their work is used or not?


I have a right to my copyrighted work, and I also have a right to set and enforce access rules to a server I operate to grant people access to it.


This is a false equivalency I'm surprised no one else has brought up. An archive of a site inherently preserves attribution; scraping and training do not.


Is it? I thought it was ridiculous at first, but the more I think of it... both are scenarios where a corporation is scraping billions of webpages. We like the reason archive.is does it, but unless it's some kind of charity, I think it's a reasonable comparison.


archive.is is a charity, no? Or at least they take donations. The legal entity behind it seems nebulous, but they don't have ads and have no paid product or offering.


They sure as shit do have ads. Have you ever accidentally followed a link using a browser profile that has no ad blocking enabled?

I only rarely browse without some form of content blocking (usually privacy-focused... that takes care of enough ads for me, most of the time). I keep a browser profile that's got no customizations at all, though, for verifying that bugs I see/want to report are not related to one of my extensions.

Every once in a while, I'll accidentally open a link to a news site (or to an archive of such a site) in that vanilla profile. I'm shocked at how many ads you see if you don't take some counter measures.

I just confirmed in that profile: archive.is definitely puts ads around the sites they've archived.


I stand corrected; maybe it's because I have ad blockers that I never noticed.

And admittedly, I used to think it was the Internet Archive.

It does make this case seem problematic now that I know the details.


So if OpenAI or <AI scraper of the day> adds attribution to their AI-generated answers, everything is OK?


It would be closer to okay.


Copyright exists to "promote the Progress of Science and useful Arts."

Anything which does that should be legal, and anything that stifles those advances should not.


It's not that they're scraping the internet, it's that they're scraping the internet, profiting off the data they take, and still using the copyright regime to go after others who do unto them.


Big corporations aren't humans.


Corporations large and small don't do anything. It's always a person. The question you are answering, even if you don't think you are, is whether a few people can get together and act in concert and still retain their rights.


they are persons under US law.


Only in a couple of very specific and narrow ways. They are not considered persons generally under US law. They are legal fictions that have been granted a subset of rights that people have.


And that subset of rights keeps expanding.


I imagine there's a whole lot of snarky epitaphs which the remnants of the humankind could place on this civilization's gravestone, but citing this exact law might make for the best one.


Then one should be able to put them on death row under US law.


US law only applies in the US. Plus, the company in question seems to be based in Canada, so it's outside FBI jurisdiction.


> US law only applies in the US.

Where US law applies varies by which law it is; there are US laws that apply only outside of the US [0], as well as US laws which have application both inside and outside the US.

[0] e.g., the federal torture statute, 18 U.S. Code § 2340A(a), “Whoever outside the United States commits or attempts to commit torture shall be fined under this title or imprisoned not more than 20 years, or both, and if death results to any person from conduct prohibited by this subsection, shall be punished by death or imprisoned for any term of years or for life.”

https://www.law.cornell.edu/uscode/text/18/2340A


that's just wishful thinking. US law applies world wide as long as Trump is willing to reach out and nab you. ask the Venezuelan fishermen.


That's not even US law, it's straight up murder outside of the law.


So was Kim Dotcom. Biden went after him anyway at the behest of big media.


I mean, it's not like it was just Biden. His extradition proceedings took place during three different US presidential administrations. You might as well include Trump and Obama in there as well.


People pretend the GDPR applies to everyone, so why not the DMCA?


GDPR applies to you if you come to the EU to peddle your wares.

The DMCA absolutely does apply to European firms selling their goods and services in the US.


If you're looking to US law to discern who is a person and who is not you are deeply lost.


the argument really wasn't about human persons, it was about legal persons. the distinction was only brought up to derail the conversation.


But US law isn't even the law of the world, let alone the definition of reality.


Does it need to be? Trump has been assassinating people in boats with no evidence whatsoever.


How are you planning on doing anything about it?


It would solve a lot if that were taken to the extreme. Sorry Amazon, but your working conditions killed five people. Your business license is going to jail for 40 years; good luck getting contracts with other companies with murder on your record when you get out.


One thing is not the other. A corporation is not a human (and no I don't care what Citizens United says). A corporation has no inherent rights.


> But then also don’t be angry at big corporations when they scrape the entire internet.

I'm only angry with them when they pay hush money to IP extortionists.


Well, as long as they pursue a "copyright for me but not for thee" regime, you can.


Hot take here, I know, but some of us believe the law should treat large corporations differently than it treats individuals when it comes to their rights and privileges.


This seems like an incredibly disingenuous take. There's a marked difference between collecting information to freely share with the rest of humanity, and collecting information to feed into algorithms under the guise of "artificial intelligence" with the aim of enriching themselves and putting others out of work.


Anyone on this planet can access ChatGPT (and other for profit LLMs) for free to answer any question they might have.

This is true knowledge socialism.


- ChatGPT is not about knowledge.

- ChatGPT is in the "bait" phase of "bait and switch" plan. It is trying to make us dependent on it, so that it can extract maximum profit later.


we (all of us) do not own chatgpt; we (all of us) do not share in the profit from chatgpt; this is not what socialism is.


That's a bad take. Open source code is available to all, yet you can't always resell it or repackage it for your own profit.

Information can be made available to all, and at the same time, we can make it so others cannot resell or repackage it for profit like what AI companies are doing.


You can sell open source code for profit.

https://www.gnu.org/philosophy/selling.en.html


You linked to the meaning of free software. I said open source.



