Yeah, Google violates the CFAA and infringes on copyright as a matter of course. Their service would be impossible if they weren't doing so.
The main difference when Google was small was that Google was not dependent on any data source in particular, so even if someone denied their robot or sued them, they could cease and desist without affecting the overall value of their offering. This is different if you are getting data that is only available from one or two sources.
Now, the main difference is that Google is one of the biggest companies in the world, and they'll sic an army of $1,000/hr lawyers on you if you even think about taking legal action against them. The only people who can afford to fight are other big companies, but that's not going to happen because they all depend on breaking the CFAA for their own purposes and then using their position as a huge company to bully small innovators.
Incidentally, this only further proves my point. If you're a big company that's retained massive law firms, you can successfully raise a fair use and implied license defense. If you're not, you can neither mount a strong offense against that defense nor mount a strong defense against Google's hypocritical offense if you find yourself on the other side.
Google's primary out here is its reputation (not guarantee) for obeying robots.txt. If Google indexed a page that disallowed it in robots.txt, the case would be much stronger. There's also the unofficial out, which is that judges think Google is a cool large company, so they rule in their favor based on their personal biases.
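For anyone scraping, the robots.txt check mentioned above can be sketched with Python's standard library. This is only a minimal illustration: the robots.txt contents, the `MyScraper` user agent, and the example.com URLs are made up, and passing this check says nothing about CFAA or copyright exposure.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Googlebot may crawl everything except /private/,
# while every other crawler is disallowed entirely.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Googlebot", "MyScraper"):
        for url in ("https://example.com/public/page.html",
                    "https://example.com/private/page.html"):
            print(agent, url, is_allowed(ROBOTS_TXT, agent, url))
```

In a real crawler you would fetch the live file with `set_url()` and `read()` instead of parsing a string, and run the check before every request.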
Fair use is a case-by-case basis, so you can't say that Google's infringing conduct is generally accepted to be fair use. The EFF had to take on Universal in Lenz v. Universal Music Corp., and that went up to the Supreme Court. That's how individuals are left to assert their fair use rights.
You claimed "[Google] infringes on copyright as a matter of course" despite the many real civil cases (previously cited) which have found these very activities to be non-infringing. And then, strangely, you claimed:
>Fair use is a case-by-case basis, so you can't say that Google's infringing conduct is generally accepted to be fair use.
There is so much wrong with this statement. For one, how can you call something infringing at the same time you point out that nothing has been proven? That simply defies all common logic.
Secondly, in general terms, the activities in question have been found to be non-infringing by the courts. Sure, fair use is case-by-case, but if you're operating within similar parameters as a previously litigated case, then the legal risk is immensely reduced.
I don't disagree with your assertion that the legal system greatly favours the well-moneyed/connected (I don't think anyone would). But you can't claim it to be fact that Google Search is infringing anything with little to no evidence or rulings to cite. Unless you're just stating an opinion, in which case you should clearly indicate that.
First, IANAL, so my use of some terms may be loose. I never intend to convey more than an informed layman's opinion. However, I do love it when I'm corrected so that my usage can improve.
Fair use is an affirmative defense. Google admits that it copies content without legal license to do so, but claims that said copies are non-infringing under fair use exemptions. I guess you're probably correct that it's no longer appropriate to refer to Google's behavior specifically as "infringing", just "copying without authorization", which, for those of us without $5 million to commit to a legal team, means "infringing". I will try to remember the special standard of law which has been allowed to Google and refer to their copying only as "unauthorized" and not "infringing" in the future.
If you review the points summarized in the Wikipedia articles you helpfully linked, you'll see that Google's defense is mostly "Yeah, but we're Google".
In Field, "the court found that the plaintiff had granted Google an implied, nonexclusive license to display the work because of Field’s failure in using meta tags to prevent his site from being cached by Google.", i.e., because Field already knew Google existed and knew there was a standard way to prevent its access but chose not to employ it, he gave Google an implied license.
Who else does that work for? Can I send an email to Netflix and tell them "Hey, if you don't want me to copy your shows, please add this in your page's HEAD element: <meta name='please-dont-download-my-shows-sir'>"? No?
I understand there are other criteria which were used to decide if Google's use was specifically infringing in addition to the implied license. Just demonstrating that Google is getting favored treatment from the judiciary that would not be available to a normal entity.
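For concreteness, here is a rough sketch of how a crawler might honor the kind of meta-tag opt-out discussed in Field. I'm assuming the relevant directive is the `noarchive` value of a `robots` (or `googlebot`) meta tag. The class and function names are hypothetical, and this is not how Google actually implements it.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases attribute names; values may be None.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for directive in attr.get("content", "").split(","):
                directive = directive.strip().lower()
                if directive:
                    self.directives.add(directive)


def opts_out_of_caching(html: str) -> bool:
    """Return True if the page asks crawlers not to keep a cached copy."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noarchive" in parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, NOARCHIVE"></head></html>'
    print(opts_out_of_caching(page))
```

The point of the sketch is just that the opt-out is something the crawler has to choose to look for. Nothing forces a scraper to run this check.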
In Perfect 10 [0], the judge even explicitly indicated that he was loath to find Google's use of thumbnails infringing because he didn't want to "impede the advance of internet technology", but that he felt the law obligated him to do so (his ruling in that matter was overturned on appeal, when the Ninth Circuit found Google's usage non-infringing). What if the defendant had been some company perceived as less technically advanced than Google? This is probably as close as you can get to an explicit statement of favoritism. The Ninth Circuit also rejected Perfect 10's claim that RAM copies were infringing (which was not the case with an unlucky non-Google company discussed further down).
What if I started indexing and rehosting thumbnails? I can assure you that I would get C&D'd almost immediately and I would be forced to shut down because I can't afford to pay lawyers for 3 years while the case works through the system (and to be honest, I'm surprised it only took 3 years). And even if I could, with a reputation less sterling than Google's, there's no reason to believe that a judge would rule in the favor of one useless guy instead of a big company. A judge would look at the case and say "Google's use was fair because it provided a public service [actually cited as part of the justification in most of your linked cases], but this guy is just using it for a few hundred people, it's definitely unfair, he owes that company more money than he'll make in his life, case dismissed".
There are many such cases on the books. I don't know if Google has a direct connection to the reptilian overlords or what, but it seems in most cases where they're not involved, the good side loses.
In Craigslist v. 3Taps, while primarily a CFAA case, 3Taps was found to be infringing copyrights by sampling Craigslist postings in order to allow its clients to plot them on a map. Being a "public service" or a "referential use" didn't matter for them. They were raked over the coals, and it's been that way with most cases.
In Ticketmaster v. RMG Technologies [1], RMG was found to infringe just by parsing a page. "Defendant's direct liability for copyright infringement is based on the automatically-created copies of ticketmaster.com webpages that are stored on Defendant's computer each time Defendant accesses ticketmaster.com. [...] Defendant contends [...] that such copies could not give rise to copyright liability because their creation constitutes fair use[.] [...] Defendant's fair use defense fails."
The case specifically discusses how, despite the precedent in Perfect 10, since the Defendant is not Google, it is bound by a site's Terms of Use and copyright law, and RAM copies, which are specifically non-infringing for Google, were infringing for RMG.
Very similar findings were made in Facebook v. Power Ventures, and the founder was left holding a bag of $3 million in personal liability.
This is a thread about the legality of HN users scraping. It seems Google is the only entity capable of making unauthorized copies and then getting courts to agree that it's fair use. For the rest of us, it's infringement, which carries stiff penalties (and this doesn't even broach the CFAA portion of the issue).
So when I say "infringing", I mean something that would be considered infringing if you aren't Google. It's apparently only infringement if the judges involved don't personally use your site and don't have to worry about personally suffering the consequences of not having access to it. :)
What you've failed to mention is the criteria used to determine if a usage is indeed "fair". There are 4 basic criteria[0], but they can be summarized as "If the usage doesn't affect the market for the original work, is substantially transformative, is proportionally insignificant or is used for critique/parody then it is fair". Or, at the risk of oversimplifying it: "Does the usage grant a net public benefit without significantly hurting the copyright holder's ability to make money?".
>Can I send an email to Netflix and tell them "Hey, if you don't want me to copy your shows, please add this in your page's HEAD element: <meta name='please-dont-download-my-shows-sir'>"?
Actually, under fair use you certainly can make a personal copy (see Betamax case). If you distribute the work you would likely run afoul of the criteria summarized above.
The robots.txt relevancy is being over stated in your argument. The main criteria used in this case is summarized above. The fact that Google provides an opt-out mechanism is a secondary, supporting argument.
>What if I started indexing and rehosting thumbnails? I can assure you that I would get C&D'd almost immediately
A determination of infringement would depend entirely on the context as related to the afore mentioned criteria. The fact that someone might try to sue is a product of the terrible system in general and you're absolutely right - as with any legal matter the entity with the deeper pockets can often bully the other guy into submission.
>In Craigslist v. 3Taps, while primarily a CFAA case, 3Taps was found to be infringing copyrights
My understanding is that the copyright part of the case was thrown out [1] and thus was settled solely around CFAA matters.
>In Ticketmaster v. RMG Technologies, RMG was found to infringe just by parsing a page.
I agree that the logic used for the judgement is absurd (for reasons that are plainly obvious to any HN user). But it's less clear whether the case would meet fair use criteria outlined above should it have come to that. My guess is that it wouldn't qualify, since the usage affects the copyright holder's ability to make money on the work and doesn't meet any of the other criteria for fair use.
>Facebook v. Power Ventures
This is not a case involving a defense of fair use (as far as I can tell). Facebook even acknowledged the users owned the data and had a right to it. The defendant was actually found to be violating CFAA and CAN-SPAM acts.
>It seems Google is the only entity capable of making unauthorized copies and then getting courts to agree that it's fair use. For the rest of us, it's infringement
Provably false [2]. It sounds like perhaps your personal experience has soured your opinion on the matter? That's understandable. But none of the evidence you've cited supports the argument that Google is infringing copyrights in its core activities nor that Google is the only entity where copyright laws and fair use legislation don't apply.
PS: To be clear, my argument revolves specifically around copyright infringement and fair use. I don't have enough understanding of other, separate legislation like CFAA to comment on that except to say that it seems overly broad and unrealistic. But that's another topic. I'm specifically arguing against calling Google a copyright infringer in a broad sense which is what you've done. That's not been proven.
>What you've failed to mention is the criteria used to determine if a usage is indeed "fair".
Yes, I understand that the criteria for fair use are defined in the statute. What I'm saying is that like most things brought before judges, arguments can be made either way, and judges seemingly favor Google but not smaller defendants. Thus, while the RAM copies of web pages made by Google are fair use, those made by RMG aren't.
If you look at the Ninth Circuit's ruling in Perfect 10, the length they stretch to reverse the District Court's finding of thumbnails as infringing is ridiculous. It's pretty clear that thumbnails are direct infringements and that you don't invalidate the copyright or create a truly "transformative use" by making it smaller and adding it to an index. Perfect 10 was certainly of this opinion, and I'm sure they saw a real impact to their revenue.
Over the years I've learned that no position is too high to disregard the human factor. 99% of the time people are going to act primarily to their own benefit and work backwards to find rational (or rational-sounding) arguments to justify it. Judges are politicians and they're very image-conscious. None of them wants to be the one to make Google Image Search useless.
You seem to be saying that since Google's use was found non-infringing in these cases, its use is objectively non-infringing. I don't agree with this. Rather, I think that Google's conduct is a pretty plain violation of the relevant statute(s) and that most of it is not covered under fair use, the way the laws are currently written. I think that judges apply the statute in full force when smaller defendants present, but that they have a bias for Google (which is really a bias for themselves, since they know that serious backlash awaits the judge who puts the kibosh on it) that causes them to contort the law pretty heavily so that they can rule the way they want to.
>Actually, under fair use you certainly can make a personal copy (see Betamax case).
See, we were on the right track before we got into networks. Since then, the rulings have been pretty darn bad. The modern "Betamax case" may well have been American Broadcasting Cos. v. Aereo, Inc. [0], and it wasn't a win for us.
Note also that separate from the copyright concern, the DMCA makes it illegal to circumvent a copy protection device (or indeed, even to teach another how to do so). Since Netflix employs DRM, even if there is a fair-use right to a copy of a Netflix program (which is by no means certain), you'd probably have to break the DMCA to obtain it.
>The robots.txt relevancy is being over stated in your argument. The main criteria used in this case is summarized above. The fact that Google provides an opt-out mechanism is a secondary, supporting argument.
I disagree. Google has been able to discharge all CFAA claims because the judges have said "Well, you knew there was a way to stop it." If that's the logic, I'll happily inform the parties I may scrape that there's a way to stop it.
>A determination of infringement would depend entirely on the context as related to the afore mentioned criteria.
Yes, I understand that the judge would write a report that appeared to consider the relevant criteria. The real question is, would that judge be willing to make the same logical contortions that other judges have made for Google?
I think that he would just go in favor of his biases, and right now we have a judiciary that is heavily biased against the little guy from the start, and this is only exacerbated by an inability to retain hotshot lawyers.
>My understanding is that the copyright part of the case was thrown out and thus was settled solely around CFAA matters.
The only portion of the copyright claim that was dismissed was Craigslist's claim that it owned an exclusive license in the scraped content. This was based on a short-lived ToU update that was specifically intended to strengthen Craigslist's case in this instance. The remaining copyright-related claims were allowed to stand, including a claim that PadMapper had violated a copyright Craigslist holds on the collection of advertisements (rather than on the advertisements themselves). [1]
>[re: RMG] I agree that the logic used for the judgement is absurd (for reasons that are plainly obvious to any HN user).
If you agree the logic was absurd, you agree that a copy of the page that exists in RAM for microseconds does not qualify as a protected copy any more than the reflection of an image on one's retina qualifies. As a "copy" that should be ineligible for copyright protection, it doesn't matter if it qualifies for fair use (and I don't necessarily agree that it wouldn't).
> [re: Facebook v. Power] This is not a case involving a defense of fair use (as far as I can tell).
Correct. I was including it because it's an example of Google getting another free pass for stuff that shuts others down, which is the CFAA. CFAA claims are raised against Google in at least Field and Perfect 10, and they get dismissed based on the judge's assumption that the plaintiff knows about the special steps Google makes you take to stop them from violating the CFAA, the absurdity of which we've already discussed.
My wording that the "findings were very similar" was definitely bad since a different law was in play. I meant they were very similar in nature, not in fact. That said, it's likely the only reason that the cached pages weren't considered infringement is that Facebook didn't bring it up.
>But none of the evidence you've cited supports the argument that Google is infringing copyrights in its core activities nor that Google is the only entity where copyright laws and fair use legislation don't apply.
Again, I'm discussing this from a practical position, not one that is strictly compliant with legal theory, where judges always enforce the law with perfect equity, and in which anything a judge (or jury) finds becomes Official Truth de facto.
From a textbook perspective, sure, everyone has all the same rights and the legal system is always applied equitably. I simply don't believe that has borne out in practice when it comes to internet-centric companies that aren't household names.
It seems that the things Google does are considered infringement when other people do them. Thus, it behooves us to know the actual law and follow it, even if Google gets a free pass, since we can't rely on the judiciary to interpret the law favorably for us.
RMG is a great example because it occurred after Perfect 10, and the same argument against RAM copies was raised in both cases. It's apparently fair use if Google scrapes your page to download and rehost all of your images, but it's not fair use to read out non-copyrightable factual data unobtainable from any other source (like ticket prices and event times) and rehost it nowhere. Sure.
The alternate lesson here is to focus on getting really big and powerful really quickly, and making sure you cultivate a good public image, so that judges are afraid to rule against you in ways that would affect a product offering upon which millions of people depend. That seems to have worked for most big internet companies, actually. Definitely worked for Facebook and Google.
> they'll sic an army of $1,000/hr lawyers on you
They don't even need to do that. They just cheerfully agree to not scrape you, and wait for you to come back and beg to be re-instated when your search traffic plummets.