Hacker News | predius's comments

We've used https://foundershield.com/ for a few years now. They helped us grow from small E&O, supplier liability, and cyber liability policies to much larger international coverage spanning multiple organizations.


Brittany Pettibone offered evidence that Twitter is shadow-banning her, here:

https://twitter.com/BrittPettibone/status/797186228894322688


So that's weird. I looked at her feed and didn't see anything beyond the pale.

On the other hand, I can't quite replicate her search. When I search, I get more results than she does: maybe not as many as I'd expect, but importantly, not the same as hers. I can't say "obviously there's nothing here," but I also don't think there's a smoking gun.

If you're concerned about this, you really should not be pointing me to one tweet by one user. You should have reams of evidence, documented, with a timeline, with comparisons to what other users see, etc, etc. If you don't care about convincing people who haven't already bought in, that's ok, but if you do want to persuade, you're going to have to provide something better than this one tweet. (Case in point, downthread, we have a person who's convinced they were shadowbanned, then all of a sudden they check, and they aren't: https://news.ycombinator.com/item?id=12935624).


San Francisco has just under that, around 73%.

https://www.quora.com/What-percentage-of-San-Francisco-apart...


And just to clarify, it's 73% of the rental market. If you build new housing and it's all owner-occupied, it won't move that 73% number.

It might not be a huge factor but it's something to keep in mind.


You're forgetting that the price of any new owner-occupied property must also price in the opportunity cost of not renting out that property. Nobody buys a house when they could get rich by renting out the same house to someone else. Indeed, a foreign investor is likely to buy it, because it's a great source of income and they already have somewhere to live.

If I can rent a property out for thousands of dollars a month, that property is worth a lot of money. So it's really the rental market that is pushing up prices in the owner-occupied market.
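A common way to make this concrete is the income-capitalization approximation used in real-estate valuation (a standard textbook model swapped in here, not something the comment itself states): a property's value is roughly its annual net rent divided by a capitalization rate. The $4,000/month rent and 5% cap rate below are hypothetical numbers chosen for illustration.

```python
def capitalized_value(monthly_rent: float, cap_rate: float) -> float:
    """Approximate property value as annual rent divided by a cap rate.

    Simplifying assumption: monthly_rent is treated as net income,
    so vacancy, taxes, and maintenance are not subtracted.
    """
    if cap_rate <= 0:
        raise ValueError("cap_rate must be positive")
    annual_rent = monthly_rent * 12
    return annual_rent / cap_rate


# Hypothetical figures: $4,000/month rent, 5% cap rate.
value = capitalized_value(4_000, 0.05)
print(f"${value:,.0f}")
```

Under this simplification the implied price scales directly with rent, which is the mechanism described above: higher achievable rents raise what the property is worth to any buyer, including one who only wants to live in it.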


Lots of people buy real estate in Vancouver and leave it empty instead of renting it out. It is a safe place to park money, and the hassle and wear of tenants isn't worth the rent they pay when the market has been going up so much.

I've heard there is a city in Australia that also has loads of empty condos bought by investors... Surfers maybe?

Buying a house and neither living in it nor renting it out is definitely a thing.


This happens in Miami ALL the time: tens of thousands of empty apartments owned by foreigners parking their cash.


In the last housing bubble, rents detached from housing prices pretty severely. Back in the '80s, it was generally cheaper to buy than rent; the 20% down payment saw to that. But things completely flipped with the more recent financial ingenuity.

In a way, I see rising rents in SF as a return to normalcy.


We don't expect anyone to pay for Scrapy or Portia!

We provide the best Platform (as a Service) to run Scrapy or Portia spiders, and will soon be supporting most standard web scraping technologies. This is free for light users, but we charge for people who need extra or dedicated computing or network resources.

We also help startups or enterprise orgs that want to build a web data harvesting system (more than just parsing pages!), either by building it ourselves or by helping our partners train their engineers to use our technologies.

This has worked so far, and we're very healthy from a revenue perspective: more than doubling every year for a few years now, and enough to grow into the largest fully distributed company outside of the US.

We're pretty happy being a brand that gives to the community; it tends to get repaid 10x in the long run.


How much revenue are you generating? Are you venture funded?


I agree, this is definitely a solved problem!

If you need to build a solid web scraping stack that will be maintained by many people and is critical to your business, you have two options: use Scrapy or build something yourself.

Scrapy has been tried and tested over 6-7 years of community development, and it is the base infrastructure for a number of >$1B businesses. Not only this, but a suite of tools has been built around it: Portia for one, but also lots of other useful open source libraries (http://scrapinghub.com/opensource/).

Right now most people still have to use XPath or CSS selectors to run their crawl or get the data, but not for much longer.

There are more and more ways of skipping this step and getting at data automatically:

- https://github.com/redapple/parslepy/wiki/Use-parslepy-with-...
- https://speakerdeck.com/amontalenti/web-crawling-and-metadat...
- https://github.com/scrapy/loginform
- https://github.com/TeamHG-Memex/Formasaurus
- https://github.com/scrapy/scrapely
- https://github.com/scrapinghub/webpager
- https://moz.com/devblog/benchmarking-python-content-extracti...
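For readers who haven't written one, the manual selector step these tools aim to replace looks roughly like this. In a Scrapy spider you would typically call `response.xpath(...)` or `response.css(...)`; to keep the sketch self-contained, it instead uses the limited XPath subset in Python's standard-library ElementTree on a hypothetical, well-formed page snippet. The markup, class names, and the `extract_products` helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page snippet. Real-world HTML is often not
# well-formed, which is one reason Scrapy uses lxml-based selectors instead.
PAGE = """
<html>
  <body>
    <div class="product">
      <h2>Blue Widget</h2>
      <span class="price">19.99</span>
    </div>
    <div class="product">
      <h2>Red Widget</h2>
      <span class="price">24.50</span>
    </div>
  </body>
</html>
"""


def extract_products(markup: str) -> list[dict]:
    """Pull name/price pairs out of each product block via XPath-style paths."""
    root = ET.fromstring(markup.strip())
    products = []
    # Note: unlike the CSS selector .product, this predicate only matches
    # when the class attribute is exactly "product".
    for div in root.iterfind(".//div[@class='product']"):
        name = div.findtext("h2", default="").strip()
        price = div.findtext("span[@class='price']", default="").strip()
        products.append({"name": name, "price": float(price)})
    return products


print(extract_products(PAGE))
```

Every selector here is tied to the page's layout, so a site redesign breaks the extractor; that maintenance cost is what automatic extraction tools try to remove.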

Scrapy (and lots of other Python tools, likely a majority of them created by people using it and BeautifulSoup) has lowered the cost of building web data harvesting systems to the point where one guy can build crawlers for an entire industry in a couple of months.


Should be fairly easy to do; this is Python and Scrapy!

http://stackoverflow.com/questions/28127396/creating-rss-wit...
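As a rough idea of the last step, here is a minimal sketch that serializes already-scraped items into an RSS 2.0 feed with Python's standard library. It assumes each item is a dict with hypothetical `title`, `link`, `summary`, and optional `published` keys (for example, items collected by a Scrapy spider); `items_to_rss` is an illustrative helper, not a Scrapy API.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def items_to_rss(title: str, link: str, description: str, items: list[dict]) -> str:
    """Serialize scraped items (dicts with title/link/summary) as RSS 2.0 XML."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for item in items:
        entry = ET.SubElement(channel, "item")
        ET.SubElement(entry, "title").text = item["title"]
        ET.SubElement(entry, "link").text = item["link"]
        ET.SubElement(entry, "description").text = item.get("summary", "")
        if "published" in item:
            # RSS 2.0 uses RFC 822 dates; format_datetime emits the
            # compatible RFC 5322 form, e.g. "Mon, 14 Nov 2016 12:00:00 +0000".
            ET.SubElement(entry, "pubDate").text = format_datetime(item["published"])
    return ET.tostring(rss, encoding="unicode")


# Hypothetical scraped item.
scraped = [
    {
        "title": "First post",
        "link": "https://example.com/posts/1",
        "summary": "Scraped summary text",
        "published": datetime(2016, 11, 14, 12, 0, tzinfo=timezone.utc),
    },
]
print(items_to_rss("Example feed", "https://example.com/", "Items scraped with Scrapy", scraped))
```

ElementTree escapes text content automatically, so scraped titles containing `&` or `<` still produce valid XML.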


Hi! This is definitely not legal advice, so consult with a lawyer and do your own research if you are thinking of applying this to your own practices.

I work for Scrapinghub as well and try to understand the law around this. I can offer some pointers on why I think some web scraping isn't illegal; there are, of course, some limits to this.

When the data scraped "is publicly available on the Internet, without requiring any login, password, or other individualized grant of access", the U.S. District Court for the Eastern District of Virginia in Cvent v. Eventbrite (https://casetext.com/case/cvent-inc-v-eventbrite) ruled that accessing it could not be deemed to exceed authorized access.

There are two ways, that I know of, that courts have ruled you can exceed your authorization:

- When the site owner has contacted you and revoked your authorization in writing, as happened in Craigslist v. 3taps.

- By accepting the terms of service and agreeing against scraping. You have to do this through a "clickwrap" ToS, rather than a "browsewrap". You can read about the differences here: https://termsfeed.com/blog/browsewrap-clickwrap/

As a matter of policy, we don't scrape any site with a ToS with clear anti-scraping language and which forces us to create an account or "constructively agree" as part of the use of the site.

Anyone wishing to revoke authorization for someone using our platform can file an abuse report on our site; we tend to handle these within 24 hours and haven't had a single claim go further than this stage, as we aim to be reasonable and look for a way to provide value to both sides.


Thanks for the response. It is good to know that you're cognizant of these issues, at least.

Most websites have anti-scraping boilerplate in their ToU. I'm pretty sure that's in the "customized" ToU I got from LegalZoom. So you're basically saying that if the data is not behind a login and doesn't require you to fill out any forms that contain either a checkbox or nearby language indicating that submitting the form constitutes acceptance of the ToU, you'll scrape it, even if the ToU explicitly bans scraping.

What do you plan to do when you do receive a C&D from someone that claims you've agreed to their browsewrap ToU? Are you going to argue that your use is not unauthorized since the data is public?

I guess I assume you'll comply with any C&D since you state that receipt of a written removal constitutes a revocation of authorization. However, I don't believe that's enough for some. Check out QVC v. Resultly, where QVC sued on a CFAA claim based on browse-wrap agreements (they lost; Resultly asserted permission was granted by robots.txt and the Court agreed).
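For context on the robots.txt argument: a well-behaved crawler checks a site's robots.txt before fetching pages (Scrapy does this when its `ROBOTSTXT_OBEY` setting is enabled). The sketch below uses Python's standard-library `urllib.robotparser`; the robots.txt contents, user-agent names, and URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; a real crawler would first fetch
# https://<site>/robots.txt and feed the response body to the parser.
ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/
Disallow: /account/

User-agent: BadBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Ask whether each (user agent, URL) pair may be fetched.
for agent, url in [
    ("MyCrawler", "https://shop.example.com/products/123"),
    ("MyCrawler", "https://shop.example.com/checkout/cart"),
    ("BadBot", "https://shop.example.com/products/123"),
]:
    print(agent, url, parser.can_fetch(agent, url))
```

The generic `*` group allows product pages but blocks `/checkout/`, while the `BadBot` group blocks everything for that agent.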

Beyond just CFAA claims, there are copyright and trademark claims attached to most scraping cases. Those have unfortunately succeeded most of the time. The most egregious is Ticketmaster v. RMG, where it was ruled that RMG had violated Ticketmaster's copyright by downloading the page (specifically, making a copy of the page contents in RAM, extracting only non-copyrightable content, and then discarding the complete copy; in short, downloading the page). Facebook v. Power Ventures is also pretty brutal.

I hope Scrapinghub is well-funded enough to trailblaze some space in the law here, because we really need it.


Every one of those cases involves multi-person organizations with significant revenue or funding. These cases largely reflect businesses forcefully shutting down other innovators by claiming some bullshit like the CFAA. The CFAA should really only apply to people doing SQL injection and other penetration. The vast majority of people scraping data do not fall into this category of malevolent behavior, although the dumbassery of people who want to scrape LinkedIn for 30 bucks on Freelancer ruins it for everyone.

I'm not sure whether Scrapinghub is externally funded; I couldn't find anything on their valuation or revenue, so I assume they are bootstrapped.

I don't discount any of what you wrote, but a lot of it is imagined danger; Scrapinghub would be immune to such cases unless they sided with their customers like 3taps did. 3taps did not stop scraping for their client PadMapper. I don't think Scrapinghub is willing to stick their neck out for someone paying $20/month to scrape Craigslist. In fact, those are the shittiest segments of this market imo: the bottom feeders with wildly unrealistic expectations of scraping as a magical black box that will solve their problems.


I received a C&D from a Fortune 100 company asserting that I was accessing their site in violation of the CFAA, among numerous other silly claims. I'm a 1-person company. I did have some revenue, but it was about even with the pay from my full-time job (this was a side project); certainly not enough to satisfy the retainers that lawyers wanted before they'd even think about taking me on. I eventually found a lawyer who agreed to help a little bit for a $2k retainer, but as you'd expect for that rate, I can't get much out of him.

I wasn't doing anything egregious. The product I offered did not compete with their products; it actually made it easier for the consumer to spend money with them. The data I was gathering is mere factual data and is not subject to copyright (though, as in Ticketmaster v. RMG, this alone will not protect from copyright infringement claims). Their site is the single place that this factual data can be found.

Their Terms of Use forbids access by either manual or automated processes; thus, it makes it illegal for anyone to use their site at all, and precludes any solution based on MTurk or similar. It also forbids any access for "commercial use". Combined, this means they can sue you and make you stop using their site basically whenever they want for any reason. They could've done this anyway because the CFAA protects them from any "unauthorized" access.

If I were to actually dispute this company's claims and refuse to comply with their C&D, they would sue me. This would've cost me millions of dollars in legal fees before the case was through, which is irrelevant to them but obviously well outside of my reach. There's a good chance they would've gotten an injunction legally forbidding me from continuing to offer my service almost immediately, so I'd have been stopped from offering my product AND I would've had a pending lawsuit against me, which would've asserted some absurd dollar amount of damages, and, if Facebook v. Power Ventures is any indication, there would've been a good chance that I would've been held personally liable for it.

It doesn't matter that their claims all depend on interpretation and grey areas. What matters is that if you don't have $30-40 million sitting around, you can't take the risk of a lawsuit from a big company. Gotta earmark $1-10 million for legal fees (depending on what kind of lawyers you get; the opposing party in my case has one of the most expensive law firms in the country); set aside $5-10 million in case you lose and have to pay damages; and set aside some chunk of money to keep maintaining and running the business despite the legal pressure, and despite the likelihood that you've been legally barred from selling your primary money maker pending resolution of the case, which will likely drag on for a minimum of 3-5 years, with up to 10 years not really unheard of. Gotta have the extra $20 mil+ so that you don't pour more than 50% of your net worth into something that is very possibly a losing battle.
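Read as a rough budget, the figures above combine as follows. Writing L for legal fees, D for potential damages, O for the operating reserve (which the comment leaves unspecified), and W for net worth, all in millions of US dollars, the 50% rule works out like this:

```latex
% All amounts in millions of US dollars; O (operating reserve) is left unspecified.
\begin{aligned}
L + D + O \le 0.5\,W \;&\Longrightarrow\; W \ge 2\,(L + D + O) \\
\text{upper range } (L = 10,\ D = 10):\;& W \ge 2\,(20 + O) = 40 + 2\,O \\
\text{lower range } (L = 1,\ D = 5):\;& W \ge 2\,(6 + O) = 12 + 2\,O
\end{aligned}
```

So the $40 million end of the estimate corresponds to the top of the stated legal and damages ranges before any operating reserve; at the bottom of those ranges, reaching $30 million would take an operating reserve of about $9 million.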

My lawyer advises me that the various workarounds I devised could be construed as conspiracy and aiding and abetting, even though I would no longer be making any requests to the complainant's servers at all. This also wouldn't stop the complainant from suing me for past damages or to stop the practice they dislike, even if I'm doing it through means that totally obviate the need to access any of their servers.

If I were to continue operating, the only option would be to leave the U.S. entirely for a jurisdiction that doesn't enforce U.S. judgments (since I would be sued in the U.S. and lose by default; my lawyer indicates that merely moving my company overseas is insufficient), and not return until the statute of limitations expires on the judgment that would get registered. Even this is not foolproof, because the activity would have to be obviously and unequivocally legal in the new host jurisdiction so that the company's lawsuit there wouldn't get anywhere, the jurisdiction would have to decline to enforce judgments against U.S. persons, and it would have to be impervious to attempts by one of the world's largest companies to influence its legal system. I haven't found such a jurisdiction yet. Some are kinda-sorta close (but not really).


It's not clear enough what you were doing before that led them to a C&D. Were you doing what Scrapinghub was doing? A web scraping tool vendor and service provider? It sounds like you were doing something shady enough for them to not even email you but C&D you directly.

There's a distinct line between what you do with the data you scrape vs. writing the tools and code to build a script that will get you that data.

You don't arrest the kitchen knife company's CEO because someone used one of its knives to stab someone. And the fact that web scraping/crawler vendors have saturated the market is a testament to the fact that you are overreacting.

I can imagine how the fear of facing a devastating legal battle might have permanently shifted your view on web scraping, but I see no valid basis for all web scraping services and vendors to shut down.

I also find it puzzling that you would act against your own interests by continuing to talk openly and in detail about a legal situation like this, because the PadMapper guy pretty much shut up as soon as the details were involved.

But feel free to provide more details showing that you were running a web scraping service or software company.


We weren't doing anything remotely nefarious with the non-copyrightable data we gathered. Some details are intentionally unclear. You can continue to make your own inferences on these.

Big companies don't send polite emails asking you to pretty please stop. They just let their lawyers deal with the whole kit and kaboodle.

It is illegal, or close enough to illegal, to scrape from practically any company in the U.S., because "unauthorized access" is a floating definition; as soon as that company makes a decision that they don't want you doing that thing you do anymore, you're doing something illegal; their change of heart can make your previously fine action a crime. The Terms of Use for almost all companies state as much. The statute does not state any required notice period or method, so you'd have to argue to the relevant magistrate that you didn't have reasonable knowledge that your scrape was unauthorized. This is the crux on which all scraping cases have hung, and the results are usually not favorable at all to the scrapers, although 1 or 2 recent decisions are sort of hopeful. Also note that this is only the CFAA portion; these suits usually allege a bunch of other torts too, which have proven similarly difficult to beat.

Scrapinghub's existence depends essentially on luck; first, that they won't get sued, and second, if they do get sued, that they'll get a sympathetic judge who will find that no contract was entered due to insufficient notice. That is not likely due to the nature of scrapinghub's operations (see Register.com v. Verio). The fact that some people are able to scrape and get away without being sued doesn't change the legal reality or the dubiousness of investing in a company with such a large risk profile.

The knife CEO analogy fails because CFAA claims are NOT about how the data is used. They are about the method used to obtain the data. The entity exceeding authorized computer access or accessing a computer without authorization -- in this case, that is scrapinghub, kimono, et al -- is the entity that has committed the violation of the CFAA. In your knife analogy, if the knife company had illegally acquired the metals used to manufacture the knife, it would be the culpable party, not the end user that bought its knives. The data that scrapinghub goes out, obtains, crafts and packages according to customer specifications ("make this page on craigslist a CSV file that auto-updates every 5 minutes") is the metal that the knife company goes out, obtains, crafts and packages according to customer specifications ("make this metal a sharp cutting utensil").

The person using the data that results from CFAA violations may be doing other illegal things, but in almost all of these types of cases, they're not violating the CFAA if they're not the ones accessing the computer that supplies the data.

I'm really not sure what you're arguing about anymore. The CFAA isn't a real law because the person gathering the data isn't necessarily the one putting it to use? I don't understand.


Okay, I think you are honestly trolling now. Nicely played, and goodbye.


Scrapinghub Ltd. is looking for PYTHON (Scrapy, Django) and ERLANG Engineers, as well as SALES and SUPPORT engineers.

We're a fully distributed company (largest founded outside of the US!) with 107 engineers and staff. So totally REMOTE.

Based around open source, we maintain Scrapy, Portia, Webstruct, Frontera, and lots of other tools made for crawling and scraping massive web datasets. Everyone at SH helps make these projects grow, and we offer to pay you to work on open source if you're good enough.

http://scrapinghub.com/careers

You'll have the chance to work on projects that harvest and transfer datasets of thousands of millions of records, as well as build some of the systems that will deliver data to current Fortune 500 companies and the startups that are building great products on top of our stack.

We have a very engineering-driven culture (two engineer-founders) and are a great place to work if you're self-directed, curious, and interested in working in open source environments.


`This is a telecommuting position and salaries we pay are not adjusted based on where you live.`

That's great. In other words, can one expect US-level salaries regardless of place of residence? How common are 6-figure USD salaries among your employees around the world?


I work for Scrapinghub, which has a proxy API that may work for you: http://scrapinghub.com/crawlera

Pricing is a lot lower, and we only use our own IPs.
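For readers unfamiliar with proxy-based crawling: on the client side, using a proxy service typically means routing your HTTP requests through the provider's proxy endpoint. The sketch below does this with Python's standard-library `urllib.request`; the proxy host, port, and API-key-as-username scheme are placeholders, not Crawlera's documented settings.

```python
import urllib.request

# Placeholder proxy URL: substitute the host, port, and credentials your
# proxy provider actually documents. Credentials embedded in the URL are
# sent by urllib as a Proxy-Authorization (Basic) header.
PROXY_URL = "http://API_KEY:@proxy.example.com:8010"


def build_proxied_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Return an opener that sends HTTP and HTTPS requests via proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


opener = build_proxied_opener(PROXY_URL)
# opener.open("http://example.com/") would now route through the proxy.
```

Building the opener makes no network calls; requests only go out when `opener.open(...)` is called.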


Understood immediately when I saw it, thanks to that!

