Ask HN: Open source focused crawler?

sheraz · on Feb 9, 2014

I highly recommend Scrapy (http://www.scrapy.org).

From their site:

Scrapy is a fast high-level screen scraping and web crawling framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.

techaddict009 · on Feb 9, 2014

Check this out: http://commoncrawl.org/

It's not exactly what you're looking for, but it might help.

forkrulassail · on Feb 9, 2014

Have you tried BeautifulSoup?
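BeautifulSoup covers the parsing side. What makes a crawler "focused" is mostly how it orders and prunes its frontier: each fetched page is scored for topic relevance, and links found on off-topic pages are deprioritized or dropped. Below is a minimal, library-free sketch of that loop. It assumes a simple keyword-overlap score and uses Python's standard `html.parser` where BeautifulSoup would normally do the parsing. The `PAGES` dict and the `fetch` parameter are hypothetical stand-ins for real HTTP requests. It only illustrates the idea and is not how Scrapy or any other tool implements crawling.

```python
import heapq
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "web" standing in for real HTTP fetches.
PAGES = {
    "http://example.com/": '<a href="/crawlers">Crawlers</a> <a href="/cooking">Cooking</a>',
    "http://example.com/crawlers": 'Web crawler and scraping notes. <a href="/scrapy">Scrapy</a>',
    "http://example.com/cooking": 'Recipes for soup. <a href="/pasta">Pasta</a>',
    "http://example.com/scrapy": "Scrapy is a crawler framework for scraping.",
    "http://example.com/pasta": "Boil water, add pasta.",
}


class PageParser(HTMLParser):
    """Collects href targets and text content from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.text.append(data)


def relevance(text, keywords):
    """Fraction of topic keywords that occur as words in the text (0.0 to 1.0)."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sum(1 for k in keywords if k in words) / len(keywords)


def focused_crawl(seed, keywords, fetch, max_pages=10, threshold=0.5):
    """Best-first crawl: expand links from relevant pages first, prune the rest.

    `fetch(url)` returns HTML or None (assumed interface for this sketch).
    Returns a list of (url, score) in the order pages were visited.
    """
    frontier = [(-1.0, seed)]  # heapq is a min-heap, so scores are negated
    seen = {seed}
    visited = []
    while frontier and len(visited) < max_pages:
        _, url = heapq.heappop(frontier)
        html = fetch(url)
        if html is None:
            continue
        parser = PageParser()
        parser.feed(html)
        parser.close()  # flush any buffered trailing text
        score = relevance(" ".join(parser.text), keywords)
        visited.append((url, score))
        # Off-topic pages are recorded but not expanded (the seed always is).
        if score < threshold and url != seed:
            continue
        for href in parser.links:
            link = urljoin(url, href)
            if link not in seen:
                seen.add(link)
                # Heuristic: a link inherits the score of the page it was found on.
                heapq.heappush(frontier, (-score, link))
    return visited


if __name__ == "__main__":
    for url, score in focused_crawl("http://example.com/", ["crawler", "scraping"], PAGES.get):
        print(f"{score:.2f}  {url}")
```

In this toy run the cooking page is visited but not expanded, so its pasta link is never fetched. Replacing `PAGES.get` with a function that downloads the URL (and respects robots.txt) would make this a real, if naive, crawler. Inheriting the parent page's score is only one of several common link-priority heuristics. Anchor-text or URL-based scoring are frequent alternatives.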