Is there an open source crawler/library that will recursively follow only links under a certain XPath and ignore the rest?
I don't want to do an exhaustive crawl of every single link; I want something that will only follow links inside a main content area.
Try Scrapy; its link extractors can be restricted to an XPath region. From their site:
Scrapy is a fast high-level screen scraping and web crawling framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.