
And one more thing: if you're going to ask the site owner for permission anyway, why not ask them to produce a specific XML file for you?


Because granting permission is easy. Why would they go to more effort than that for random people?


Then why do thousands of job sites produce custom XML output for SimplyHired or Indeed?


They like buzzwords?


I think this is the best way to go. There's no reason to scrape HTML from a site when there might be a nice XML feed available. For instance, PriceGrabber will only index your site if it provides prices in XML.
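To show why a feed beats scraping, here is a rough sketch of reading prices from an XML feed with Python's standard library. The feed layout (`products`, `product`, `name`, `price`) and the `parse_prices` helper are made up for illustration; a real feed, PriceGrabber's included, defines its own schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed layout, invented for this example only.
SAMPLE_FEED = """<products>
  <product><name>Widget</name><price currency="USD">9.99</price></product>
  <product><name>Gadget</name><price currency="USD">24.50</price></product>
</products>"""


def parse_prices(feed_xml):
    """Return {product name: price as float} from the made-up feed above."""
    root = ET.fromstring(feed_xml)
    prices = {}
    for product in root.findall("product"):
        name = product.findtext("name")
        price = product.findtext("price")
        if name is None or price is None:
            continue  # skip entries missing a name or a price
        prices[name.strip()] = float(price)
    return prices


print(parse_prices(SAMPLE_FEED))
```

Each field is looked up by element name, so nothing here depends on page layout or CSS classes the way an HTML scraper would.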



