Hacker News
ivan on Jan 8, 2008 | on: Ask YC: Any ideas about intelligent crawlers :)
One more thing: if you are going to ask the site owner for permission anyway, why not ask them to produce a specific XML file for you?
akkartik on Jan 8, 2008
Because granting permission is easy. Why would they go to more effort than that for random people?
ivan on Jan 8, 2008
Why do thousands of job sites produce custom XML output for SimplyHired or Indeed?
imsteve on Jan 9, 2008
They like buzzwords?
marketer on Jan 8, 2008
I think this is the best way to go. There's no reason to scrape HTML from a site when a clean XML feed might be available. For instance, PriceGrabber will only index your site if you provide your prices in XML.
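To make the comparison concrete, here is a rough sketch of what consuming such a feed could look like in Python. The feed layout (`products`, `product`, `sku`, `name`, `price`, `currency`) is made up for illustration and is not PriceGrabber's or anyone else's actual format; a real site would publish its own schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed: every element and attribute name here is an assumption,
# not a real vendor schema.
SAMPLE_FEED = """<products>
  <product sku="A100">
    <name>Widget</name>
    <price currency="USD">19.99</price>
  </product>
  <product sku="B200">
    <name>Gadget</name>
    <price currency="USD">5.50</price>
  </product>
</products>
"""


def parse_feed(xml_text):
    """Return one dict per <product> element in the feed text."""
    root = ET.fromstring(xml_text)
    items = []
    for product in root.findall("product"):
        price = product.find("price")
        items.append({
            "sku": product.get("sku"),
            "name": product.findtext("name"),
            "price": float(price.text),
            "currency": price.get("currency"),
        })
    return items


if __name__ == "__main__":
    for item in parse_feed(SAMPLE_FEED):
        print(item["sku"], item["name"], item["price"], item["currency"])
```

With a feed like this the fields are addressed by name, so a redesign of the site's HTML would not break the parser the way it can break a scraper.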