
The cost of the browser part is still a problem. In our previous startup, we were scraping more than 20 million webpages per day, with thousands of headless Chrome instances running in parallel.

Regarding the RAM usage, it's still ~10x better than Chrome :) It seems to come mostly from V8; I guess we could do better with a lightweight alternative JS engine.



As a web developer and server manager: AI trainers scraping websites with no throttling are the problem. lol


> there are hundreds of Web APIs, and for now we just support some of them (DOM, XHR, Fetch)

> it's still ~10x better than Chrome

Do you expect it to stay that way once you've reached parity?


I don't expect it to change a lot. All the main components are there; it's mainly a question of coverage now.


Playwright can run WebKit very easily, and it's dramatically less resource-intensive than Chrome.


Yes, but WebKit is not a browser per se, it's a rendering engine.

It's less resource-intensive than Chrome, but here we are talking about orders of magnitude between Lightpanda and Chrome. If you are ~10x faster while using ~10x less RAM, you are using ~100x fewer resources.
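To spell out the arithmetic behind the ~100x figure: if cost is measured as RAM held multiplied by time held (memory-time), a 10x cut in each factor multiplies out to 100x. A minimal sketch, where the baseline numbers (500 MB, 2000 ms per page) are made up purely for illustration and are not measurements of either browser:

```python
def memory_time_cost(ram_mb: int, millis: int) -> int:
    """Cost of one page load in MB-milliseconds: RAM held multiplied by time held."""
    return ram_mb * millis


# Hypothetical baseline, for illustration only (not a benchmark result).
baseline_ram_mb, baseline_ms = 500, 2000

# ~10x less RAM and ~10x faster, as claimed in the comment above.
light_ram_mb = baseline_ram_mb // 10
light_ms = baseline_ms // 10

baseline_cost = memory_time_cost(baseline_ram_mb, baseline_ms)
light_cost = memory_time_cost(light_ram_mb, light_ms)

# The two 10x factors multiply, giving the overall reduction.
ratio = baseline_cost // light_cost
print(f"memory-time reduction: {ratio}x")
```

This only holds if RAM and wall-clock time are the two things you pay for; network-bound workloads where the browser mostly waits on I/O would see less of the speed factor.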


How well does it compare to specialized headless scraper browsers, like camoufox (Firefox-based) or secret agent (Chrome-based)?

Either should reduce your RAM usage by a lot compared to stock Chrome.


Careful: as you implement missing features, your RAM usage might grow too. It has happened to many projects: lean at the beginning, then just as slow once they deal with real-world messiness.


Does it work nicely on Linux? I'm very curious about this.


How about using QuickJS instead of full-blown V8? For example, ELinks supports SpiderMonkey, QuickJS, and MuJS: https://github.com/rkd77/elinks/blob/master/doc/ecmascript.t... and takes a few MB of RAM.


You may reduce RAM, but also performance. A good JIT costs RAM.


Yes, that's true. It's a balance to find between RAM and speed.

I was thinking more of use cases that require disabling the JIT anyway (WASM, iOS integration, security).


Yeah, it could be nice to let the user select the ECMAScript engine that fits their use case and performance requirements (balanced against the resources available).


If your target is consistent enough (perhaps even stationary), then at some point "JIT" means wasting CPU cycles.



