Here's my issue with this kind of "magic" API design:
It's not clear when or how serialization happens.
Supposedly at some point the message I am sending to the worker is serialized, but it's not made clear to the user how or when this happens.
A well-designed library would synchronously serialize the given object the moment it is passed in or let the user explicitly handle serialization. But I don't think that's what is happening here.
It appears messages are serialized only eventually, when they are finally sent off to a worker.
If you accidentally pass mutable state in here, you're in for a really confusing and fun debugging session. Likely it'll be a production-only bug too, because during development and testing you're unlikely to have the kind of message volume required to run into some modified-before-sent condition.
CTRL-F for "mutable" and CTRL-F for "serialize" give no results, so I don't think the designers thought of this or thought to warn users.
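To illustrate the hazard, here's a toy sketch (not Piscina's actual internals): a queue that serializes lazily picks up mutations made after enqueue, while one that serializes eagerly captures a snapshot at the call site.

```javascript
// Toy illustration of lazy vs. eager serialization (hypothetical queues,
// not Piscina's implementation). The "send" step runs later, simulating
// a busy pool that only serializes when a worker becomes free.
const queue = [];

function enqueueLazy(msg) {
  queue.push(() => JSON.stringify(msg)); // serialized later, at send time
}

function enqueueEager(msg) {
  const snapshot = JSON.stringify(msg);  // serialized now, at enqueue time
  queue.push(() => snapshot);
}

const job = { id: 1, status: 'new' };
enqueueLazy(job);
enqueueEager(job);
job.status = 'mutated'; // caller keeps using the object after "sending" it

console.log(queue[0]()); // lazy: sees the later mutation
console.log(queue[1]()); // eager: snapshot from enqueue time
```

Under load, the lazy version sends whatever the object looks like when a worker finally frees up, which is exactly the production-only bug described above.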
I definitely ran into some serialization issues when trying Piscina a while ago. Returning generated images from the workers would keep getting slower and slower until the overhead of data exchange was 20x the time it took to generate it.
I discovered through trial and error that returning image bytes directly incurred this ever-increasing overhead, but creating a binary buffer and returning that did not. It was just a pet weekend project, so I never did discover what was causing the issue.
What would be nice is a way of preventing the same job running multiple times concurrently. Like if I start a job and a job with the same parameters was already started milliseconds ago then it automatically awaits the already running job rather than starting another.
You could build a reasonably lightweight supervisor pattern that uses a parameter-derived hash for comparison to handle this kind of situation in your application too.
Might be easier and more flexible than asking the library to do it?
That's exactly what I do in Python with Celery. I sort all dicts/lists and hash the final object. If the hash is not found in the db, I set the hash as the task id. With the `track_started` setting I never run the same task twice.
Interesting that it's written in TypeScript but the readme doesn't mention using it with TypeScript. I ran into annoyances with this recently while trying out Node's worker threads in a TypeScript project. I was running with ts-node, but the worker thread didn't know how to load a TypeScript file. There are some workarounds, but they're not elegant.
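One common (and admittedly inelegant) workaround, assuming ts-node is installed: give the worker a plain-JS shim entry point that registers ts-node's compiler hook before loading the real TypeScript file. The file names here are hypothetical.

```javascript
// worker-shim.js — plain-JS entry point for the worker thread.
// Registers ts-node so require() can resolve and compile .ts files,
// then loads the actual TypeScript worker.
require('ts-node').register();
require('./worker.ts'); // hypothetical path to your real worker code
```

The main thread then spawns `new Worker('./worker-shim.js')` instead of pointing at the `.ts` file directly.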
I use workers in Deno to evaluate user-facing code which may take a long time to finish and needs sandboxing. They're also used where I want to keep a context and reuse it for subsequent code executions for each user.
In Deno, you can restrict what the code inside a worker can do by passing a map of allowed permissions. I have built a simple privilege system on top of this to give users different access levels.
This is cheaper and faster than spinning up a container.
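A rough sketch of what that setup can look like; the privilege tiers are my invention, not from the comment, and this relies on Deno's worker-specific `deno.permissions` option:

```javascript
// Hypothetical privilege tiers mapped onto Deno worker permissions.
// Each tier is an allow-list passed through the Deno-specific options.
const tiers = {
  trusted:   { net: true,  read: true,  write: false, run: false },
  sandboxed: { net: false, read: false, write: false, run: false },
};

function spawnUserWorker(specifier, tier) {
  return new Worker(specifier, {
    type: "module",
    deno: { permissions: tiers[tier] }, // worker gets only these permissions
  });
}
```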
I maintain a command-line utility which assembles an eBook from scraped .xhtml files. It uses a similar package, workerpool, to process multiple chapters in parallel.
The worker API in JavaScript is quite a pain to use, but needless to say multithreading is invaluable in many contexts, both in the browser and in Node. I haven't used this library, but it seems to solve a similar problem to others in the same space: making multithreaded code sane and sparing you a bunch of repetitive boilerplate.
Knex, the SQL query builder, uses Tarn.js[1] for connection pooling to the db.
I've been using Tarn a bunch at work recently. We're doing some batch jobs, and I'm queuing work at each stage in Tarn.js pools. I created my own enqueue function that waits until the pool is less than a high-water mark in size before enqueueing. Then the pool has however many workers running.
Neither of these is an off-thread pool. But they help a lot for managing multiple async operations.
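The high-water-mark enqueue described above can be sketched generically (this is a standalone illustration, not Tarn.js's API): callers are parked until the number of in-flight jobs drops below the cap.

```javascript
// Generic high-water-mark gate: delay enqueueing new work until the
// number of in-flight jobs drops below the limit.
class BoundedQueue {
  constructor(highWater) {
    this.highWater = highWater;
    this.inFlight = 0;
    this.waiters = [];
  }

  async enqueue(job) {
    // Park the caller while the queue is at or above the high-water mark.
    while (this.inFlight >= this.highWater) {
      await new Promise((resolve) => this.waiters.push(resolve));
    }
    this.inFlight += 1;
    try {
      return await job();
    } finally {
      this.inFlight -= 1;
      const next = this.waiters.shift(); // wake one parked caller
      if (next) next();
    }
  }
}
```

This gives natural backpressure: producers slow down to match whatever rate the pool's workers can sustain.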
We have an API used for generating CSV reports from MongoDB; this lets the report process in the background while the report API continues to handle requests.
My "htop" display surely disagrees. So do my Node programs that lag out waiting for disk I/O thread pool slots. So does Node's documentation: https://nodejs.org/api/worker_threads.html
You needn't use your real name, of course, but for HN to be a community, users need some identity for other users to relate to. Otherwise we may as well have no usernames and no community, and that would be a different kind of forum. https://hn.algolia.com/?query=community%20identity%20by:dang...
To quote Wikipedia, "Romance languages (less commonly Latin languages, or Neo-Latin languages) are the modern languages that evolved from Vulgar Latin between the third and eighth centuries."
"Piscina" is also "pool" in Romanian, which is, you guessed it, a Romance language.
Maybe it is just an English thing, but Romance languages are anything that evolved from Latin (presumably spread by the Roman Empire, hence "Romance"). Portuguese is included in the list, fwiw.