Piscina – The Node.js Worker Pool

chmod775 · on March 24, 2021

Here's my issue with this kind of "magic" API design:

It's not clear when or how serialization happens.

Supposedly at some point the message I am sending to the worker is serialized, but this is not made clear to the user and it's not made clear when this happens.

A well-designed library would synchronously serialize the given object the moment it is passed in or let the user explicitly handle serialization. But I don't think that's what is happening here.

It appears messages are serialized lazily, only when they are finally sent off to a worker.

If you accidentally pass mutable state in here, you're in for a really confusing and fun debugging session. Likely it'll be a production-only bug too, because during development and testing you're unlikely to have the kind of message volume required to run into some modified-before-sent condition.

CTRL-F "mutable" and CTRL-F "serialize" gives no results, so I don't think the designers thought of this or thought to warn users.
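To make the hazard concrete, here is a minimal sketch. It does not use Piscina and makes no claim about how Piscina behaves internally; `DeferredQueue` and both `enqueue*` methods are hypothetical names for a queue that serializes messages only when it flushes. It compares handing over a reference with taking a snapshot at submission time.

```typescript
// Hypothetical stand-in for a pool that queues tasks and serializes them later.
type Task = { id: number; tags: string[] };

class DeferredQueue {
  private pending: Task[] = [];

  // Stores a reference; nothing is copied until flush() runs.
  enqueueByReference(task: Task): void {
    this.pending.push(task);
  }

  // Copies the task right away, so later mutations by the caller are not seen.
  enqueueSnapshot(task: Task): void {
    this.pending.push(JSON.parse(JSON.stringify(task)) as Task);
  }

  // Simulates the moment the messages are finally serialized and sent.
  flush(): string[] {
    const sent = this.pending.map((t) => JSON.stringify(t));
    this.pending = [];
    return sent;
  }
}

const queue = new DeferredQueue();
const task: Task = { id: 1, tags: ["draft"] };

queue.enqueueByReference(task);
queue.enqueueSnapshot(task);

// The caller keeps using the object after handing it over.
task.tags.push("published");

const [byReference, snapshot] = queue.flush();
console.log(byReference); // {"id":1,"tags":["draft","published"]}
console.log(snapshot); // {"id":1,"tags":["draft"]}
```

If a library does defer serialization, copying (or freezing) the message yourself before submitting it is one way to rule this class of bug out.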

nimbix · on March 24, 2021

I definitely ran into some serialization issues when trying Piscina a while ago. Returning generated images from the workers would keep getting slower and slower until the overhead of data exchange was 20x the time it took to generate it.

I discovered through trial and error that returning image bytes directly would incur this ever-increasing overhead, but creating a binary buffer and returning that would not. It was just a pet weekend project, so I never did discover what was causing the issue.

cjdell · on March 23, 2021

What would be nice is a way of preventing the same job running multiple times concurrently. Like if I start a job and a job with the same parameters was already started milliseconds ago then it automatically awaits the already running job rather than starting another.

fendy3002 · on March 23, 2021

If you can use redis, redlock can be a nice implementation.

Example: https://github.com/fendy3002/NodeLearn/blob/master/Redlock/s...

sbarre · on March 23, 2021

You could build a reasonably lightweight supervisor pattern that uses a parameter-derived hash for comparison to handle this kind of situation in your application too.

Might be easier and more flexible than asking the library to do it?
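A rough sketch of what that could look like in application code, assuming the job parameters are plain JSON-like values. `stableKey` and `InFlightDeduper` are made-up names for illustration, not part of Piscina or any other library. The key here is a canonical string rather than a real hash; a digest of that string would work the same way.

```typescript
// Builds a stable key: object keys are sorted so {a, b} and {b, a} match.
function stableKey(value: unknown): string {
  if (Array.isArray(value)) {
    return "[" + value.map((item) => stableKey(item)).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .sort()
      .map((k) => JSON.stringify(k) + ":" + stableKey(obj[k]));
    return "{" + entries.join(",") + "}";
  }
  // String() covers values such as undefined, for which JSON.stringify returns undefined.
  return String(JSON.stringify(value));
}

// Hypothetical supervisor: callers with equal params share one running job.
class InFlightDeduper<P, R> {
  private inFlight = new Map<string, Promise<R>>();
  private readonly run: (params: P) => Promise<R>;

  constructor(run: (params: P) => Promise<R>) {
    this.run = run;
  }

  submit(params: P): Promise<R> {
    const key = stableKey(params);
    const existing = this.inFlight.get(key);
    if (existing) return existing;

    const promise = this.run(params);
    this.inFlight.set(key, promise);
    // Forget the job once it settles, whether it succeeded or failed.
    const clear = (): void => {
      this.inFlight.delete(key);
    };
    promise.then(clear, clear);
    return promise;
  }
}

let runs = 0;
const deduper = new InFlightDeduper<{ width: number; height: number }, string>(
  async (p) => {
    runs += 1;
    await Promise.resolve(); // stand-in for handing the job to a worker
    return `${p.width}x${p.height}`;
  }
);

const first = deduper.submit({ width: 2, height: 3 });
const second = deduper.submit({ height: 3, width: 2 }); // same params, other key order
console.log(first === second, runs); // true 1
```

This only coalesces jobs within one process. Deduplicating across processes needs shared state, which is where something like the Redis lock suggested above comes in.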

tomwojcik · on March 23, 2021

That's exactly what I do in python with celery. I order all dicts/lists and hash this final object. If hash is not found in the db, I set hash as the task id. With `track started` setting I never run the same task twice.

tekno45 · on March 23, 2021

Add unique job id?

lyjackal · on March 24, 2021

Interesting that it's written in TypeScript and the README doesn't mention using it with TypeScript. I ran into annoyances with this recently trying out Node's worker threads in a TypeScript project. I was running with ts-node, but the worker thread didn't know how to load a TypeScript file. There are some workarounds but they're not elegant.

freeqaz · on March 23, 2021

Does anybody use this or anything similar? If so, what problems are you solving?

monstermachine · on March 23, 2021

I use workers in deno to evaluate user-facing code which may take a lot of time to finish and needs sandboxing. Another place it's used is where I want to keep context and reuse it for subsequent code execution for each user.

In deno, you can restrict what the code inside worker can do by passing a map of allowed permissions. I have built a simple privilege system on top of this to allow users different access level.

This is cheaper and faster than spinning up a container.

domenicd · on March 23, 2021

I maintain a command-line utility which assembles an eBook from scraped .xhtml files. It uses a similar package, workerpool, to process multiple chapters in parallel.

https://github.com/domenic/worm-scraper/blob/master/lib/conv...

https://www.npmjs.com/package/workerpool

Etheryte · on March 23, 2021

The worker API in Javascript is quite a pain to use, but needless to say multithreading is invaluable in many contexts, both in the browser and in Node. I haven't used this library but it seems to solve a similar problem to other ones in the same space — make writing multithreaded code sane, allowing you to avoid writing a bunch of repetitive boilerplate.

rektide · on March 23, 2021

Knex, the SQL query builder, uses Tarn.js[1] for connection pooling to the db.

I've been using Tarn a bunch at work recently. We're doing some batch jobs, and I'm queuing work at each stage in Tarn.js pools. I created my own enqueue function that waits until the pool is less than a high-water mark in size before enqueueing. Then the pool has however many workers running.
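For readers who want the shape of that enqueue function, here is a minimal sketch in plain promises. It does not use Tarn.js and is not its API; `BoundedQueue`, `highWaterMark`, and `enqueue` are hypothetical names. `enqueue` resolves once a job has been accepted, so a producer that awaits it is held back while the number of running jobs is at the high-water mark.

```typescript
// Hypothetical bounded queue: at most `highWaterMark` jobs run at once.
class BoundedQueue {
  private active = 0;
  private waiters: Array<() => void> = [];
  private readonly highWaterMark: number;

  constructor(highWaterMark: number) {
    this.highWaterMark = highWaterMark;
  }

  // Resolves when the job has been accepted, not when it has finished.
  async enqueue(job: () => Promise<void>): Promise<void> {
    // Re-check after every wake-up, since another caller may have taken the slot.
    while (this.active >= this.highWaterMark) {
      await new Promise<void>((resolve) => {
        this.waiters.push(() => resolve());
      });
    }
    this.active += 1;
    const done = (): void => {
      this.active -= 1;
      const next = this.waiters.shift();
      if (next) next(); // wake one waiting producer per finished job
    };
    // Job errors are swallowed in this sketch; real code should report them.
    job().then(done, done);
  }
}

async function demo(): Promise<number> {
  const queue = new BoundedQueue(2);
  let running = 0;
  let peak = 0;
  const accepted: Array<Promise<void>> = [];
  for (let i = 0; i < 6; i++) {
    accepted.push(
      queue.enqueue(async () => {
        running += 1;
        peak = Math.max(peak, running);
        await Promise.resolve(); // stand-in for real async work
        running -= 1;
      })
    );
  }
  await Promise.all(accepted);
  return peak; // never exceeds the high-water mark of 2
}

demo().then((peak) => console.log("peak concurrency:", peak));
```

In a batch pipeline the producer would typically `await queue.enqueue(...)` inside its read loop, so it stops pulling new work while the queue is full.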

Neither of these is an off-thread pool, but they help a lot with managing multiple async operations.

[1] https://github.com/vincit/tarn.js/

curben · on March 23, 2021

AFAIK this is a wrapper of worker_threads API. https://github.com/tuananh/camaro utilises this for multi-threading processing of XML input. Hexo (static site generator) is attempting to utilise this library (https://github.com/hexojs/hexo/issues/4355).

vorticalbox · on March 23, 2021

At work I use https://github.com/pioardi/poolifier

We have an API used for generating reports from MongoDB to CSV. This lets the report process run in the background, leaving the report API free to handle requests.

_5vzs · on March 23, 2021

Yeah. I built Bree @ https://jobscheduler.net

It has support for worker threads + cron + human strings

GH: https://github.com/breejs/bree

ddoolin · on March 23, 2021

This is cool. Our use case was very simple so we went with node-schedule but this may have more upsides. I've had trouble with the former. Thanks.

_5vzs · on March 23, 2021

I helped maintain all the Node.js node/cron/agenda libraries. Just switch to Bree and your troubles will be gone.

ddoolin · on March 23, 2021

We use worker threads directly to process large unorganized (for the browser) datasets and do some deductions before it hits the store.

We also have a worker thread blocked on a redis channel that acts as a queue.

Skhalar · on March 23, 2021

This is useful for processing large chunks of data like audio files (look at the Superpowered SDK) by breaking them down, or when processing multiple files.

barefeg · on March 23, 2021

I’m guessing the fact that node is single threaded

nobleach · on March 23, 2021

JavaScript is single threaded, Node is most certainly not.

gbrits · on March 23, 2021

Yes it surely is

zbentley · on March 23, 2021

My "htop" display surely disagrees. So do my node programs that lag out waiting for disk IO thread pool slots. So does node's documentation: https://nodejs.org/api/worker_threads.html

hmcdona1 · on March 23, 2021

It's not exactly. Certain operations like I/O are threaded in Node. libuv has its own threadpool that it uses for a lot of these types of tasks.

jayflux · on March 24, 2021

Node uses a thread pool for tasks like reading files asynchronously, crypto operations, etc. You can read about it here: https://nodejs.org/en/docs/guides/dont-block-the-event-loop/

maxrev17 · on March 23, 2021

Might be quite nice for keeping tokens refreshed?

ericlewis · on March 23, 2021

Neat! I’m friends with the creator of this and teased him a bit about the name (so many of these projects have weird names nowadays).

The reason is: don’t wanna be boring from what I could glean.

kevinstubbs · on March 24, 2021

"piscina" in Italian means "pool". A library for worker pools simply named "pool" in Italian doesn't seem that strange :)

hakcermani · on March 24, 2021

Piscine Molitor Patel in Life of Pi was named after the pool in Paris!

timmit · on March 23, 2021

You remind me of my own research project too.

https://github.com/tim-hub/pambdajs

revskill · on March 23, 2021

Could I use it on a serverless platform?

timmit · on March 23, 2021

I assume you can, but how it performs really depends on the virtual CPU of the platform.

I did something similar

https://github.com/tim-hub/pambdajs

but I haven't done the comparison on aws lambda yet

29athrowaway · on March 23, 2021

Common mistake in stream code:

    -- .on('end')
    ++ .once('end')

hfktk4nrn · on March 23, 2021

Is it just me, or do I see a trend in naming projects using Romance language words (Italian/Spanish/French)?

Does it sound more exotic? Are these words less crowded?

dang · on March 24, 2021

Could you please stop creating accounts for every few comments you post? We ban accounts that do that. This is in the site guidelines: https://news.ycombinator.com/newsguidelines.html.

You needn't use your real name, of course, but for HN to be a community, users need some identity for other users to relate to. Otherwise we may as well have no usernames and no community, and that would be a different kind of forum. https://hn.algolia.com/?query=community%20identity%20by:dang...

G4BB3R · on March 23, 2021

I think it's because most English names are already taken.

andreynering · on March 23, 2021

I'm curious on why you call these languages "romance" languages.

"Piscina" is a Portuguese word for "pool", by the way.

mayank · on March 23, 2021

> I'm curious on why you call these languages "romance" languages.

That's what they're called: https://en.wikipedia.org/wiki/Romance_languages

rodrigodiez · on March 23, 2021

Romance languages are those derived from Vulgar Latin. They are called Romance because of Rome (Roma).

Portuguese is also a Romance language, like Spanish or Italian. Piscina also means pool in Spanish by the way :)

https://simple.wikipedia.org/wiki/Romance_languages

age008 · on March 23, 2021

To quote Wikipedia, "Romance languages (less commonly Latin languages, or Neo-Latin languages) are the modern languages that evolved from Vulgar Latin between the third and eighth centuries." "Piscina" is also "pool" in Romanian, which is, you guessed, a Romance language.

andreynering · on March 23, 2021

Got it.

In Portuguese they're only known as "Latin languages", which explains my question. ;-)

Kaze404 · on March 23, 2021

Not really. I've personally heard the term before. https://pt.wikipedia.org/wiki/L%C3%ADnguas_rom%C3%A2nicas

gdsimoes · on March 23, 2021

I haven’t. Maybe it’s not a popular term in Brazil.

zdragnar · on March 23, 2021

Maybe it is just an English thing, but Romance languages are anything evolved from Latin (presumably spread by the Roman empire, hence "Romance"). Portuguese is included in the list, fwiw.

https://en.m.wikipedia.org/wiki/Romance_languages

olakease · on March 23, 2021

https://en.wikipedia.org/wiki/Romance_languages

th3h4mm3r · on March 23, 2021

Also in Italy, we say "piscina" in the same way.