Piscina – The Node.js Worker Pool (github.com/piscinajs)
75 points by AquiGorka on March 23, 2021 | 47 comments


Here's my issue with this kind of "magic" API design:

It's not clear when or how serialization happens.

Supposedly at some point the message I am sending to the worker is serialized, but this is not made clear to the user and it's not made clear when this happens.

A well-designed library would synchronously serialize the given object the moment it is passed in or let the user explicitly handle serialization. But I don't think that's what is happening here.

It appears messages are serialized only eventually and when they are finally sent off to a worker.

If you accidentally pass mutable state in here, you're in for a really confusing and fun debugging session. Likely it'll be a production-only bug too, because during development and testing you're unlikely to have the kind of message volume required to run into some modified-before-sent condition.

CTRL-F "mutable" and CTRL-F "serialize" gives no results, so I don't think the designers thought of this or thought to warn users.
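
To make the concern concrete, here is a minimal sketch of the failure mode. `LazyQueue` and `EagerQueue` are hypothetical stand-ins written for this illustration, not Piscina's API, and nothing here confirms how Piscina actually behaves. It only shows how serializing late versus serializing at submit time can produce different messages when the caller mutates the object in between.

```typescript
// Hypothetical stand-ins for a worker pool's internal queue; not Piscina's API.
type Task = { id: number; tags: string[] };

// Keeps a reference and serializes only when the task is finally sent.
class LazyQueue {
  private pending: Task[] = [];
  submit(task: Task): void {
    this.pending.push(task);
  }
  flush(): string[] {
    const sent = this.pending.map((task) => JSON.stringify(task));
    this.pending = [];
    return sent;
  }
}

// Serializes synchronously inside submit(), so later mutations cannot leak in.
class EagerQueue {
  private pending: string[] = [];
  submit(task: Task): void {
    this.pending.push(JSON.stringify(task));
  }
  flush(): string[] {
    const sent = this.pending;
    this.pending = [];
    return sent;
  }
}

const lazy = new LazyQueue();
const eager = new EagerQueue();
const task: Task = { id: 1, tags: ["a"] };

lazy.submit(task);
eager.submit(task);
task.tags.push("b"); // the caller mutates the object after submitting it

const lazySent = lazy.flush();
const eagerSent = eager.flush();
console.log(lazySent[0]);  // {"id":1,"tags":["a","b"]}
console.log(eagerSent[0]); // {"id":1,"tags":["a"]}
```

If you are unsure how a given pool behaves, copying the payload yourself before submitting it sidesteps the question either way.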


I definitely ran into some serialization issues when trying Piscina a while ago. Returning generated images from the workers would keep getting slower and slower until the overhead of data exchange was 20x the time it took to generate it.

I discovered through trial and error that returning image bytes directly would incur this ever-increasing overhead, but creating a binary buffer and returning that would not. It was just a pet weekend project, so I never did discover what was causing the issue.


What would be nice is a way of preventing the same job running multiple times concurrently. Like if I start a job and a job with the same parameters was already started milliseconds ago then it automatically awaits the already running job rather than starting another.
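
One way to get this in application code is to keep a map of in-flight promises keyed by the job's parameters. The sketch below is an assumption-laden illustration: `runOnce`, the key format, and `slowJob` are all made up here, and in practice `job` would wrap whatever call hands the task to your pool.

```typescript
// Promises for jobs that have started but not yet settled, keyed by parameters.
const inFlight = new Map<string, Promise<unknown>>();

// Hypothetical helper: returns the already-running promise for `key` if there
// is one, otherwise starts `job` and remembers it until it settles.
function runOnce<T>(key: string, job: () => Promise<T>): Promise<T> {
  const existing = inFlight.get(key);
  if (existing) {
    return existing as Promise<T>;
  }
  const promise = job();
  const forget = (): void => {
    inFlight.delete(key);
  };
  // Forget the entry on success or failure so a later call starts a fresh job.
  promise.then(forget, forget);
  inFlight.set(key, promise);
  return promise;
}

// Stand-in for real work; counts how many times it actually starts.
let calls = 0;
const slowJob = (): Promise<number> => {
  calls++;
  return new Promise<number>((resolve) => setTimeout(() => resolve(42), 10));
};

const first = runOnce("resize:cat.png:200", slowJob);
const second = runOnce("resize:cat.png:200", slowJob); // joins the first call
```

Both callers receive the same promise, so the job body runs once. This only dedupes within a single process; across processes you would need shared state.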


If you can use redis, redlock can be a nice implementation.

Example: https://github.com/fendy3002/NodeLearn/blob/master/Redlock/s...


You could build a reasonably lightweight supervisor pattern that uses a parameter-derived hash for comparison to handle this kind of situation in your application too.

Might be easier and more flexible than asking the library to do it?


That's exactly what I do in python with celery. I order all dicts/lists and hash this final object. If hash is not found in the db, I set hash as the task id. With `track started` setting I never run the same task twice.
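
A rough TypeScript sketch of the same idea: canonicalize the parameters so key order does not matter, then hash the result. The names `canonicalize` and `jobKey` are invented for this example. Unlike the approach described above, this version sorts only object keys and leaves array order alone, on the assumption that array order is meaningful. The hash is a small FNV-1a variant chosen to keep the example dependency-free; a real system would likely want a cryptographic hash such as SHA-256, since 32-bit collisions are plausible.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Serialize with object keys sorted, so {a, b} and {b, a} give the same text.
function canonicalize(value: Json): string {
  if (value === null || typeof value !== "object") {
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    return "[" + value.map((item) => canonicalize(item)).join(",") + "]";
  }
  const keys = Object.keys(value).sort();
  const parts = keys.map((key) => JSON.stringify(key) + ":" + canonicalize(value[key]));
  return "{" + parts.join(",") + "}";
}

// 32-bit FNV-1a computed over UTF-16 code units; illustrative only.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16);
}

// Hypothetical job id derived purely from the parameters.
function jobKey(params: Json): string {
  return fnv1a(canonicalize(params));
}

const keyA = jobKey({ report: "sales", range: { to: "2021-03", from: "2021-01" } });
const keyB = jobKey({ range: { from: "2021-01", to: "2021-03" }, report: "sales" });
console.log(keyA === keyB); // true
```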


Add unique job id?


Interesting that it's written in TypeScript and the readme doesn't mention using it with TypeScript. I ran into annoyances with this recently when trying out Node's worker threads in a TypeScript project. I was running with ts-node, but the worker thread didn't know how to load a TypeScript file. There are some workarounds, but they're not elegant.


Does anybody use this or anything similar? If so, what problems are you solving?


I use workers in Deno to evaluate user-facing code which may take a long time to finish and needs sandboxing. Another place I use them is where I want to keep a context around and reuse it for subsequent code execution for each user.

In Deno, you can restrict what the code inside a worker can do by passing a map of allowed permissions. I have built a simple privilege system on top of this to give users different access levels.

This is cheaper and faster than spinning up a container.


I maintain a command-line utility which assembles an eBook from scraped .xhtml files. It uses a similar package, workerpool, to process multiple chapters in parallel.

https://github.com/domenic/worm-scraper/blob/master/lib/conv...

https://www.npmjs.com/package/workerpool


The worker API in JavaScript is quite a pain to use, but needless to say multithreading is invaluable in many contexts, both in the browser and in Node. I haven't used this library, but it seems to solve a similar problem to other ones in the same space: making multithreaded code sane to write and sparing you a bunch of repetitive boilerplate.


Knex, the SQL query builder, uses Tarn.js[1] for connection pooling to the db.

I've been using Tarn a bunch at work recently. We're doing some batch jobs, and I'm queuing work at each stage in Tarn.js pools. I created my own enqueue function that waits until the pool is less than a high-water mark in size before enqueueing. Then the pool has however many workers running.

Neither of these is an off-thread pool. But they help a lot for managing multiple async operations.

[1] https://github.com/vincit/tarn.js/
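
For anyone curious what a high-water-mark enqueue can look like, here is a minimal sketch in plain promises. It does not use Tarn.js and is not the code described above; `BoundedQueue`, `enqueue`, `drain`, and `demo` are hypothetical names made up for the illustration. The producer awaits `enqueue`, which resolves only once there is room, so a fast producer gets slowed down to the pace of the consumers.

```typescript
// Hypothetical helper, not part of Tarn.js: admits tasks only while fewer than
// `highWaterMark` are in flight, so a fast producer cannot flood the consumers.
class BoundedQueue {
  private inFlight = 0;
  private waiters: Array<() => void> = [];
  readonly errors: unknown[] = [];

  constructor(private readonly highWaterMark: number) {}

  // Resolves once the task has been admitted (not once it has finished).
  async enqueue(task: () => Promise<void>): Promise<void> {
    while (this.inFlight >= this.highWaterMark) {
      await this.nextCompletion();
    }
    this.inFlight++;
    Promise.resolve()
      .then(task)
      .catch((err: unknown) => {
        this.errors.push(err); // collect failures instead of losing them
      })
      .then(() => {
        this.inFlight--;
        this.wakeAll();
      });
  }

  // Resolves when nothing is in flight any more.
  async drain(): Promise<void> {
    while (this.inFlight > 0) {
      await this.nextCompletion();
    }
  }

  private nextCompletion(): Promise<void> {
    return new Promise<void>((resolve) => {
      this.waiters.push(resolve);
    });
  }

  // Wake every waiter; each one rechecks its own condition.
  private wakeAll(): void {
    const woken = this.waiters;
    this.waiters = [];
    for (const wake of woken) wake();
  }
}

// Pushes six fake tasks through a queue with a high-water mark of two and
// records the highest number that were ever running at once.
async function demo(): Promise<{ peak: number; done: number }> {
  const queue = new BoundedQueue(2);
  let running = 0;
  let peak = 0;
  let done = 0;
  for (let i = 0; i < 6; i++) {
    await queue.enqueue(async () => {
      running++;
      peak = Math.max(peak, running);
      await new Promise<void>((resolve) => setTimeout(resolve, 5));
      running--;
      done++;
    });
  }
  await queue.drain();
  return { peak, done };
}
```

Waking every waiter on each completion is a deliberate simplification: it costs a few redundant rechecks but avoids a producer and a `drain` caller starving each other.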


AFAIK this is a wrapper of worker_threads API. https://github.com/tuananh/camaro utilises this for multi-threading processing of XML input. Hexo (static site generator) is attempting to utilise this library (https://github.com/hexojs/hexo/issues/4355).


At work I use https://github.com/pioardi/poolifier

We have an API used for generating reports from MongoDB to CSV. This lets the report process run in the background, leaving the report API free to handle requests.


Yeah. I built Bree @ https://jobscheduler.net

It has support for worker threads + cron + human strings

GH: https://github.com/breejs/bree


This is cool. Our use case was very simple so we went with node-schedule but this may have more upsides. I've had trouble with the former. Thanks.


I helped maintain all the Node.js node/cron/agenda libraries. Just switch to Bree and your troubles will be gone.


We use worker threads directly to process large unorganized (for the browser) datasets and do some deductions before it hits the store.

We also have a worker thread blocked on a redis channel that acts as a queue.


This is useful for processing large chunks of data like audio files (look at the Superpowered SDK) by breaking them down, or when processing multiple files.


I’m guessing the fact that node is single threaded


JavaScript is single threaded, Node is most certainly not.


Yes it surely is


My "htop" display surely disagrees. So do my node programs that lag out waiting for disk IO thread pool slots. So does Node's documentation: https://nodejs.org/api/worker_threads.html


It's not exactly. Certain operations like I/O are threaded in Node. libuv has its own thread pool that it uses for a lot of these types of tasks.


Node uses a thread pool for tasks like reading files asynchronously and crypto operations. You can read about it here: https://nodejs.org/en/docs/guides/dont-block-the-event-loop/


Might be quite nice for keeping tokens refreshed?


Neat! I’m friends with the creator of this and teased him a bit about the name (so many of these projects have weird names nowadays).

The reason, from what I could glean: he didn't want it to be boring.


"piscina" in Italian means "pool". A library for worker pools simply named "pool" in Italian doesn't seem that strange :)


Piscine Molitor Patel in Life of Pi was named after the pool in Paris!


You remind me of my own research project too.

https://github.com/tim-hub/pambdajs


Could I use it on a serverless platform?


I assume you can, but how it performs really depends on the virtual CPU of the platform.

I did something similar

https://github.com/tim-hub/pambdajs

but I haven't done the comparison on AWS Lambda yet.


Common mistake in stream code:

    -- .on('end')
    ++ .once('end')


Is it just me, or do I see a trend in naming projects using Romance language words (Italian/Spanish/French)?

Does it sound more exotic? Are these words less crowded?


Could you please stop creating accounts for every few comments you post? We ban accounts that do that. This is in the site guidelines: https://news.ycombinator.com/newsguidelines.html.

You needn't use your real name, of course, but for HN to be a community, users need some identity for other users to relate to. Otherwise we may as well have no usernames and no community, and that would be a different kind of forum. https://hn.algolia.com/?query=community%20identity%20by:dang...


I think it's because most English names are already taken.


I'm curious on why you call these languages "romance" languages.

"Piscina" is a Portuguese word for "pool", by the way.


> I'm curious on why you call these languages "romance" languages.

That's what they're called: https://en.wikipedia.org/wiki/Romance_languages


Romance languages are those derived from Vulgar Latin. They are called Romance because of Rome (Roma).

Portuguese is also a Romance language, like Spanish or Italian. Piscina also means pool in Spanish, by the way :)

https://simple.wikipedia.org/wiki/Romance_languages


To quote Wikipedia, "Romance languages (less commonly Latin languages, or Neo-Latin languages) are the modern languages that evolved from Vulgar Latin between the third and eighth centuries." "Piscina" is also "pool" in Romanian, which is, you guessed, a Romance language.


Got it.

In Portuguese they're only known as "Latin languages", which explains my question. ;-)


Not really. I've personally heard the term before. https://pt.wikipedia.org/wiki/L%C3%ADnguas_rom%C3%A2nicas


I haven’t. Maybe it’s not a popular term in Brazil.


Maybe it is just an English thing, but Romance languages are anything evolved from Latin (presumably spread by the Roman Empire, hence "Romance"). Portuguese is included in the list, fwiw.

https://en.m.wikipedia.org/wiki/Romance_languages



Also in Italy, we say "piscina" in the same way.



