Hacker News | berbc's comments

> __attribute__((deprecated)) is evil

I was wondering if anyone had some insight on this?

I understand that deprecating a function usually means you'll be removing it at some point, which goes against the rule "Evolution is addition only". But is it really that bad to try to steer users towards a better API, or am I missing something particularly bad about that deprecation attribute?
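For what it's worth, the attribute itself only produces a call-site warning; the old function keeps working. The analogous pattern in Python (purely an illustration of the "steer, don't remove" idea, with made-up names) looks like this:

```python
import functools
import warnings

def deprecated(replacement):
    """Decorator that keeps the old function working but warns callers."""
    def wrap(func):
        @functools.wraps(func)
        def inner(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated; use {replacement} instead",
                DeprecationWarning,
                stacklevel=2,
            )
            return func(*args, **kwargs)
        return inner
    return wrap

@deprecated("new_api()")
def old_api(x):
    # still fully functional; callers just get nudged toward the new API
    return x + 1
```

Nothing is removed, so "Evolution is addition only" arguably still holds until the function is actually deleted.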


As said in one of the YouTube comments for the video you linked here, it's not so much feature branches that are the problem, but long-lived feature branches.

If you want to add a complex feature to a project, you could do it via several branches, one after the other. For instance, the first branch could introduce a feature flag and the second branch implement part of the new feature, and so on.
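The flag-first sequencing above can be sketched in a few lines (all names here are made up for illustration):

```python
# Branch 1 merges only this: the flag exists and defaults to off.
FEATURE_FLAGS = {"new_checkout": False}

def checkout(cart):
    if FEATURE_FLAGS["new_checkout"]:
        return new_checkout(cart)   # branches 2+ build this up incrementally
    return legacy_checkout(cart)

def legacy_checkout(cart):
    return sum(cart)

def new_checkout(cart):
    # A partial implementation can merge safely while the flag is off.
    return sum(cart)
```

Each branch stays small and mergeable because the unfinished code path is unreachable until the flag flips.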


This is more of a semantic quibble than anything else. Yes, long-lived branches are not just a problem but absolutely terrible; I have seen one that lived for some 4 years. They are a problem to the extent that they live long: the shorter they live, the less of a problem they are. So, take this to its conclusion: the best feature branch is one that is only one small commit long. And now the feature branch is completely redundant...


Is speed really a good reason for using async? If I remember correctly, asynchronous I/O was introduced to deal with many concurrent clients.

Therefore, I would have liked to see how much memory all those workers use, and how many concurrent connections they can handle.


I think speed is the wrong word here. A better word is throughput.

The underlying issue with Python is that it does not support threading well (due to the global interpreter lock) and mostly handles concurrency by forking processes instead. The traditional way of improving throughput is to run more processes, which is expensive (e.g. you need more memory). This is a common pattern in other languages like Ruby, PHP, etc.

Other languages use green threads / coroutines to implement async behavior and let a single thread handle multiple connections. On paper this should work in Python as well, except that it has a few bottlenecks, outlined in the article, that leave its throughput somewhat worse than the multi-process, synchronous versions.
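A toy illustration of the coroutine side, with `asyncio.sleep` standing in for network waits (the 100-connection count and 0.1s delay are arbitrary):

```python
import asyncio
import time

async def handle(client_id):
    await asyncio.sleep(0.1)   # simulated network wait; yields the event loop
    return client_id

async def main():
    start = time.perf_counter()
    # One process, one thread: 100 "connections" overlap their waits.
    results = await asyncio.gather(*(handle(i) for i in range(100)))
    return results, time.perf_counter() - start

results, elapsed = asyncio.run(main())
# elapsed is on the order of 0.1s rather than 100 * 0.1s, because the
# waits overlap on a single thread
```

The pre-fork equivalent would need one process per in-flight request to get the same overlap.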


I think 'scalability' is the best word here.

Taken from Stephen Cleary's SO answer on this topic: https://stackoverflow.com/a/31192718


> which is expensive (e.g. you need more memory)

Memory is cheap; the cost is in constant de/serialization. Same with "just rewrite the hotspots in C!"-style advice; de/serialization can easily eat anything you saved by multiprocessing/rewriting. Python is a deceptively hard language, and a lot of this is a direct result of the "all of CPython is the public C-extension interface!" design decision (significant limitations on optimizations => heavy dependency on C extensions for anything remotely performance sensitive => package management has to deal extensively with the nightmare that is C packaging => no meaningful cross-platform artifacts or cross-compilation => etc).
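You can see where that cost lives by measuring the pickling that `multiprocessing` performs implicitly on every argument and result (the payload size here is arbitrary):

```python
import pickle
import time

# multiprocessing ships arguments and results between processes by pickling
# them; for large payloads, that de/serialization can rival the work itself.
payload = list(range(1_000_000))

t0 = time.perf_counter()
blob = pickle.dumps(payload)    # what happens on the way into a worker
restored = pickle.loads(blob)   # and on the way back out
ser_time = time.perf_counter() - t0

t0 = time.perf_counter()
total = sum(payload)            # the "actual work", for comparison
work_time = time.perf_counter() - t0
```

Compare `ser_time` and `work_time` on your own machine; when the round-trip serialization approaches the compute time, fanning out to processes stops paying off.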


Memory is not cheap when dealing with the real-world cost of deploying a production system. The pre-fork worker model used in many sync setups is very resource intensive, and depending on the number of workers you're probably paying a lot more for the box it's running on. Of course this is different if you're running on your own metal, but I have other issues with that.


> Memory is not cheap when dealing with the real-world cost of deploying a production system.

What? What makes you say that? What did you think I was talking about if not a production system? To be clear, we're talking about the overhead of single-digit additional python interpreters unless I'm misunderstanding something...


These are observed costs from companies running the pre-fork worker model vs alternative deployment methods. And even in the benchmark they're running double-digit interpreter counts, which I've seen as the more common case, and that gets expensive.


Double-digit interpreters per host? Where is the expense? Interpreters have a relatively small memory overhead (<10mb). If you're running 100 interpreters per host (you shouldn't be), that's an extra $50/host/year. But you should be running <10/host, so an extra $5/host/year. Not ideal, but not "expensive", and if you care about costs your biggest mistake was using Python in the first place.
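The arithmetic behind those figures, with the per-interpreter size from the comment and the $/GB/year rate as an assumption rather than a quoted cloud price:

```python
MB_PER_INTERPRETER = 10      # upper bound claimed above
DOLLARS_PER_GB_YEAR = 50.0   # assumed cost of provisioned RAM

def extra_cost_per_host_year(n_interpreters, mb_each=MB_PER_INTERPRETER):
    """Yearly memory cost of running n extra interpreters on one host."""
    extra_gb = n_interpreters * mb_each / 1024
    return extra_gb * DOLLARS_PER_GB_YEAR

# 100 interpreters/host -> roughly $49/host/year
# 10 interpreters/host  -> roughly $4.9/host/year
```

Both numbers match the "$50" and "$5" ballparks above; the conclusion is sensitive mainly to the per-interpreter size, which the sibling comments dispute.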


I don't know where you're seeing the <10mb figure; in the situation I saw, they were easily consuming 30mb per interpreter. Even my cursory search now shows them at roughly 15-20mb, so even assuming the 30mb Gunicorn was just misconfigured, that's still an extra $100 per host using your estimate and what I'm finding by Googling around, and in a situation with multiple public APIs that adds up pretty quickly.

Another Google search shows me that Gunicorn using high memory on fork isn't exactly uncommon either.

Edit: I reworded some stuff up there and tried to make my point more clear.


The interpreter overhead on macOS is 7.7mb. I can't speak to Gunicorn configuration, but it's far from the only game in town.


Totally fair point, my experience with fork type deploys has only been Gunicorn so I'll take this as a challenge to try some others out.


Yes, C dependency management is awful, and because Python is only practical with C extensions for performance critical code, it ends up being a nightmare as well.


In our use case, switching to asyncio was like moving from 12 cores to 3... and I'm pretty sure we are handling more concurrency: from 24-30 req/s to 150 req/s. But our workload is mostly network related (db, external services...).


same.

maybe the author is concerned that many people are jumping the gun on async/await before we all fully understand why we need it at all. and that's true. but that paradigm was introduced (borrowed) to solve a completely different issue.

i would love to see how many concurrent connections those sync processes handle.


Hi - not sure what you mean by this. The sync workers handle one request (to completion) per worker. So 16 workers means 16 concurrent requests. For the async workers it's different - they do more concurrently - but as discussed their throughput is not better (and latency much worse).

Maybe what you're getting at is cases where there are a large number of (fairly sleepy) open connections? E.g. for push updates and other websockety things. I didn't test that, I'm afraid. The state of the art there seems to be async, and I think that's a broadly appropriate usage, though that is generally not very performance-sensitive code, except that you try to do as little as possible in your connection-manager code.


In the case of everything working smoothly, that model may play out. But if you get a client that times out, or worse, a slow connection, then it ties up one of your workers for a long time in the synchronous model. In the async model this has less of a footprint, as you are still accepting other connections despite the slow progress of one of the handlers.
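A simulated contrast (the delays are made up, and `asyncio.sleep` / `time.sleep` stand in for a slow client holding a connection open):

```python
import asyncio
import time

async def handle(req, delay):
    await asyncio.sleep(delay)   # stand-in for waiting on a slow client
    return req

async def serve_async(requests):
    # Async worker: the slow request doesn't hold up the fast ones.
    return await asyncio.gather(*(handle(r, d) for r, d in requests))

def serve_sync(requests):
    # Sync worker: each request is handled to completion, one at a time.
    results = []
    for r, d in requests:
        time.sleep(d)
        results.append(r)
    return results

requests = [("slow", 0.5)] + [(i, 0.05) for i in range(5)]

t0 = time.perf_counter(); sync_out = serve_sync(requests); sync_t = time.perf_counter() - t0
t0 = time.perf_counter(); async_out = asyncio.run(serve_async(requests)); async_t = time.perf_counter() - t0
# sync time is roughly the SUM of delays (0.75s here); async time is roughly
# the MAX delay (0.5s), because the fast requests proceed past the slow one
```

With real pre-fork workers the sync case is mitigated by having N workers, but one slow client still removes a worker from the pool for its whole duration.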


yes many open connections is what i meant (suggested by other people as well). by the way, i really liked the writing, it's refreshing. and i agree with you that people aren't using async for the right reasons.


Thanks :), really appreciate that. I think all technology goes through a period of wild over-application early on. My country is full of (hand-dug) canals, for example.


As pointed out by some of you, this looks more like a permutation of sequences. Also, since a symmetric cipher is used, I'm surprised the author didn't mention Black and Rogaway [1].

Their algorithm is a permutation (int -> int) that works on a domain of any size up to a limit. A typical application is encrypting credit card numbers so that the ciphertext still looks like a credit card number (non-trivial because the size of the domain isn't a power of two), or efficiently shuffling sequences: random in appearance, but repeatable if you know the seed.

For instance, this is used by Masscan to randomize the order in which IP addresses and ports are scanned [2]. I've built a Python package that could help you use this algorithm [3] (mostly for fun, but maybe that's useful, let me know :)).

[1]: https://en.wikipedia.org/wiki/Format-preserving_encryption#T... [2]: https://github.com/robertdavidgraham/masscan/blob/6c15edc280... [3]: https://github.com/bbc2/shuffled
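A toy sketch of the cycle-walking construction from [1]: build a keyed permutation on the next power-of-two domain (here a tiny Feistel network with SHA-256 as the round function, which is illustrative only and not a vetted cipher), then walk out-of-range outputs back into [0, n):

```python
import hashlib

def _feistel(x, key, bits, rounds=4):
    """Keyed bijection on range(2**bits); bits must be even."""
    half = bits // 2
    mask = (1 << half) - 1
    left, right = x >> half, x & mask
    for r in range(rounds):
        digest = hashlib.sha256(f"{key}:{r}:{right}".encode()).digest()
        left, right = right, left ^ (int.from_bytes(digest[:8], "big") & mask)
    return (left << half) | right

def permute(x, n, key="secret"):
    """Pseudorandom permutation of range(n) via cycle walking."""
    bits = max(2, (n - 1).bit_length())
    bits += bits % 2                 # Feistel needs an even bit split
    y = _feistel(x, key, bits)
    while y >= n:                    # walk along the cycle until in-domain
        y = _feistel(y, key, bits)
    return y

# Bijectivity check: every output in range(n) appears exactly once.
assert sorted(permute(i, 1000) for i in range(1000)) == list(range(1000))
```

Because the walk just follows the cycle of the larger permutation, it always terminates and the restriction to [0, n) is itself a permutation, which is what makes the credit-card and Masscan use cases work.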


If by [1..N ⟶ 1..N] you mean the set of functions from [1..N] that have values in [1..N], I think you are wrong. The formula in the post describes a subset, the set of such functions that are surjective.
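For small N, the gap between the two sets is easy to check by brute force: there are N**N functions from [1..N] to [1..N] in total, but the surjective ones (which, the domain and codomain being the same finite set, are exactly the bijections) number only N!.

```python
from itertools import product
from math import factorial

def count_maps(n):
    """Count all functions [1..n] -> [1..n] and the surjective subset."""
    funcs = list(product(range(1, n + 1), repeat=n))  # each tuple is one function
    surjective = [f for f in funcs if set(f) == set(range(1, n + 1))]
    return len(funcs), len(surjective)

for n in range(1, 5):
    total, surj = count_maps(n)
    assert total == n ** n
    assert surj == factorial(n)
```

E.g. for N = 3 there are 27 functions but only 6 surjections, so a formula counting one set cannot describe the other.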


Yeah, I think you're right. In this context I saw the case of [A -> B] as a function from coimage A to image B, which are all bijective. But it is meant as the more standard domain A to codomain B.

His natural-language description is somewhat lacking in that respect:

> I had already explained that [1..N ⟶ 1..N] is the set of functions that map the set 1..N of integers from 1 through N into itself,


Well, he did say "into itself", not "onto itself".


His presentation "Transport Architectures for an Evolving Internet" could be the one you're referring to? It's online and very interesting: https://www.youtube.com/watch?v=UsCOVF0vDe8.


"My word is after noon."



For some reason, the name Osama stopped being popular in 2001: http://rhiever.github.io/name-age-calculator/index.html?Gend... Barak became even more popular in 2008... soon, not so much :) http://rhiever.github.io/name-age-calculator/index.html?Gend...


Spelling it as the president spells it -- Barack -- shows a jump from nonexistent to "something" right around 2009. I wouldn't go so far as to call it noise, because it's more than just coincidence, but it's still only ~0.003% of the male population born that year (60 births / (0.5 * 4,000,000 total births)).


Yet no love for "Morpheus" or "Oracle". Racists.


Think that first blip in 1990 is related to the Neo Geo?


What if other people come up with their own beer and use the same no-brand branding?


You have to state somewhere on food packaging which company is responsible for it.


ordinal numbers come to mind


Why? There are surely more sophisticated ways of obfuscating code; these tricks seem easy to reverse.

