I'm reminded by the caveman skill of the clipped writing style used in telegrams, and your post further reminded me of "standard" books of telegram abbreviations. Take a look at [0]; could we train models to use this kind of code and then decode it in the browser? These are "rich" tokens (they succinctly carry a lot of information).
I would point out that the default BPE tokenization vocabulary used by many models (cl100k_base) is already a pretty powerful shorthand. It has a lot of short tokens, sure. But then:
Token ID 73700 is the literal entire (space-prefixed) word " strawberry". (Which neatly explains the "strawberry problem.")
Token ID 27128 is " cryptocurrency". (And 41698 is " disappointment".)
Token ID 44078 is " UnsupportedOperationException"!
Token ID 58040 is 128 spaces in a row (and is the longest token in the vocabulary.)
You'd be surprised how well this vocabulary can compress English prose — especially prose interspersed with code!
For a while I was missing the ability one uses all the time in stable diffusion prompts of using parentheses and floats to emphasize weight to different parts of the prompt. The more I thought about how it would work in an LLM though, the more I realized it's just reinventing code syntax and you could just give a code snippet to the LLM prompt.
When you're killing (C-u, C-k, C-w, etc) + yanking (C-y), you can also use yank-pop (bound to M-y in bash and zsh by default) to replace the thing you just yanked with the thing you had killed before it.
$ asdf<C-w>
$ # now kill ring is ["asdf"]
$ qwerty<C-a><C-k>
$ # now kill ring is ["qwerty", "asdf"]
$ <C-y> # "yank", pastes the thing at the top of the kill ring
$ qwerty<M-y> # "yank-pop", replaces the thing just yanked with the next
# thing on the ring, and rotates the ring until the next yank
$ asdf
If you have enough discipline to make sure you only create threads after all the forking is done, then sure. But having such discipline is harder than just forbidding fork or forbidding threads in your program. It turns a careful analysis of timing and causality into just banning a few functions.
And what do you do with that information? Refuse to fork after you detect more than one thread running? I haven’t seen any code that gracefully handles the unable-to-fork scenario. When people write fork-based code, especially in Python, they always expect forking to succeed.
But not the reverse, if its a bare fork and not strictly using basically mutex and shared resource free code (which is hard), and there's little or no warning lights to indicate that this is a terrible idea that fails in really unpredictable and hard to debug ways.
> The main downside of this was memory usage. You would have to load all of your code and libraries N types and in-process caches would become less effective.
You can load modules and then fork child processes. Children will share memory with each other (if they need to modify any shared memory, they get copy-on-write pages allocated by the kernel) and you'll save quite a lot on memory.
Yes, this can help a lot, but it definitely isn't perfect. Especially since CPython uses reference counting it is likely that many pages get modified relatively quickly as they are accessed. Many other GC strategies are also pretty hostile to CoW memory (for example mark bits, moving, ...) Additionally this doesn't help for lazy loaded data and caches in code and libraries.
I agree with you that they're more or less equal. I don't like the idea of my reverse proxy dealing with letsencrypt for me, personally, but that's just a preference.
One tricky thing about nginx though, from the "If is evil" nginx wiki [0]:
> The if directive is part of the rewrite module which evaluates instructions imperatively. On the other hand, NGINX configuration in general is declarative. At some point due to user demand, an attempt was made to enable some non-rewrite directives inside if, and this led to the situation we have now.
I use nginx for homelab things because my use-cases are simple, but I've run into issues at work with nginx in the past because of the above.
I'm not sure why Apache is so unpopular, it can also function as a reverse proxy and doesn't have the weird configuration issues nginx has.
Some people take this way too far, for instance I've send places compiling (end of life) modsec support into nginx instead of using the webserver it was built for
I got a 4-pack of zigbee power plugs that report usage, and I have a home assistant automation that goes ding (or whatever) when the washer or dryer had been using electricity for at least a few minutes and then stops using electricity.
[0] https://books.google.com/books?id=VO4OAAAAYAAJ&pg=PA464#v=on...
reply