Hacker News | gopalv's comments

The linked clang PR is also very readable.

https://github.com/llvm/llvm-project/pull/181288/files

As the PR clearly points out, you can do this in a register but not inside vectors.

I don't think fastdiv has had an update in years, which is what I've used because compilers can't do "this is a constant for the next loop of 1024" like columnar SQL needs.
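
For anyone who hasn't seen the trick, here is a minimal Python sketch of the multiply-and-shift idea for unsigned 32-bit values: precompute a 64-bit magic number once per divisor, then every division in the batch becomes a multiply and a shift. The function names are made up for illustration and are not the fastdiv library's API; the real thing does this with a 128-bit multiply in C.

```python
def compute_magic(d: int) -> int:
    """Precompute ceil(2**64 / d) for an unsigned 32-bit divisor d >= 1."""
    assert 1 <= d < 2**32
    return (2**64 - 1) // d + 1


def fast_div(n: int, magic: int) -> int:
    """Divide an unsigned 32-bit n by the divisor that produced `magic`."""
    assert 0 <= n < 2**32
    return (magic * n) >> 64


divisor = 7                      # only known at runtime, fixed for the batch
magic = compute_magic(divisor)   # paid once per batch, not once per row
quotients = [fast_div(n, magic) for n in range(1024)]
```

The point is that `compute_magic` runs once per 1024-row batch, which is the "constant for the next loop" case a compiler cannot see at build time.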


> Multiplication alone requires depth-8 trees with 41+ leaves i.e. minimal operator vocabulary trades off against expression length.

That is sort of comparable to how NAND simplifies scaling.

Division is hell on gates.

The single component was the reason scaling went like it did.

There was only one gate structure which had to improve to make chips smaller - if a chip used 3 different kinds, then the scaling would've required more than one parallel innovation to go (sort of like how LED lighting had to wait for blue).

If you need two or more components, then you have to keep switching tools instead of hammer, hammer, hammer.


I'm not sure what you mean by this? It's true that any Boolean operation can be expressed in terms of two-input NAND gates, but that's almost never how real IC designers work. A typical standard cell library has lots of primitives, including all common gates and up to entire flip-flops and RAMs, each individually optimized at a transistor level. Realization with NAND2 and nothing else would be possible, but much less efficient.
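
To make the universality claim concrete, here is a small Python sketch that builds NOT, AND, OR and XOR out of a two-input NAND only. The helper names are illustrative. It also shows the cost: XOR needs four NANDs here, where a cell library would typically ship a dedicated XOR cell.

```python
def nand(a: bool, b: bool) -> bool:
    """The only primitive: two-input NAND."""
    return not (a and b)


def not_(a: bool) -> bool:
    return nand(a, a)                      # 1 NAND


def and_(a: bool, b: bool) -> bool:
    return not_(nand(a, b))                # 2 NANDs


def or_(a: bool, b: bool) -> bool:
    return nand(not_(a), not_(b))          # 3 NANDs


def xor_(a: bool, b: bool) -> bool:
    ab = nand(a, b)                        # shared intermediate
    return nand(nand(a, ab), nand(b, ab))  # 4 NANDs in total
```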

Efficient numerical libraries likewise contain lots of redundancy. For example, sqrt(x) is mathematically equivalent to pow(x, 0.5), but sqrt(x) is still typically provided separately and faster. Anyone who thinks that eml() function is supposed to lead directly to more efficient computation has missed the point of this (interesting) work.


Yeah, what you're going to get is more efficient proofs: you can do induction on one case to get results about elementary functions. Not sure where anyone's getting computational efficiency thoughts from this.

Are you under the impression that CPUs are made exclusively from NAND gates? You can't be serious.

Might’ve gotten mixed up with CMOS dominance, or I’m ignorant.

https://en.wikipedia.org/wiki/Mead%E2%80%93Conway_VLSI_chip_...

I'm guessing that is what they're really talking about, which is not about NAND gates.


Just to add a bit: modern digital circuits are almost exclusively MOS, but even the "complementary" bit isn't universal in a large IC.

I believe you're not ignorant. But many folks probably lack the process knowledge (CMOS) required to understand why :-)

The first part of the parabellum quote matters - we have to let the people who want peace prepare for war.

The Smedley Butler book was eye-opening for me to read.

Diplomacy and trade work wonders when the enemy still wants you to buy things.

Sanctions work when they've got things to sell (and raw materials to buy), not bombed out craters where their factories were.

Si vis pacem ...


aposiopesis is followed presumably by some latin phrasing of prepare for war?

[edit, found the real version https://en.wikipedia.org/wiki/Si_vis_pacem%2C_para_bellum ]

adapted from a statement found in Roman author Publius Flavius Vegetius Renatus's tract De Re Militari (fourth or fifth century AD), in which the actual phrasing is Igitur qui desiderat pacem, præparet bellum ("Therefore let him who desires peace prepare for war").


>> It replicates data across multiple, independent DRAM channels with uncorrelated refresh schedules

This is the sort of thing which was done before in a world where there was NUMA, but that is easy. Just taskset and mbind your way around it to keep your copies in both places.

The crazy part of what she's done is how to determine that the two copies don't get hit by refresh cycles at the same time.

Particularly by experimenting on something proprietary like Graviton.


She determines that by having three copies. Or four. Or eight.

Tis just probabilities and unlikelihood of hitting a refresh cycle across that many memory channels all at once.
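
A back-of-the-envelope sketch of that argument in Python, assuming each channel refreshes independently and is busy for the same fraction of time. The 10% figure is a made-up placeholder, not a measured value for any real DRAM part.

```python
def all_copies_blocked(p_refresh: float, copies: int) -> float:
    """Chance that every copy is mid-refresh at once, assuming independence."""
    return p_refresh ** copies


P_REFRESH = 0.1  # assumed fraction of time one channel is busy refreshing

for copies in (1, 2, 3, 4, 8):
    print(copies, all_copies_blocked(P_REFRESH, copies))
```

The independence assumption is doing all the work, which is why uncorrelated refresh schedules across channels matter.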


Right, but the impressive part is finding addresses that are actually on different memory channels.

Surprising to me that two memory channels are separated by as little as 256 bytes. The short distance makes it easier to find, surely?

It is access optimization, or interleaving, at a lower level than linearly mapping DIMMs and channels. x86 cache line size is 64 bytes, so it must be a multiple. Probably 64*2^n bytes.
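
As a rough illustration of what 256-byte interleaving would mean, here is a deliberately simplified Python sketch. The plain modulo mapping and the channel count of 8 are assumptions for the example; real memory controllers usually hash several address bits together, and the actual mapping on any given chip is not documented here.

```python
def channel_of(addr: int, granule: int = 256, channels: int = 8) -> int:
    """Hypothetical mapping: consecutive granules rotate across channels."""
    return (addr // granule) % channels


# Addresses 256 bytes apart land on neighbouring channels under this model.
print([channel_of(a) for a in range(0, 4 * 256, 256)])
```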

"This is the sort of thing which was done before in a world where there was NUMA"

You sound like NUMA is dead. Is this a bit of hyperbole, or would you really say there is no NUMA anymore? Honest question, because I am out of touch.


EPYC chips have multiple levels of NUMA - one across CCDs on the one chip, and another between chips in different motherboard sockets. As a user under Linux you can treat it as if it was simple SMP, but you’ll get quite a bit less performance.

Home PCs don’t do NUMA as much anymore because of the number of cores and threads you can get on one core complex. The technology certainly still exists and is still relevant.


> Surely those are at least an order of magnitude larger than Tolkien's prose and might still benefit from a RAG.

At some point, this is a distributed system of agents.

Once you go from 1 to 3 agents (1 router and two memory agents), it slowly ends up becoming a performance and cost decision rather than a recall problem.


> I don't understand how taking a series of data and applying a random rotation could mathematically lead every time to "simpler" geometry.

Let's pick a simpler compression problem where changing the frame of reference improves packing.

There's a neat trick in the context of floating point numbers.

The values do not always compress when they are stored exactly as given.

[0.1, 0.2, 0.3, 0.4, 0.5]

Maybe I can encode them in 15 bytes instead of 20 as float32.

Up the frame of reference to be decibels instead of bels and we can encode them as sequential values without storing exponent or sign again.

Changing the frame of reference makes the numbers "more alike" than they were originally.

But how you pick a good frame of reference is all heuristics and optimization gradients.
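
A minimal Python sketch of the example above, assuming a scale factor of 10 is known in advance (choosing it automatically is the hard, heuristic part). It compares the raw float32 encoding with storing the rescaled values as one-byte integers.

```python
import struct

values = [0.1, 0.2, 0.3, 0.4, 0.5]

# Baseline: five float32 values, 4 bytes each.
raw = struct.pack("<5f", *values)

# Change the frame of reference: multiply by 10 so every value is a small int.
SCALE = 10  # assumed to be known; picking it is the heuristic part
scaled = [round(v * SCALE) for v in values]
packed = bytes(scaled)  # one byte per value

# Decoding divides by the same scale to get the original values back.
decoded = [s / SCALE for s in packed]

print(len(raw), len(packed), scaled)
```

After rescaling, the values are sequential, so a delta encoding on top would shrink them further.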


> c^2 is a big number.

Famous tweet about conversations with God.

[1] - https://x.com/WraithLaFrentz/status/1981404849305686219


Except the fine structure constant


> Increased speed only gets us where we want to be sooner if we are also heading in the right direction.

This is a real problem when the "direction" == "good feedback" from a customer standpoint.

Before, we had a product person for every ~20 people generating code. Now we're all product people and the machines are writing the code (not all of it, but enough of it that I will -1 a ~4000 line PR and ask someone to start over, instead of digging out of the hole in the same PR).

Feedback takes time on the system by real users to come back to the product team.

You need a PID like smoothing curve over your feature changes.

Like you said, Speed isn't velocity.

Specifically, if you have a decent experiment framework to keep this disclosure progressive in the customer base, going the wrong direction isn't as huge a penalty as it used to be.

I liked the PostHog newsletter about the "Hidden dangers of shipping fast", I can't find a good direct link to it.



Thanks! Great link.


Don't wait for feedback from "real users", become a user!

This tayloristic idea (which has now reincarnated in "design thinking") that you can observe someone doing a job and then decide better than them what they need is ridiculous and should die.

Good products are built by the people who use the thing themselves. Doesn't mean though that choosing good features (product design and engineering) isn't a skill in itself.


Too often that isn't possible. There is a lot of domain knowledge in making a widget, and there is a lot of domain knowledge in doing a job. When a complex job needs a complex widget, often there isn't enough overlap to be an expert in both.

Sure, 'everyone' drives, so you can be a domain expert in cars. However, not everyone can be an astronaut: rockets are complex enough to need more people than astronauts, and so most people designing spaceships will never have the opportunity to use one.


I find that this argument is used too often to refrain from using your own product.

Yes, you're right, not everyone can be a domain expert. But everyone in the company needs to at least try to use the product as much as possible.

I worked in companies where even the CEO had never used the product but was telling us what to implement.


I am not asking anybody to be an expert in both (although I am sure such people exist, however rare); I am saying people should ideally have some skill in both. Also, people can collaborate, and learn new skills.

If you're bottlenecked by waiting for the users of your product to give feedback, you clearly need to spend more time learning how to be a user yourself. Or hire people with some domain skill who can also code.


Have been there: we got pushback from users and we had to back off with releases. Users hunted the product owner with pitchforks and torches.

As a dev team we were able to crank the speed even more, and the silly product people thought they were doing something good by demanding even more from us. But that was one of the instances where users were helpful :).

People use dozens of apps every day to do their work. Just think about how you are going to make time to give feedback to each of them.


> Just think about how you are going to make time to give feedback to each of them.

That's pretty much solved by the size of the audiences. You won't give feedback on 12 apps, but 11 other people will probably do so on 11 different apps.

Of course, the issue with my domain is that there's plenty of feedback, and product owners just dismiss it. Burn down your entire portfolio to get that boosted shareholder value for the next earnings report.


And how do you solve that when you are one of those 11 apps when no one wants to talk to you because they have their work to do? Where you don’t have power to say that kind of thing.

Well by asking repeatedly of course but you just piss people off.

Have you ever given feedback to Atlassian, Google, Microsoft?


Chrome runs Gemini Nano if you flip a few feature flags on [1].

The model is not great, but it was the "least amount of setup" LLM I could run on someone else's machine.

It includes structured output, but has a tiny context window I could use.

[1] - https://notmysock.org/code/voice-gemini-prompt.html


> The writing isn’t the problem. The problem is that when I’m done, I look at what I just wrote and think this is definitely not good enough to publish.

Ira Glass has a nice quote which is worth printing out and hanging on your wall

Nobody tells this to people who are beginners, I wish someone told me. All of us who do creative work, we get into it because we have good taste. But there is this gap. For the first couple years you make stuff, it’s just not that good. It’s trying to be good, it has potential, but it’s not. But your taste, the thing that got you into the game, is still killer. And your taste is why your work disappoints you. A lot of people never get past this phase, they quit. Most people I know who do interesting, creative work went through years of this. We know our work doesn’t have this special thing that we want it to have. We all go through this. And if you are just starting out or you are still in this phase, you gotta know it’s normal and the most important thing you can do is do a lot of work.

Or if you're into design thinking, the Cult-of-Done[1] was a decade ago.

[1] - https://medium.com/@bre/the-cult-of-done-manifesto-724ca1c2f...


That's the exact opposite of OP's issue, right? He was producing, and it was good, but somewhere along the way he developed good taste (or some facsimile). Ira is claiming that people who are creative beginners start with good taste, which doesn't seem to be the case for a lot of us.

