
Funny your example is rc5, I wrote exactly what you describe to generate 32-bit cookies in a random prototype a few years ago: https://github.com/jcalvinowens/sdvr/blob/main/rc5.c

It is cute, but surely there's a more efficient way than RC5? There are bijective hash functions which are much cheaper (murmur, at least).


In my case, performance was utterly unimportant.

But is Murmur actually bijective?


Mine too, I was just curious.

I recall empirically determining murmur was bijective across all 32-bit inputs, but I can't find that written down anywhere.
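
For reference, the 32-bit Murmur3 finalizer ("fmix32") is bijective by construction: every step is either an xorshift or a multiplication by an odd constant, and both operations are invertible mod 2^32. A minimal C sketch:

  #include <stdint.h>

  static uint32_t fmix32(uint32_t h)
  {
          h ^= h >> 16;    /* xorshift: invertible */
          h *= 0x85ebca6b; /* odd constant: invertible mod 2^32 */
          h ^= h >> 13;
          h *= 0xc2b2ae35;
          h ^= h >> 16;
          return h;
  }

(The full Murmur3 over arbitrary-length input isn't bijective, of course; only the finalizer on exactly-32-bit inputs is.)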


Heh, am I the only one who was expecting an article about register renaming?

> humans only use vision to drive

I love this argument because it is so obviously wrong: how could any self-aware person seriously argue that hearing, touch, and the inner ear aren't involved in their driving?

As an adult I can actually afford a reliable car, so I will concede that smell is less relevant than it used to be, at least for me personally :)


> hearing, touch, and the inner ear aren't involved

Not to mention possibly the most complex structure in the known universe, the human brain: 86 billion neurons, 100 trillion connections.


Involved? Yes. Necessary? Pretty sure no.

If it makes you happy, you can read "only vision" as "no lidar or radar." Cars already have microphones and IMUs.


1. In the US you can get a driver's license if you're deaf, so as a society we've decided you can drive without hearing.

2. Since this is in the context of Tesla: Tesla cars do have microphones, and FSD does use them for responding to sirens etc.


(1) is true, but actually driving is definitely harder without hearing or with diminished hearing. Several US states, including CA, prohibit inhibiting hearing while driving, e.g., by wearing a headset, earbuds, or earplugs.

The human inner ear is worse than a $3 IMU in your average smartphone in literally every way. And that IMU also has a magnetometer in it.

Beating human sensors hasn't been hard for over a decade now. The problem is that sensors alone are worthless. Self-driving lives and dies by the AI; all the sensors need to be is "good enough".


Human hearing is excellent: good directional perception and sensitivity. Eyesight is the weakest sense: poor color sensitivity, low light sensitivity, a blind spot. The terrible natural design flaws are compensated for by natural nystagmus and the brain filling in the blanks.

> The problem is that sensors are worthless

Well, in TFA the far more successful manufacturer of self-driving cars is saying you're wrong. I think they're in a much better position to know than you :)


If you think this is overengineered, I built one that will really offend you: https://github.com/jcalvinowens/wallclock :)

The point is to have fun and learn something, not really to solve a problem in a practical sense. The radio-controlled clocks are extremely unreliable where I live.


How much of this result is effectively plagiarized open source compiler code? I don't understand how this is compelling at all: obviously it can regurgitate things that are nearly identical in capability to already existing code it was explicitly trained on...

It's very telling how these examples are all "look, we made it recreate a shittier version of a thing that already exists in the training set".


The fact that it couldn't actually stick to the 16-bit ABI, and had to cheat and call out to GCC to get the system to boot, says a lot.

Without enough examples to copy from (despite CPU manuals being available in the training set), the approach failed. I wonder how well it would do if you threw it a new/imaginary instruction set or CPU architecture; I bet it would fail in similar ways.


"Couldn't stick to the ABI ... despite CPU manuals being available" is a bizarre interpretation. What the article describes is the generated code being too large. That's an optimization problem, not a "couldn't follow the documentation" problem.

And it's a bit of a nasty optimization problem, because the result is all or nothing. Implementing enough optimizations to get from 60kB to 33kB is useless; all the reward comes from getting to 32kB.


IMHO a new architecture doesn't really make it any more interesting: there are too many examples of adding new architectures in the existing codebases. Maybe if the new machine had some bizarre novel property, I suppose, but I can't come up with a good example.

If the model were retrained without any of the existing compilers/toolchains in its training set, and it could still do something like this, that would be very compelling to me.


What Rust-based compiler is it plagiarising from?

Language doesn't really matter; that's not how things are mapped in the latent space. It only needs to know how to do it in one language.

OK, but you can say this about literally any compiler. The authors of every compiler have intimate knowledge of other compilers; how is this different?

grace hopper spinning in her grave rn


Did you actually look at these?

> https://github.com/jyn514/saltwater

This is just a frontend. It uses Cranelift as the backend. It's missing some fairly basic language features like bitfields and variadic functions. And if I'm reading the documentation right, it requires all the source code to be in a single file...

> https://github.com/ClementTsang/rustcc

This will compile basically no real-world code. The only supported data type is "int".

> https://github.com/maekawatoshiki/rucc

This is just a frontend. It uses LLVM as the backend.


Look at what those compilers are capable of compiling and to which targets, and compare it to what this compiler can do. Those are wonderful, and I have nothing but respect for them, but they aren't going to be compiling the Linux kernel.

I just did a quick Google search limited to GitHub; maybe there are better ones out there on the internet?


Can't compile the Linux kernel, and, ironically, also partly written by Claude.


A genuinely impressive effort, but alas, still missing some pretty critical features (const, floating point, bools, inline, anonymous structs in function args).

Being written in Rust is meaningless IMHO. There is absolutely zero inherent value to something being written in Rust. Sometimes it's the right tool for the job, sometimes it isn't.

It means that it's not directly copying existing C compiler code which is overwhelmingly not written in Rust. Even if your argument is that it is plagiarizing C code and doing a direct translation to Rust, that's a pretty interesting capability for it to have.

Translating things between languages is probably one of the least interesting capabilities of LLMs; it's the one thing they're pretty much meant to do well by design.

Surely you agree that directly copying existing code into a different language is still plagiarism?

I completely agree that "rewrite this existing codebase in a new language" could be a very powerful tool. But the article is making much bolder claims. And the result was more limited in capability, so you can't even really claim they've achieved the rewrite skill yet.


Please don't open a bridge to the Rust flamewar from the AI flamewar :-)

Hahaha, fair enough, but I refuse to be shy about having this opinion :)

Honestly, probably not a lot. Not many C compilers are compatible with all of GCC's weird features, and as far as I know the ones that are aren't written in Rust. Hell, even clang couldn't compile the Linux kernel until ~10 years ago. This is a very impressive project.

  Location: Bay Area, CA, USA
  Remote: Yes
  Willing to relocate: No
  Technologies: C, C++, Linux, drivers, embedded, HPC, networking, video, radio, yocto
  Résumé/CV: https://github.com/jcalvinowens/misc/blob/main/resume/resume.pdf
  Email: calvin@wbinvd.org
I solve technical problems in exchange for monetary compensation. I do a little bit of everything: https://github.com/jcalvinowens

I currently have 20 hours/week available. I'm not considering full time roles at this time, only contract work. Thanks.


  mkdir chroot
  cd chroot
  # ldd resolves transitive dependencies, so this copies the whole
  # tree of shared libraries (including the dynamic linker itself):
  for lib in $(ldd "${executable}" | grep -oE '/\S+'); do
    tgt="$(dirname "${lib}")"
    mkdir -p ".${tgt}"
    cp "${lib}" ".${tgt}"
  done
  # assumes ${executable} is an absolute path
  mkdir -p ."$(dirname "${executable}")"
  cp "${executable}" ".${executable}"
  tar czf ../chroot-run-anywhere.tgz .  # czf: .tgz implies gzip
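
To unpack and run it somewhere else, something like this (hypothetical target path; assumes ${executable} held an absolute path, as the loop above requires):

  mkdir /tmp/jail
  tar xzf chroot-run-anywhere.tgz -C /tmp/jail
  sudo chroot /tmp/jail "${executable}"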


You're supposed to do this recursively for all the libs, no?

E.g. your app might just depend on libqt5gui.so, but that libqt5gui.so might depend on some libxml etc...

Not to mention all the files from /usr/share etc. that your application might indirectly depend on.


> You're supposed to do this recursively

ldd works recursively.

> Not to mention all the files from /usr/share

Well yeah, there obviously cannot be a generic way to enumerate all the files a program might open...


  #if CHAR_BIT != 8
   #error "CHAR_BIT != 8"
  #endif
In modern C you can use static_assert to make this a bit nicer.

  static_assert(CHAR_BIT == 8, "CHAR_BIT is not 8");
...although it would be a bit of a shame IMHO to add that reflexively in code that doesn't necessarily require it.

https://en.cppreference.com/w/c/language/_Static_assert.html
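
(One nit: static_assert only became a keyword in C23; before that, it's a macro in assert.h wrapping the C11 keyword _Static_assert. A sketch that works all the way back to C11:)

  #include <limits.h>  /* CHAR_BIT */

  _Static_assert(CHAR_BIT == 8, "CHAR_BIT is not 8");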


Even if the code might not end up requiring it, if you write it with the assumption that bytes are 8 bits, it's good to document that with a static assert, so someone porting things knows there will be dragons.

It's a pretty neat way to drop some corner cases from your mental load without building subtle traps.


That's pretty silly IMHO: it should be incredibly obvious to anybody who is ever in a position to port code to a machine with non-8-bit bytes that there will be dragons there. It also requires including limits.h, which you might not otherwise need.

It's just not a realistic edge case: machines like this are either antiquated or are tiny microcontrollers that can't practically run a POSIX OS. Very little code in the real world is generic enough to be useful in that environment (a good example might be a fixed-point signal processing library).

There is no assertion in the entire Linux kernel that CHAR_BIT is eight, despite that assumption being hardcoded in many places.


GTAV


> Apple's solution is iCloud Keychain which is E2E encrypted, so would not be revealed with a court order.

Nope. For this threat model, E2E is a complete joke when both E's are controlled by the third party. Apple could be compelled by the government to insert code in the client to upload your decrypted data to another endpoint they control, and you'd never know.


That was tested in the San Bernardino shooter case. Apple stood up and the FBI backed down.


It's incredibly naive to believe Apple will continue to be able to do that.


Yeah, and Microsoft could insert code to upload the BitLocker keys. What's your point? Even Linux could do that if they were compelled to.


> Even linux could do that if they were compelled to.

An open source project absolutely cannot do that without your consent if you build your client from the source. That's my point.


This is a wildly unrealistic viewpoint. It assumes you somehow know the language of the client you're building, have total knowledge of the entire codebase, and can easily spot any sort of security issue or backdoor, assuming you're using software you didn't make yourself (and even then).

This also completely disregards the history of vulnerability incidents like XZ Utils, the infected NPM packages of the month, and even, for example, CVEs that sat in Linux (a project with thousands of people working on it) for over a decade.


You're conflating two orthogonal threat models here.

Threat model A: I want to be secure against a government agency in my country using the ordinary judicial process to order engineers employed in my country to make technical modifications to products I use in order to spy on me specifically. Predicated on the (untrue in my personal case) idea that my life will be endangered if the government obtains my data.

Threat model B: I want to be secure against all nation state actors in the world who might ever try to surreptitiously backdoor any open source project that has ever existed.

I'm talking about threat model A. You're describing threat model B, and I don't disagree with you that fighting that is more or less futile.

Many open source projects are controlled by people who do not live in the US and are not US citizens. Someone in the US is completely immune to threat model A when they use those open source projects and build them directly from the source.


Wait, I'm sorry, do you build Linux from source and review all code changes?


You missed the important part:

> For this threat model

We're talking about a hypothetical scenario where a state actor getting the information encrypted by the E2E encryption puts your life or freedom in danger.

If that's you, yes, you absolutely shouldn't trust US corporations, and you should absolutely be auditing the source code. I seriously doubt that's you though, and it's certainly not me.

The sub-title from the original forbes article (linked in the first paragraph of TFA):

> But companies like Apple and Meta set up their systems so such a privacy violation isn’t possible.

...is completely, utterly false. The journalist swallowed the marketing whole.


Okay, so yes, I grant your point that people for whom governments are the threat model should be auditing source code.

I also grant that many things are possible (where the journalist says "isn't possible").

However, what remains true is that Microsoft stores this data in a manner where it can apparently be retrieved through "simple" warrants and legal processes, compared to Apple, where retrieving these encryption keys would require code changes.

These are fundamentally different in a legal framework, and while it doesn't make Apple the most perfect amazing company ever, it shames Microsoft for not putting in the technical work to erect these basic barriers to retrieving data.


> retrieved through "simple" warrants and legal processes

The fact it requires an additional engineering step is not an impediment. The courts could not care less about the implementation details.

> compared to Apple where these encryption keys are stored in a manner that would require code changes to accomplish.

That code already exists at Apple: the automated CSAM reporting Apple does subverts their iCloud E2E encryption. I'm not saying they shouldn't be doing that; it's just proof that they can, and already do, effectively bypass their own E2E encryption.

A pedant might say "well, that code only runs on the device, so it doesn't really bypass E2E." What that misses is that the code running on the device is under the complete and sole control of Apple, not the device's owner. That code can do anything Apple cares to make it do (or is ordered to make it do) with the decrypted data, including exfiltrating it, and the owner will never know.


> The courts could not care less about the implementation details

That's not really true in practice, by all public evidence.

> the automated CSAM reporting apple does

Apple does not have a CSAM reporting feature that scans photo libraries; it never rolled out. They only have a feature that can blur sexual content in Messages and warn the reader before viewing.

We can argue all day about this, but yeah, I guess it's true that your phone is closed source, so literally everything you do is "under the complete and sole control of Apple."

That just sends you back to the first point, and we can never win this argument if we disagree about the extent to which the government might compel a company to produce data.


> they serve ULAs on LAN and do nat6 to a single public v6 address

I've never seen this and I'm curious: do they actually pick a random /48 out of fd00::/8 like they're supposed to?
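
(For context: RFC 4193 says to append a pseudo-random 40-bit global ID to fd00::/8, which yields a /48. The RFC technically derives those 40 bits from a timestamp and an EUI-64 hash, but random bytes are the common shortcut. A quick sketch, assuming Linux's getrandom():)

  #include <stdio.h>
  #include <stdint.h>
  #include <sys/random.h> /* getrandom(), Linux-specific */

  int main(void)
  {
          uint8_t id[5]; /* the 40-bit "global ID" */

          if (getrandom(id, sizeof(id), 0) != sizeof(id))
                  return 1;

          /* fd00::/8 prefix + 40 random bits = a /48 ULA prefix */
          printf("fd%02x:%02x%02x:%02x%02x::/48\n",
                 id[0], id[1], id[2], id[3], id[4]);
          return 0;
  }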

