It is a ridiculous feature of modern C that you have to write the super-verbose "mask and shift" code, which then gets compiled down to a simple `mov` and maybe a `bswap`, whereas the direct equivalent in C, an assignment with a (type-changing) cast, is illegal. There is a huge mismatch between the assumptions of the C spec and actual machine code.
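For concreteness, here is a sketch of the mask-and-shift idiom being complained about (`read32be` is a made-up name); modern gcc and clang typically recognize this pattern and emit a plain load plus a `bswap` on little-endian targets:

```c
#include <stdint.h>

/* Portable big-endian 32-bit read via shifts and ORs.
   Good compilers compile this to a single mov plus bswap
   on little-endian x86. */
uint32_t read32be(const unsigned char *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```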
One of the few reasons I ever even reached to C is the ability to slurp in data and reinterpret it as a struct, or the ability to reason in which registers things will show up and mix in some `asm` with my C.
I think there should really be a dialect of C(++) where the machine model is exactly the physical machine. That doesn't mean the compiler can't do optimizations, but it shouldn't do things like prove code as UB and fold everything to a no-op. (Like when you defensively compare a pointer to NULL that according to spec must not be NULL, but practically could be...)
`-fno-strict-overflow -fno-strict-aliasing -fno-delete-null-pointer-checks` gets you halfway there, but it would really only be viable if you had a blessed `-std=high-level-assembler` or `-std=friendly-c` flag.
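As an illustration of the null-check folding mentioned above (a sketch; `first_byte` is a hypothetical function): the dereference lets the compiler assume the pointer is non-null, so without `-fno-delete-null-pointer-checks` the defensive branch may be deleted.

```c
#include <stddef.h>

/* The compiler may infer from the dereference that p != NULL,
   and then fold the defensive check below to a no-op. */
int first_byte(const char *p) {
    int c = p[0];        /* UB if p is NULL -> compiler assumes it isn't */
    if (p == NULL)       /* may be optimized away entirely */
        return -1;
    return c;
}
```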
> One of the few reasons I ever even reached to C is the ability to slurp in data and reinterpret it as a struct, or the ability to reason in which registers things will show up and mix in some `asm` with my C.
Which results in undefined behavior according to the ISO C standard.
Quote:
“2 All declarations that refer to the same object or function shall have compatible type; otherwise, the behavior is undefined.”
That "should present no problem unless binary data written by one implementation are read by another," quoth ANSI X3.159-1988. One example of where I've used that is storing intermediary build artifacts. Those artifacts only exist on the host machine, and if the binary that writes/reads them gets recompiled, the Makefile invalidates the artifacts so they're regenerated. That caution matters because flags like -mstructure-size-boundary=n do exist, and ABI breakage involving structs has happened in the past.
Sensitive emotional subjects shouldn't be noted. Reminding C developers of the void* incompatibility is a good way to get them to feel triggered because it makes the language unpleasant.
> Whereas the direct equivalent in C, an assignment with a (type-changing) cast, is illegal.
I don't understand what you mean by that. The direct equivalent of what?
Endianness is not part of the type system in C, so I'm not sure I follow.
> I think there should really be a dialect of C(++) where the machine model is exactly the physical machine.
Linus agrees with you here, and I disagree with both of you. Some UBs could
certainly be relaxed, but as a rule I want my code to be portable and for the
compiler to have enough leeway to correctly optimize my code for different
targets without having to tweak my code.
I want strict aliasing and I want the compiler to delete extraneous NULL pointer
checks. Strict overflow I'm willing to concede; at the very least, the standard
should mandate wrap-on-overflow even for signed integers, IMO.
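To illustrate the strict-overflow concession with a sketch (`will_overflow` is a made-up name): because signed `x + 1` "cannot" wrap under the standard's rules, gcc and clang at `-O2` typically fold this entire check to `return 0;`, whereas under mandated wrap-on-overflow (or `-fwrapv`) it would honestly detect `INT_MAX`:

```c
/* Signed overflow is UB, so the compiler may assume x + 1
   never wraps and fold this comparison to a constant 0. */
int will_overflow(int x) {
    return x + 1 < x;
}
```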
I am sympathetic, but portability was more important in the past and gets less important each year. I used to write code strictly keeping the difference between numeric types and sequences of bytes in mind, hoping to one day run on an Alpha or a Tandem or something, but it has been a long time since I have written code that runs on anything other than Intel/AMD x86 or little-endian ARM.
x86_32, x86_64, arm, arm64, POWER, RISC-V, and several others are alive and kicking. China is making its own ISA. And there is still plenty of space and time for new ISAs to be created.
Actually, it is true, which is why endianness is a problem in the first place. ASM code is different when written for little endian vs. big endian: access patterns are offset positively instead of negatively.
A language that does the same things regardless of endianness would not have pointer arithmetic. That is not ASM and not C.
You can make the preprocessor condition broader if you care about more compilers and more platforms. Yes, I'm making assumptions about which platforms you want to target... which is fine. No, I don't care about your PDP-11, nor about dynamically changing your endian at runtime. Nearly any problem in C can be made arbitrarily difficult if you care about sufficiently bizarre platforms, or ask that people write code that is correct on any theoretical conforming C implementation. So we pick some platforms to support.
The above code is fairly simple. You can separate the part where you care about unaligned memory access and the part where you care about endian.
Author here. The blog post has that as the naive example. The whole intention was to help people understand why we don't need to do that. Could you at least explain why you disagree if you're going to use this thread to provide the complete opposite advice?
Which, as you correctly state in the article, is incorrect code. We agree about this. I proposed an alternate solution, where READ32BE would be like this:
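The exact READ32BE from this comment isn't reproduced in the thread, but based on the description that follows, the proposed shape is presumably something like this sketch (`memcpy` for the unaligned load, a swap only when host and wire order differ; the names are made up):

```c
#include <stdint.h>
#include <string.h>

static uint32_t swap32(uint32_t x) {
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Part 1: unaligned-safe load. Part 2: byte-order conversion. */
static uint32_t read32be_alt(const void *p) {
    uint32_t x;
    memcpy(&x, p, sizeof x);   /* handles unaligned access portably */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    x = swap32(x);             /* wire format is big-endian */
#endif
    return x;
}
```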
What I like about this is that it breaks the problem down into two parts: reading unaligned data and converting byte order. The reason for this is, sometimes, you need a half of that. Some wire formats have alignment guarantees, and if you know that the alignment guarantees are compatible with your platform, you can just read the data into a buffer and then (optionally) swap the bytes in place.
Just to give an example... not too long ago I was working with legacy code that was written for MIPS. Unaligned access does not work on MIPS, so the code was already carefully written to avoid that. All I had to do was make sure that the data types were sized (e.g. replace "long" with "int32_t") and then go through and byte swap everything.
So it's nice to have a function like swap32be(), and "you don't have to mask and shift" I would say is true, it just depends on which compilers you want to support. I would say that a key part of being a C programmer is making a conscious decision about which compilers you want to support.
Yes, I'm aware that structs are not a great way to serialize data in general, but sometimes they're damn convenient.
There have been CPU architectures where knowing the endianness at compile time isn't necessarily sufficient. I forget which, maybe it was DEC Alpha, where the CPU could flip back and forth? I can't recall if it was a "choose at boot" or a per-process change.
Which nothing will be able to deal with so you might as well not bother to support it. Your compiler will also assume a fixed endianness based on the target triple.
The entire problem of using byte swaps is that you need to use them when your native platform's byte order is different from that of the data you are reading.
You know the byte order of the data. But the tricky part is, what is the byte order of the platform?
It will always be correct, but you can't just assume that the compiler will optimize the shifts into a byteswap instruction. If you look at the article, you will see that it tries to no-true-scotsman that concern away by talking about a "good modern compiler".
And what exactly is the problem there? Are you going to be writing code that a) is built with a weird enough compiler that it fails this optimisation but also b) does byte swapping in a performance critical section?
Of course nobody wants C to backstab them with UB, but at the same time programmers want compilers to generate optimal code. That's the market pressure that forces optimizers to be so aggressive. If you can accept less optimized code, why aren't you using tcc?
The idea of C that "just" does a straightforward machine translation breaks down almost immediately. For example, you'd want `int` to just overflow instead of being UB. But then it turns out indexing `arr[i]` can't use 64-bit memory addressing modes, because they don't overflow like a 32-bit int does. With UB it doesn't matter, but a "straightforward C" would emit unnecessary separate 32-bit mul/shift instructions.
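A sketch of the indexing point, with made-up names: in this loop, treating signed overflow as UB lets the compiler widen `i` to a 64-bit register once (or strength-reduce it to a pointer increment) instead of materializing 32-bit wraparound arithmetic and a sign-extension on every iteration:

```c
#include <stdint.h>

/* If int i had defined wraparound, the compiler would have to
   recompute the truncated 32-bit index each iteration; because
   overflow is UB, it can keep i in a 64-bit register. */
int64_t sum_i32(const int32_t *arr, int n) {
    int64_t s = 0;
    for (int i = 0; i < n; i++)
        s += arr[i];
    return s;
}
```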
So in your 'machine model is the physical machine' flavour, should "I cast an unaligned byte pointer to int32_t and dereference" on SPARC (a) do a bunch of byte-load-shift-and-OR work, or (b) emit a simple word load which segfaults? If the former, it's not what the physical machine does; if the latter, you still need to write the code as "some portable other thing". Which is to say that the spec's UB here is in service of "allow the compiler to just emit a word load when you write `*(int32_t *)p`".
What I think the language is missing is a way to clearly write "this might be unaligned and/or wrong endianness, handle that". (Sometimes compilers provide intrinsics for this sort of gap, as they do with popcount and count-leading-zeroes; sometimes they recognize common open-coded idioms. But proper standardised support would be nicer.)
Endianness doesn't matter, though, for the reasons Rob Pike explained. For example, the bits inside each byte presumably have an ordering inside the CPU, but they're not addressable, so no one thinks about that. The brilliance of Rob Pike's recommendation is that it allows our code to be byte-order agnostic for the same reasons our code is already bit-order agnostic.
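The Pike-style counterpart for writing, as a sketch (`write32be` is a made-up name): the same shift idiom in the other direction, which stores big-endian correctly on any host without ever asking what the host's byte order is:

```c
#include <stdint.h>

/* Byte-order-agnostic big-endian store: identical source and
   identical results on little- and big-endian hosts. */
void write32be(unsigned char *p, uint32_t x) {
    p[0] = (unsigned char)(x >> 24);
    p[1] = (unsigned char)(x >> 16);
    p[2] = (unsigned char)(x >> 8);
    p[3] = (unsigned char)(x);
}
```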
I agree about bsf/bsr/popcnt. I wish ASCII had more punctuation marks because those operations are as fundamental as xor/and/or/shl/shr/sar.
D's machine model does actually assume the hardware, and using compile-time metaprogramming you can pretty much do whatever you want when it comes to bit twiddling, whether that means assembly, flags, etc.
> There is a huge mismatch between the assumptions of the C spec and actual machine code.
Right, which is why the kind of UB pedantry in the linked article is hurting and not helping. Cranky old man perspective here:
Folks: the fact that compilers will routinely exploit edge cases in undefined behavior in the language specification to miscompile obvious idiomatic code is a terrible bug in the compilers. Period. And we should address that by fixing the compilers, potentially by amending the spec if feasible.
But instead the community wants to all look smart by showing how much they understand about "UB" with blog posts and (worse) drive-by submissions to open source projects (with passive-aggressive sneers about code quality), so nothing gets better.
Seriously: don't tell people to shift and mask. Don't pontificate over compiler flags. Stop the masturbatory use of ubsan (though the tool itself is great). And start submitting bugs against the toolchain to get this fixed.
I agree, but the language of the standard very unambiguously lets them do it. Quoth X3.159-1988:
* Undefined behavior --- behavior, upon use of a nonportable or
erroneous program construct, of erroneous data, or of
indeterminately-valued objects, for which the Standard imposes no
requirements. Permissible undefined behavior ranges from ignoring the
situation completely with unpredictable results, to behaving during
translation or program execution in a documented manner characteristic
of the environment (with or without the issuance of a diagnostic
message), to terminating a translation or execution (with the issuance
of a diagnostic message).
In the past compilers "behaved during translation or program execution in a documented manner characteristic of the environment" and now they've decided to "ignore the situation completely with unpredictable results". So yes what gcc and clang are doing is hostile and dangerous, but it's legal. https://justine.lol/undefined.png So let's fix our code. The blog post is intended to help people do that.
No; I say we force the compiler writers to fix their idiotic assumptions instead of bending over backwards to please what's essentially a tiny minority. There's a lot more programmers who are not compiler writers.
The standard is really a minimum bar to meet, and what's not defined by it is left to the discretion of the implementers, who should be doing their best to follow the "spirit of C", which ultimately means behaving sanely. "But the standard allows it" should never be a valid argument --- the standard allows a lot of other things, not all of which make sense.
force the compiler writers to fix their idiotic assumptions instead of bending over backwards to please what's essentially a tiny minority
As far as I understand it, they do neither. Transforming an AST to any level of target code is not done by handcrafted recipes; instead it is fed into efficient abstract solvers which have these assumptions as an operational detail. E.g.:
p = &x;
if (p != &x) foo(); // optimized out
is not much different from
if (p == NULL) foo(); // optimized out
printf("%c", *p);
No assumption here is idiotic, because no single human was involved; it's just a class of constraints which you'd have to scratch your head extensively to separate properly (imagine telling a logic system that p is both 0 and not-0 when the 0-test is "explicit", and asking it to operate normally). Compiler writers do not format disks just to punish your UBs. Of course you can write a boring compiler that emits opcodes at face expression value, without most UBs being a problem. Plenty of those exist; why not just take one?
In your example, why should it optimise out the second case? Maybe foo() changed p so it's no longer null.
Compiler writers do not format disks just to punish your UBs.
IMHO if the compiler exploiting UB is leading to counterintuitive behaviour that's making it harder to use the language, the compiler is the one that needs fixing, regardless of whether the standard allows it. "But we wrote the compiler so it can't be fixed" just feels like a "but the AI did it, not me" excuse.
The address of p could have been taken somewhere earlier and stored in a global that foo accesses, or a similar path to that; and of course, p could itself be a global. Indeed, if the purpose of foo is to make p non-null and point to valid memory, then by optimising away that code you have broken a valid program.
If the compiler doesn't know if foo may modify p, then it can't remove the call. Even if it can prove that foo does not modify p, it still can't remove the call: foo may still have some other side-effects that matter (like not returning --- either longjmp()'ing elsewhere or perhaps printing an error message about p being null and exiting?), so it won't even get to the null dereference.
As a programmer, if I write code like that, I either intend for foo to be doing something to p to make it non-null, or if it doesn't for whatever reason, then it will actually dereference the null and whatever happens when that's attempted on the particular platform, happens. One of the fundamental principles of C is "trust the programmer". In other words, by trying to be "helpful" and second-guessing the intent of the code while making assumptions about UB, the compiler has completely broken the expectations of the programmer. This is why assumptions based on UB are stupid.
The standard allows this, but the whole intent of UB is not so compiler-writers can play language-lawyer and abuse programmers; things it leaves undefined are usually because existing and possible future implementations vary so widely that they didn't even try to consider or enumerate the possibilities (unlike with "implementation-defined").
But in fact compilers do regularly prove such things as, "this function call did not touch that local variable". Escape analysis is a term related to this.
I'm more of two minds about that other step, where the compiler goes like, "here in the printf call the p will be dereferenced, so it surely is non-null, so we silently optimize that other thing out where we consider the possibility of it being null".
Also @joshuamorton, couldn't the compiler at least print a warning that it removed code based on an assumption that was inferred by the compiler? I really don't know a lot about those abstract logic solver approaches, but it feels like it should be easy to do.
warning that it removed code based on an assumption that was inferred by the compiler
That would dump a ton of warnings from various macro/meta routines, which real-world C is usually peppered with. Not that it’s particularly hard to do (at the very least compilers know which lines are missing from debug info alone).
Yes, the assumption that p is non-null is idiotic. Also, the implicit assumption that foo will always return.
> no single human was involved
Humans implemented the compilers that use the spec adversarially and humans lobby the standards committee to not fix the bugs
> Of course you can write a boring compiler that emits opcodes at face expr value, without most UBs being a problem. Plenty of these, why not just take one
The majority of optimizations are harmless and useful, only a handful are idiotic and harmful. I want a compiler that has the good optimizations and not the bad ones.
For essentially every form of UB that compilers actually take advantage of, there's a real program optimization benefit. Are there any particular UB cases where you think the benefit isn't worth it, or it should be implementation-specific behavior instead of undefined behavior?
Most performance wins from UB come from removing code that someone wrote intentionally. If that code wasn't meant to be run, it shouldn't be written. If it was written, it should be run.
Now obviously there are lots of counter-examples for that. You can probably list ten in a minute. But it should be the guiding philosophy of compiler optimizations. If the programmer wrote some code, it shouldn't just be removed. If the program would be faster without that code, the programmer should be the one responsible for deciding whether the code gets removed or not.
MSVC and ICC have traditionally been far less keen on exploiting UB, yet are extremely competitive on performance (ICC in particular). That alone is enough evidence to convince me that UB is not the performance-panacea that the gcc/clang crowd think it is, and from my experience with writing Asm, good instruction selection and scheduling is far more important than trying to pull tricks with UB.
Get the teamsters and workers world party to occupy clang. You should fork C to restore the spirit of C and call it Spiritual C since we need a new successor to Holy C.
I read this, and go "yes, yes, yes", and then "NO!".
Shifts and ORs really are the sanest and simplest way to express "assembling an integer from bytes". Masking is _a_ way to deal with the current C spec, which has silly promotion rules. Unsigned everything is more fundamental than signed.
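An example of the promotion-rule trap alluded to above (a sketch with made-up names): if the source buffer is plain `char`, the integer promotions sign-extend any byte >= 0x80 before the OR on platforms where `char` is signed, smearing set bits into the high part of the result, which is why careful code uses `unsigned char` plus casts (and, historically, masks):

```c
#include <stdint.h>

/* Broken where char is signed: p[3] == 0x80 promotes to the int
   -128 (0xFFFFFF80), and the OR smears the sign-extended bits. */
uint32_t read32be_buggy(const char *p) {
    return (uint32_t)((p[0] << 24) | (p[1] << 16) | (p[2] << 8) | p[3]);
}

/* Safe: unsigned char plus explicit widening, no masks needed. */
uint32_t read32be_safe(const unsigned char *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```

Only the safe variant is worth testing; the buggy one exists to show the failure mode.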
> That doesn't mean the compiler can't do optimizations, but it shouldn't do things like prove code as UB and fold everything to a no-op.
UB doesn't just mean the compiler can treat it as a no-op. It means the compiler can do whatever it likes and still be compliant with the spec.
From the POV of someone consulting the spec, if something results in UB, what it means is: "Don't look here for documentation, look in the documentation of your compiler!".
Many compilers prefer to do a no-op because it is the cheapest thing to do.
My read of the standard is that the worst the compiler can do is nothing. For example, the blog post links a tweet where clang doing nothing meant generating an empty function, so that calling it made execution fall through to a different function the author had written, which formats the hard drive. However, it wouldn't be kosher for the compiler to itself generate the asm that formats your hard drive as an intended punishment for UB, since the standard recommends that if you're not going to ignore the situation completely, you behave in a manner characteristic of the environment, or terminate with an error.
C is careful to distinguish "implementation-defined behavior" (every compiler must document a consistent choice) from "undefined behavior", which doesn't necessarily have any safe uses.