Hacker News

It is a ridiculous feature of modern C that you have to write the super verbose "mask and shift" code, which then gets compiled to a simple `mov` and maybe a `bswap`. Whereas the direct equivalent in C, an assignment with a (type-changing) cast, is illegal. There is a huge mismatch between the assumptions of the C spec and actual machine code.

One of the few reasons I ever even reached for C is the ability to slurp in data and reinterpret it as a struct, or the ability to reason about which registers things will show up in and mix in some `asm` with my C.

I think there should really be a dialect of C(++) where the machine model is exactly the physical machine. That doesn't mean the compiler can't do optimizations, but it shouldn't do things like prove that code is UB and fold everything to a no-op. (Like when you defensively compare a pointer to NULL that according to the spec must not be NULL, but practically could be...)

`-fno-strict-overflow -fno-strict-aliasing -fno-delete-null-pointer-checks` gets you halfway there, but it would really only be viable if you had a blessed `-std=high-level-assembler` or `-std=friendly-c` flag.



> One of the few reasons I ever even reached to C is the ability to slurp in data and reinterpret it as a struct, or the ability to reason in which registers things will show up and mix in some `asm` with my C.

Which results in undefined behavior according to the C ISO standard.

Quote:

“2 All declarations that refer to the same object or function shall have compatible type; otherwise, the behavior is undefined.”

From: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf 6.2.7


How? I mean, doesn't GP mean this?

    struct whatever p;
    fread(&p, sizeof(p), 1, fp);


It should be perfectly fine to do this:

  union reinterpret {
    char raw[100];
    struct myStruct interpreted;
  } example;

  read(fd, example.raw, sizeof(example.raw));
  struct myStruct dest = example.interpreted;

This is standard-compliant C code, and it is a common way of reading IP addresses from packets, for example.


You don't even need to pun that. It's legal to say:

    struct myStruct example;
    read(fd, &example, sizeof(example));
That "should present no problem unless binary data written by one implementation are read by another", quoth ANSI X3.159-1988. One example of a time where I've used that is when storing intermediary build artifacts. Those artifacts only exist on the host machine. If the binary that writes/reads those artifacts gets recompiled, then the Makefile will invalidate the artifacts so they're regenerated, since flags like -mstructure-size-boundary=n do exist and ABI breakages have happened with structs in the past.


(It should be noted that this is not valid C++ code.)


Sensitive emotional subjects shouldn't be noted. Reminding C developers of the void* incompatibility is a good way to get them to feel triggered because it makes the language unpleasant.


Exactly.


> Whereas, the direct equivalent in C, an assignment with a (type-changing) cast, is illegal.

I don't understand what you mean by that. The direct equivalent of what? Endianness is not part of the type system in C, so I'm not sure I follow.

> I think there should really be a dialect of C(++) where the machine model is exactly the physical machine.

Linus agrees with you here, and I disagree with both of you. Some UBs could certainly be relaxed, but as a rule I want my code to be portable and for the compiler to have enough leeway to correctly optimize my code for different targets without having to tweak my code.

I want strict aliasing and I want the compiler to delete extraneous NULL pointer checks. Strict overflow I'm willing to concede; at the very least the standard should mandate wrap-on-overflow even for signed integers, IMO.


I am sympathetic, but portability was more important in the past and gets less important each year. I used to write code strictly keeping the difference between numeric types and sequences of bytes in mind, hoping to one day run on an Alpha or a Tandem or something, but it has been a long time since I have written code that runs on anything other than Intel/AMD or little-endian ARM.


x86_32, x86_64, arm, arm64, POWER, RISC-V and several others are alive and kicking. China is making their own ISA. And there is still plenty of space and time for new ISAs to be created.

Portability is still plenty relevant.


> I think there should really be a dialect of C(++) where the machine model is exactly the physical machine.

Sounds great, until you have to rewrite all your software to go from x86-64 to ARM


Quite common when coding games back in the 8 and 16 bit days. :)

However for the case in hand, it would suffice to just write the key routines in Assembly, not everything.


> There is a huge mismatch between the assumptions of the C spec and actual machine code.

People like to say "C is close to the metal". Really not true at all anymore.


Actually, it is true - which is why endian is a problem in the first place. ASM code is different when written for little endian vs big endian. Access patterns are positively offset instead of negatively.

A language that does the same things regardless of endianness would not have pointer arithmetic. That is not ASM and not C.


You don’t have to mask and shift. You can memcpy and then byte swap in a function. It will get inlined as mov/bswap.

Practically speaking, common compilers have intrinsics for bswap. The memcpy function can be thought of as an intrinsic for unaligned load/store.


How do you detect if a byte swap is needed? I.e. whether the (fixed) wire endianness matches the current platform endianness?


Using the preprocessor, something like this:

    uint32_t swap32(uint32_t x) { ... }

    #if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    uint32_t swap32be(uint32_t x) { return swap32(x); }
    #elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    uint32_t swap32be(uint32_t x) { return x; }
    #else
    #error "Unknown endian"
    #endif
You can make the preprocessor condition broader if you care about more compilers and more platforms. Yes, I'm making assumptions about which platforms you want to target... which is fine. No, I don't care about your PDP-11, nor about dynamically changing your endian at runtime. Nearly any problem in C can be made arbitrarily difficult if you care about sufficiently bizarre platforms, or ask that people write code that is correct on any theoretical conforming C implementation. So we pick some platforms to support.

The above code is fairly simple. You can separate the part where you care about unaligned memory access and the part where you care about endian.

Some irrelevant details left out above.


Author here. The blog post has that as the naive example. The whole intention was to help people understand why we don't need to do that. Could you at least explain why you disagree if you're going to use this thread to provide the complete opposite advice?


When I read the blog post I saw this,

    #define READ32BE(p) bswap_32(*(uint32_t *)(p))
Which as you correctly state in the article, is incorrect code. We agree about this. I proposed an alternate solution, where the READ32BE would be like this:

    uint32_t read32be(const void *ptr) {
        uint32_t x;
        memcpy(&x, ptr, sizeof(x));
        return swap32be(x); // Nop on big-endian.
    }
What I like about this is that it breaks the problem down into two parts: reading unaligned data and converting byte order. The reason for this is, sometimes, you need a half of that. Some wire formats have alignment guarantees, and if you know that the alignment guarantees are compatible with your platform, you can just read the data into a buffer and then (optionally) swap the bytes in place.

Just to give an example... not too long ago I was working with legacy code that was written for MIPS. Unaligned access does not work on MIPS, so the code was already carefully written to avoid that. All I had to do was make sure that the data types were sized (e.g. replace "long" with "int32_t") and then go through and byte swap everything.

    struct Something {
        int32_t x, y;
        char name[16];
    };

    void Something_Swap(struct Something *p) {
        p->x = swap32be(p->x);
        p->y = swap32be(p->y);
    }
So it's nice to have a function like swap32be(), and "you don't have to mask and shift" I would say is true, it just depends on which compilers you want to support. I would say that a key part of being a C programmer is making a conscious decision about which compilers you want to support.

Yes, I'm aware that structs are not a great way to serialize data in general, but sometimes they're damn convenient.


I.e. how do you know the target's endianness? C++20 added std::endian. Otherwise you can use a macro like this one from SDL:

https://github.com/libsdl-org/SDL/blob/9dc97afa7190aca5bdf92...


There have been CPU architectures where knowing the endianness at compile time isn't necessarily sufficient. I forget which, maybe it was DEC Alpha, where the CPU could flip back and forth? I can't recall if it was a choose-at-boot or a per-process change.


ARM allows dynamic changing of endianness[1].

[1]: https://developer.arm.com/documentation/dui0489/h/arm-and-th...


Which nothing will be able to deal with so you might as well not bother to support it. Your compiler will also assume a fixed endianness based on the target triple.


When do you byte swap?


24 hours a day, man. I'm always byte swapping.

(I'm not sure how to answer the question... what do you mean, "when?")


The entire problem of using byte swaps is that you need to use them when your native platform's byte order is different from that of the data you are reading.

You know the byte order of the data. But the tricky part is, what is the byte order of the platform?


Whether it is tricky depends on what platforms you care about.


Or, you can just follow the advice of the article, and not need to worry about it because the compiler takes care of it for you.


> because the compiler takes care of it for you.

It will always be correct, but you can't just assume that the compiler will optimize the shifts into a byteswap instruction. If you look at the article you will see that it tries to no-true-scotsman that concern away by talking about a "good modern compiler".


And what exactly is the problem there? Are you going to be writing code that a) is built with a weird enough compiler that it fails this optimisation but also b) does byte swapping in a performance critical section?


Of course nobody wants C to backstab them with UB, but at the same time programmers want compilers to generate optimal code. That's the market pressure that forces optimizers to be so aggressive. If you can accept less optimized code, why aren't you using tcc?

The idea of C that "just" does a straightforward machine translation breaks down almost immediately. For example, you'd want `int` to just overflow instead of being UB. But then it turns out indexing `arr[i]` can't use 64-bit memory addressing modes, because they don't overflow like a 32-bit int does. With UB it doesn't matter, but a "straightforward C" would emit unnecessary separate 32-bit mul/shift instructions.

https://gist.github.com/rygorous/e0f055bfb74e3d5f0af20690759...


> nobody wants C to backstab them with UB, but at the same time programmers want compilers to generate optimal code

The value of compiler optimization isn't the same thing as the value of having extensive undefined behaviour in a programming language.

Rust and Ada perform about the same as C, but lack C's many footguns.

> indexing `arr[i]` can't use 64-bit memory addressing modes

What do you mean here?


Typically, the assembly instruction that would do the read in arr[i] can do something like:

    x = *(y + z);
where y and z are both 64-bit integers. If I had

    int arr[1000];
    initialize(&arr);
    int i = read_int();
    int x = arr[i];
    print(x);
then to get x I'd need to do something like,

    tmp = i * 4;
    tmp1 = (uint64_t)tmp;
    x = *(arr + tmp1);
Which, since i is signed and has to wrap in 32 bits, can't simply be folded into a scaled addressing mode; the index must be computed in 32 bits and then widened to a uint64_t (which is cheap, at least).


So in your 'machine model is the physical machine' flavour, should "I cast an unaligned pointer to a byte array to int32_t and deref" on SPARC (a) do a bunch of byte-load-and-shift-and-OR or (b) emit a simple word load which segfaults? If the former, it's not what the physical machine does, and if the latter, then you still need to write the code as "some portable other thing". Which is to say that the spec's UB here is in service of "allow the compiler to just emit a word load when you write *(int32_t *)p".

What I think the language is missing is a way to clearly write "this might be unaligned and/or wrong endianness, handle that". (Sometimes compilers provide intrinsics for this sort of gap, as they do with popcount and count-leading-zeroes; sometimes they recognize common open-coded idioms. But proper standardised support would be nicer.)


Endianness doesn't matter though, for the reasons Rob Pike explained. For example, the bits inside each byte arguably have an endianness inside the CPU, but they're not addressable, so no one thinks about that. The brilliance of Rob Pike's recommendation is that it allows our code to be byte-order agnostic for the same reasons our code is already bit-order agnostic.

I agree about bsf/bsr/popcnt. I wish ASCII had more punctuation marks because those operations are as fundamental as xor/and/or/shl/shr/sar.


D's machine model does actually assume the hardware, and using the compile time metaprogramming you can pretty much do whatever you want when it comes to bit twiddling - whether that means assembly, flags etc.


I suspect you might like C--.

https://en.m.wikipedia.org/wiki/C--


> There is a huge mismatch between the assumptions of the C spec and actual machine code.

Right, which is why the kind of UB pedantry in the linked article is hurting and not helping. Cranky old man perspective here:

Folks: the fact that compilers will routinely exploit edge cases in undefined behavior in the language specification to miscompile obvious idiomatic code is a terrible bug in the compilers. Period. And we should address that by fixing the compilers, potentially by amending the spec if feasible.

But instead the community wants to all look smart by showing how much they understand about "UB" with blog posts and (worse) drive-by submissions to open source projects (with passive-aggressive sneers about code quality), so nothing gets better.

Seriously: don't tell people to shift and mask. Don't pontificate over compiler flags. Stop the masturbatory use of ubsan (though the tool itself is great). And start submitting bugs against the toolchain to get this fixed.


I agree, but the language of the standard very unambiguously lets them do it. Quoth X3.159-1988:

     * Undefined behavior --- behavior, upon use of a nonportable or
       erroneous program construct, of erroneous data, or of
       indeterminately-valued objects, for which the Standard imposes no
       requirements.  Permissible undefined behavior ranges from ignoring the
       situation completely with unpredictable results, to behaving during
       translation or program execution in a documented manner characteristic
       of the environment (with or without the issuance of a diagnostic
       message), to terminating a translation or execution (with the issuance
       of a diagnostic message).
In the past compilers "behaved during translation or program execution in a documented manner characteristic of the environment" and now they've decided to "ignore the situation completely with unpredictable results". So yes what gcc and clang are doing is hostile and dangerous, but it's legal. https://justine.lol/undefined.png So let's fix our code. The blog post is intended to help people do that.


> So let's fix our code.

No; I say we force the compiler writers to fix their idiotic assumptions instead of bending over backwards to please what's essentially a tiny minority. There's a lot more programmers who are not compiler writers.

The standard is really a minimum bar to meet, and what's not defined by it is left to the discretion of the implementers, who should be doing their best to follow the "spirit of C", which ultimately means behaving sanely. "But the standard allows it" should never be a valid argument --- the standard allows a lot of other things, not all of which make sense.

A related rant by Linus Torvalds: https://bugzilla.redhat.com/show_bug.cgi?id=638477#c129


> force the compiler writers to fix their idiotic assumptions instead of bending over backwards to please what's essentially a tiny minority

As far as I understand it, they do neither. Transforming an AST to any level of target code is not done by handcrafted recipes; instead it is fed into efficient abstract solvers which have these assumptions as an operational detail. E.g.:

  p = &x;
  if (p != &x) foo(); // optimized out
is not much different from

  if (p == NULL) foo(); // optimized out
  printf("%c", *p);
No assumption here is idiotic, because no single human was involved; it's just a class of constraints, which you'd have to scratch your head extensively just to separate out properly (imagine telling a logic system that p is both 0 and not-0 when the 0-test is "explicit" and asking it to operate normally). Compiler writers do not format disks just to punish your UB. Of course you can write a boring compiler that emits opcodes at face expression value, without most UB being a problem. Plenty of those exist; why not just take one?


In your example, why should it optimise out the second case? Maybe foo() changed p so it's no longer null.

> Compiler writers do not format disks just to punish your UBs.

IMHO if the compiler exploiting UB is leading to counterintuitive behaviour that's making it harder to use the language, the compiler is the one that needs fixing, regardless of whether the standard allows it. "But we wrote the compiler so it can't be fixed" just feels like a "but the AI did it, not me" excuse.


You would need to pass &p or declare it as volatile, I assume; otherwise by what means would foo change p?


The address of p could have been taken somewhere earlier and stored in a global that foo accesses, or a similar path to that; and of course, p could itself be a global. Indeed, if the purpose of foo is to make p non-null and point to valid memory, then by optimising away that code you have broken a valid program.

If the compiler doesn't know if foo may modify p, then it can't remove the call. Even if it can prove that foo does not modify p, it still can't remove the call: foo may still have some other side-effects that matter (like not returning --- either longjmp()'ing elsewhere or perhaps printing an error message about p being null and exiting?), so it won't even get to the null dereference.

As a programmer, if I write code like that, I either intend for foo to be doing something to p to make it non-null, or if it doesn't for whatever reason, then it will actually dereference the null and whatever happens when that's attempted on the particular platform, happens. One of the fundamental principles of C is "trust the programmer". In other words, by trying to be "helpful" and second-guessing the intent of the code while making assumptions about UB, the compiler has completely broken the expectations of the programmer. This is why assumptions based on UB are stupid.

The standard allows this, but the whole intent of UB is not so compiler-writers can play language-lawyer and abuse programmers; things it leaves undefined are usually because existing and possible future implementations vary so widely that they didn't even try to consider or enumerate the possibilities (unlike with "implementation-defined").


But in fact compilers do regularly prove such things as, "this function call did not touch that local variable". Escape analysis is a term related to this.

I'm more of two minds about that other step, where the compiler goes like, "here in the printf call the p will be dereferenced, so it surely is non-null, so we silently optimize that other thing out where we consider the possibility of it being null".

Also @joshuamorton, couldn't the compiler at least print a warning that it removed code based on an assumption that was inferred by the compiler? I really don't know a lot about those abstract logic solver approaches, but it feels like it should be easy to do.


You don't need to worry about null check removal optimizations unless you do this:

    int main() {
      char *p;
      p = mmap(0, 65536, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
      // ...
      return __builtin_popcountl((uintptr_t)p);
    }
Or you do this:

    void ContinueOnError(int sig, siginfo_t *si, ucontext_t *ctx) {
      xed_decoded_inst_zero_set_mode(&xedd, XED_MACHINE_MODE_LONG_64);
      xed_instruction_length_decode(&xedd, (void *)ctx->uc_mcontext.rip, 15);
      ctx->uc_mcontext.rip += xedd.length;
    }

    int main() {
      signal(SIGSEGV, ContinueOnError);
      volatile long *x = NULL;
      printf("*NULL = %ld\n", *x);
    }


> warning that it removed code based on an assumption that was inferred by the compiler

That would dump a ton of warnings from various macro/meta routines, which real-world C is usually peppered with. Not that it’s particularly hard to do (at the very least compilers know which lines are missing from debug info alone).


> No assumption here is idiotic

Yes, the assumption that p is non-null is idiotic. Also, the implicit assumption that foo will always return.

> no single human was involved

Humans implemented the compilers that use the spec adversarially and humans lobby the standards committee to not fix the bugs

> Of course you can write a boring compiler that emits opcodes at face expr value, without most UBs being a problem. Plenty of these, why not just take one

The majority of optimizations are harmless and useful, only a handful are idiotic and harmful. I want a compiler that has the good optimizations and not the bad ones.


For essentially every form of UB that compilers actually take advantage of, there's a real program optimization benefit. Are there any particular UB cases where you think the benefit isn't worth it, or it should be implementation-specific behavior instead of undefined behavior?


Most performance wins from UB come from removing code that someone wrote intentionally. If that code wasn't meant to be run, it shouldn't be written. If it was written, it should be run.

Now obviously there are lots of counter-examples for that. You can probably list ten in a minute. But it should be the guiding philosophy of compiler optimizations. If the programmer wrote some code, it shouldn't just be removed. If the program would be faster without that code, the programmer should be the one responsible for deciding whether the code gets removed or not.


MSVC and ICC have traditionally been far less keen on exploiting UB, yet are extremely competitive on performance (ICC in particular). That alone is enough evidence to convince me that UB is not the performance-panacea that the gcc/clang crowd think it is, and from my experience with writing Asm, good instruction selection and scheduling is far more important than trying to pull tricks with UB.


Get the teamsters and workers world party to occupy clang. You should fork C to restore the spirit of C and call it Spiritual C since we need a new successor to Holy C.


I read this, and go "yes, yes, yes", and then "NO!".

Shifts and ors really is the sanest and simplest way to express "assembling an integer from bytes". Masking is _a_ way to deal with the current C spec which has silly promotion rules. Unsigned everything is more fundamental than signed.


It does, macro assemblers, especially those with PC and Amiga roots.

Which, given its heritage, is what PDP-11 C used to be; after all, BCPL originated as the minimal language required to bootstrap CPL, nothing else.

Actually, I think TI has a macro assembler with a C-like syntax; I just cannot recall the name any longer.


> That doesn't mean the compiler can't do optimizations, but it shouldn't do things like prove code as UB and fold everything to a no-op.

UB doesn't just mean the compiler can treat it as a no-op. It means the compiler can do whatever it likes and still be compliant with the spec.

From the POV of someone consulting the spec, if something results in UB, what it means is: "Don't look here for documentation, look in the documentation of your compiler!".

Many compilers prefer to do a no-op because it is the cheapest thing to do.


My read of the standard is that the worst the compiler can do is nothing. For example, the blog post links a tweet where clang doing nothing meant generating an empty function, so that when it was called, execution fell through to a different function the author had written which formats the hard drive. However, it wouldn't be kosher for the compiler itself to generate the asm that formats your hard drive as an intended punishment for UB, since the standard recommends that, if you're not going to ignore the situation completely, you have the compiler act in a manner characteristic of the environment, or crash with an error.


C is careful to distinguish "implementation-defined behavior" (every compiler must document a consistent choice) from "undefined behavior", which doesn't necessarily have any safe uses.


You could instead simply use the hton/ntoh family and trust the library to properly do The Right Thing™.



