
> And UB has always been an excuse for compilers to screw over programmers in hideous ways

Your reply was great up until this. Compiler writers aren’t looking to screw over programmers, they’re looking to make code faster. UB gives them the ability to make assumptions about what is and is not true, at a particular moment in time, in order to skip doing unnecessary work at runtime.

By assuming that code is always on the happy path, you can cut a lot of corners and skip checks that would otherwise greatly slow down the code. Furthermore, these benefits can cascade into more and more optimizations. Sometimes you can have these large, complicated functions and call graphs get optimized down to a handful of inlined instructions. Sometimes the speedup can be so dramatic that the entire application is unusable without it!

Many of these optimizations would be impossible if compilers were forced to assume the opposite: that UB will occur whenever possible.
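As a sketch of how this plays out (hypothetical function, not from the thread): once a pointer has been dereferenced, a compiler may assume it is non-null and delete any later check.

    int first_element(int *p) {
        int v = *p;      /* dereferencing null is UB, so the compiler may   */
        if (p == NULL)   /* now assume p != NULL and remove this dead check */
            return -1;
        return v;
    }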

The tool programmers have available to them is compiler flags. You can use flags to turn off these assumptions, at the cost of losing out on optimizations, if your code needs it and you’re unable to fix it. But it’s better to turn on all possible warnings and treat warnings as errors, rather than ignoring them, to push yourself to fix the code.
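For instance (the flags are real GCC/Clang options; the function is just an illustration), -fwrapv trades the overflow assumption for defined wrapping:

    /* Build with: cc -O2 -fwrapv -Wall -Wextra -Werror prog.c
       -fwrapv makes signed overflow wrap in two's complement instead of
       being UB, at the cost of the optimizations that assumption enables. */
    int next(int a) {
        return a + 1;    /* with -fwrapv, INT_MAX + 1 wraps to INT_MIN */
    }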



The thing that makes UB almost malicious is that it propagates inter-procedurally. This makes reasoning about code with UB basically impossible, which means you should always assume that the compiler is going to screw you over if your code contains UB, because there is no way to know whether it will.


You should consider a program with undefined behaviour to be the equivalent of a mathematical proof that contains an unstated contradiction. Ex falso quodlibet: from a falsehood anything follows. Also called the principle of explosion.

Undefined behaviour renders your entire program meaningless. It must be avoided at all costs. Using undefined behaviour on purpose is like sticking a fork in an electrical socket.


> Undefined behaviour renders your entire program meaningless

That's exactly the complaint. Consider that implementations of the standard library sometimes have exposed UB: that renders the behaviour of all running code on the system undefined.

Many programmers believe that the fallout of the UB could, and therefore should, be limited in scope.


To achieve your goal, compilers would have to disable any sufficiently powerful optimization. If you write bugs (UB), a powerful compiler will eventually exploit them and generate code that you never intended. However, this is not the fault of the compiler or the language.


Compiler writers have already done this. With flags you can disable any optimization you like, with all of the performance loss that entails. But then people complain that their programs are slow.

What people really want is an AI that ignores the code they write and just “does what they really meant.” But of course that’s not foolproof either. Every day people ask each other to do things and miscommunications occur, with the wrong thing being done. I don’t really know what to say other than “people should be more careful and also more forgiving.”


Exactly. All the hoopla about UB is complaining about how compiler optimizations work, and about the fact that the standards committee clarifies (with each new meeting) what is and is not considered undefined behavior. They should instead thank the committee for clarifying this.


It is the fault of the language to the extent that the purpose of the language is to make it easy to write correct programs, and UB makes it really hard (and in some cases impossible) to write correct programs.


It is just the opposite. UB is a clarification that tells programmers what the language considers to be undesired behavior. If the standard said nothing, it would always be a mystery whether a certain construct was allowed, effectively making it compiler dependent. Compilers would also have less room for creating optimizations. In the next iterations of the C standard we may see more constructs classified as UB.


That sounds good in theory, but many things that are UB in C/C++ are UB because they are really hard to verify at compile time, which makes them almost impossible to program around. Any signed addition in C is potential UB unless you have a proof that no input to the addition can ever cause overflow (which is made harder because C doesn't define the size of the default integer types). Furthermore, a loop that makes no progress is UB, which means that, as a programmer, you have to solve the halting problem for your program before knowing whether it has a bug.
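To make the no-progress point concrete, here is the classic sketch (the Collatz loop): nobody can prove it terminates for all inputs, yet the compiler is allowed to assume it does.

    int collatz_steps(unsigned n) {
        int steps = 0;
        while (n != 1) {                       /* termination: open problem */
            n = (n % 2) ? 3 * n + 1 : n / 2;   /* unsigned wrap is defined  */
            steps++;
        }
        return steps;    /* the compiler may assume this point is reached */
    }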


> many things that are UB in C/C++ are UB because they are really hard to verify at compile time which makes them almost impossible to program around

The second half of the sentence doesn't follow from the first. Take everyone's favorite example, signed integer overflow: all you have to do to avoid the UB is check for overflow before doing the operation (and C23 finally adds features to do that for you).
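Concretely, such a pre-check might look like this (a minimal sketch; safe_add is a hypothetical helper, and C23's ckd_add now does this job for you):

    #include <limits.h>
    #include <stdbool.h>

    /* Store a + b in *res and return true only if the sum fits in an int.
       The test itself cannot overflow, so no path has UB. */
    bool safe_add(int a, int b, int *res) {
        if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
            return false;
        *res = a + b;
        return true;
    }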

Taking a step back, the fundamental thing about UB is that it is very nearly always a bug in your code (and this includes especially integer overflow!). Even if you gave well-defined semantics to UB, the semantics you'd give would very rarely make the program not buggy. Complaining that we can't prove programs free of UB is tantamount to complaining that we can't prove programs free of bugs.

It turns out that UB is actually extremely helpful for tools that try to help programmers find bugs in their code. Since UB is automatically a bug, any tool that finds UB knows that it found a bug; if you give it well-defined semantics instead, it's a lot trickier to assert that it's a bug. In a real-world example, the infamous buffer overflow vulnerability Heartbleed stymied most (all?) static analyzers for the simple reason that, due to how OpenSSL did memory management, it wasn't actually undefined behavior by C's definition. Unsigned integer overflow also falls into this bucket--it's very hard to distinguish intentional cases of unsigned integer overflow (e.g., hashing algorithms) from unintentional cases (e.g., calculating buffer sizes).
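A sketch of the intentional case (the constants are the published FNV-1a parameters): hashes lean on unsigned wraparound on purpose, which is exactly why a tool can't flag it as a bug.

    #include <stddef.h>
    #include <stdint.h>

    uint32_t fnv1a(const unsigned char *s, size_t n) {
        uint32_t h = 2166136261u;      /* FNV offset basis */
        for (size_t i = 0; i < n; i++) {
            h ^= s[i];
            h *= 16777619u;            /* FNV prime: wraps mod 2^32 by design */
        }
        return h;
    }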


My complaint here is that it took C more than 30 years between defining signed integer overflow as UB and providing programmers with standard library facilities to check if a signed integer operation would result in overflow.

I much prefer Rust's approach to arithmetic, where overflow with plain arithmetic operators is defined as a bug, and panics on debug-enabled builds, plus special operations in the standard library like wrapping_add and saturating_add for the special cases where overflow is expected.


> My complaint here is that it took C more than 30 years ... I much prefer Rust's approach

That's an odd complaint. Rust didn't spring forth fully formed from the ether, it stands on the shoulders of C (and other giants of PL history). 30 years ago you couldn't use Rust at all because it didn't exist.

The reason the committee doesn't just radically change C in all these nice ways to catch up to Rust is because it would be incompatible. Then you wouldn't have fixed C, you'd just have two languages: "old C", which all of the existing C code in the world is written in, and "new C", which nothing is written in. At that point why not just start over from scratch, like they did with Rust?


Interestingly, the first Ada standard in 1983 defined signed integer overflow to raise a CONSTRAINT_ERROR exception.

But apparently it lacked unsigned integers with modular arithmetic?

http://archive.adaic.com/standards/83lrm/html/lrm-11-01.html... http://archive.adaic.com/standards/83lrm/html/lrm-03-05.html

The 2012 version is a bit more readable, and has unsigned integers:

For a signed integer type, the exception Constraint_Error is raised by the execution of an operation that cannot deliver the correct result because it is outside the base range of the type. For any integer type, Constraint_Error is raised by the operators "/", "rem", and "mod" if the right operand is zero.

For a modular type, if the result of the execution of a predefined operator (see 4.5) is outside the base range of the type, the result is reduced modulo the modulus of the type to a value that is within the base range of the type.

http://www.ada-auth.org/standards/rm12_w_tc1/html/RM-3-5-4.h...


> all you have to do to avoid UB on signed integer overflow is check for overflow before doing the operation

All you have to do is add a check for overflow _that the compiler will not throw away because "UB won't happen"_. The very thing you want to avoid makes avoiding it very hard, and lots of bugs have resulted from compilers "optimizing" away such overflow checks.
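The classic footgun, sketched (hypothetical code): a post-hoc test that only means anything after the UB has already happened, so the compiler may legally fold it away.

    #include <limits.h>

    int checked_inc(int a) {
        int sum = a + 1;     /* if a == INT_MAX, this is already UB...  */
        if (sum < a)         /* ...so the compiler may assume sum > a   */
            return INT_MAX;  /* and delete this branch entirely         */
        return sum;
    }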


This is covered in the article and numerous replies in this thread. Use <stdckdint.h>.


stdckdint.h is only available in C23. The problem existed long before that and led to tons of exploits and bugs.


> all you have to do to avoid UB on signed integer overflow is check for overflow before doing the operation (and C23 finally adds features to do that for you).

…making your code practically unreadable, since you have to write ckd_add(ckd_add(ckd_mul(a,a),ckd_mul(ckd_mul(2,a),b)),ckd_mul(b,b)) instead of a * a + 2 * a * b + b * b.


That's not the correct syntax for the ckd_ operations. They take 3 operands, the first being a pointer to an integer where the result should be stored. And they return a bool, which you need to check in a conditional. If you're just going to throw out the bool and ignore the overflows, why bother with checked operations in the first place?


Yeah, I realize that now. That's even worse. So you'll have to write something like

    int aa,twoa,twoab,bb,aaplustwoab,aaplustwoabplusbb;
    if (ckd_mul(&aa,a,a)) { return error; }
    if (ckd_mul(&twoa,2,a)) { return error; }
    // …
    if (ckd_add(&aaplustwoabplusbb,aaplustwoab,bb)) { return error; }
    return aaplustwoabplusbb;
So ergonomic!

> If you're just going to throw out the bool and ignore the overflows, why bother with checked operations in the first place?

I'd expect the functions to return the result on success and crash on failure. Or better, raise an exception, but C doesn't have exceptions…


Why not just write:

    bool aplusb_sqr(int* c, int a, int b) {
        /* ckd_* return true on overflow, so negate: succeed only if neither overflows */
        return c && !ckd_add(c, a, b) && !ckd_mul(c, *c, *c);
    }


Obviously you could do that in this case, I just wanted to come up with a complicated formula.


See my other comment [1], which addresses the exact things you brought up here. Safe checked arithmetic is a new standard feature in C23. And if making no progress were not UB, then tons of loop optimizations would be impossible, and then we couldn’t have nice things, like numpy.

[1] https://news.ycombinator.com/item?id=35406554


> Any signed addition in C is potential UB unless you have a proof that all numbers that will ever be input to the addition won't cause overflow

This has always been the case. Standard C has always operated with the possibility that addition can overflow. The programmer or library writer is responsible for checking that the used types are large enough. If you want to be perfectly sure, you need to check for overflow. Making this UB has not changed the nature of the issue.

> is made harder because C doesn't define the size of the default integer types

They correctly made this implementation defined. But C now has fixed-width integer types (int32_t and friends from <stdint.h>) if you want to be sure.
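For example, <stdint.h> (since C99) gives you exact- and minimum-width types when the size of int matters:

    #include <stdint.h>

    int32_t       exact;   /* exactly 32 bits (two's complement), where provided  */
    int_least32_t least;   /* smallest type with at least 32 bits, always present */
    int_fast32_t  fast;    /* "fastest" type with at least 32 bits                */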


Is the improved performance of C over, say, Java or Rust (both of which have much less undefined behaviour -- Java almost none) worth the pain and bugs that UB has caused?

Honestly, I don't think so, and as computers get more powerful and the amount of the world which relies on their correct functioning grows, I feel the arguments for UB become increasingly difficult to justify.


I went to look up undefined behaviour in Rust and I got this scary warning:

Warning: The following list is not exhaustive. There is no formal model of Rust's semantics for what is and is not allowed in unsafe code, so there may be more behavior considered unsafe. The following list is just what we know for sure is undefined behavior. Please read the Rustonomicon before writing unsafe code.

After the warning was a list of many of the same types of things that are undefined behaviour in C. In addition, there’s a bunch more undefined behaviour related to improper usage of the unsafe keyword.

So I don’t think you get a free lunch with Rust here. What you get is a “safe” playground if you stay within the guard rails and avoid using the unsafe keyword. But then you are limited to writing programs which can be expressed in safe Rust, a proper subset of all programs you might want to write.

Furthermore, the lack of a formal specification for Rust is one area where it lags behind C, a standardized language. All of the undefined behaviour in C is decreed and documented by the standard, having been decided by the committee. Rust, on the other hand, may have weird and unpredictable behaviour that you just have to debug yourself, which may or may not be compiler bugs.


I agree Rust isn’t perfect, but I think you underestimate the value of “safe” code.

I often write programs that have unsafe code. However, the unsafe code is never more than 100 lines, which means I have a very small amount of code to reason about — Rust users expect (and you as a programmer have to enforce) that it should not be possible to cause UB from safe code, so my “safe interface” to my unsafe code ensures my code can’t cause UB, no matter what I call.

One problem with Rust is that generally when you mess up it panics — I think that’s better than buffer overflows and the like, but still not a good user experience.

This means there is a very small amount of code I have to really think about, whereas in C or C++ that includes basically any place x[i] appears (regardless of whether x is a pointer or a std::vector).

You can of course write safe C code, people do, but it’s hard, and it only takes one slip up anywhere in your program to blow it.


In one sense, C is the unsafe code block for myriad other languages, like Python. Python users don’t want to deal with undefined behaviour either. They want to write their high level code in NumPy or PyTorch and just have everything work very fast.

Little do they know: they rely on C for those libraries and for things like ATLAS and LAPACK, which implement the underlying numerical linear algebra code. Well, it turns out that ATLAS relies pretty heavily on optimizing C compilers to generate optimal code on many different platforms. At the bottom of all this are the many loop optimizations included in compilers which, thanks to undefined behaviour in the C spec, are able to assume that code is always on the happy path.

It also turns out that Rust includes bindings to ATLAS and LAPACK. I would imagine at some point people might want to write a new linear algebra package in pure Rust. I think it’ll be quite difficult to match the performance of those two in safe Rust, but we’ll see.


Isn't LAPACK written in Fortran?


You're right, and ATLAS is as well, but Fortran has undefined behaviour [1] for all the same reasons that C does.

[1] https://stackoverflow.com/a/57558908


C does not have a formal specification either. It has a standards document written in formal English, but that does not provide a formal spec of C's semantics. A formal spec of a programming language's semantics would entail using a formal semantic model such as operational or denotational semantics. Some programming languages specify formal semantics for the entire language or some subset of it, but C is not one of them.

Your claim that the C Standard lists all undefined behavior is actually false. The C Standard enumerates only the explicit undefined behaviors; it does not enumerate the implicit ones. There have been efforts to make just such a list, but it's an incredibly difficult task.


Java can optimize programs so that well-defined but very rarely occurring cases land on the slow path. C can't really do this.


It's funny that your original post was an objection to how undefined behavior gives license to screw developers over, but here you are talking about how undefined behavior is like sticking a fork in an electrical socket.


My original post was an objection to the implied intent on the part of compiler writers. An electrical socket does not have intent, it's just a hazard that also happens to provide enormous benefits to our lifestyles.

I think it's a perfect analogy to undefined behaviour in C: enormous benefits but also a hazard to be wary of. A lot of people don't understand the benefits, they just see the hazard. Throughout this discussion I've been trying to clarify that, with perhaps limited success.


But just to be clear, @chongli is being logical here.

Think of UB as a probabilistic error, i.e. it is always stupid to rely on it.

1. Write code without errors -- sensible
2. Allow compilers to assume the absence of errors -- occasionally sensible, since it speeds up your program

In defence of UB, for the most part these are things that should break your program anyway: stack overflow is never correct. So your choice is mostly between failing quickly and badly, or failing slowly and well.

Thanks to Google making the UB sanitizers, you are free to make that choice even in C.
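For instance, both GCC and Clang ship UBSan (the flags are real; the test program is just an illustration):

    /* Build with: cc -g -fsanitize=undefined -fno-sanitize-recover=all demo.c
       The instrumented binary aborts with a diagnostic at the overflow. */
    #include <limits.h>

    int main(void) {
        int x = INT_MAX;
        return x + 1;    /* UBSan: runtime error, signed integer overflow */
    }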


I'd argue that it's stupid to think that it's stupid to rely on UB.

Almost any non-trivial software explicitly relies on undefined behavior, including safety-critical code such as cryptographic libraries; the Linux kernel has rampant undefined behavior that it makes a conscious decision to use. POSIX makes use of undefined behavior for shared libraries (it treats functions loaded from shared libraries as void *, which is undefined behavior in ISO C).


That's not an argument to keep live grenades lying around, it's an argument to remove them from the spec.

Like signed overflow being UB. Define it to have two's-complement semantics. Problem solved. I'm sure the nutters trying to extend C++ with templates will howl, but this is C, not C++. And seriously, C++ is a dead man walking at this point.


C23 does make two’s complement standard. It also adds checked arithmetic so you can safely avoid signed overflow.

It does not make signed overflow defined behaviour. This would prevent integer operation reordering as an optimization, leading to slower code.
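A tiny sketch of what's at stake: with overflow as UB the compiler can fold this whole function to return 1, but with defined wrapping it must keep the add and the compare, because a == INT_MAX would make it false.

    int always_true(int a) {
        return a + 1 > a;   /* folds to 1 under UB rules; under wrapping
                               semantics, INT_MAX + 1 == INT_MIN makes it 0 */
    }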


>This would prevent integer operation reordering as an optimization, leading to slower code.

The sane way to address that is to add explicit opt-in annotations like 'restrict'.

  #push_optimize(assume_no_integer_overflow)
  int x = a + b;
  // more performance-oriented code
  #pop_optimize
  // back to sane C

  #push_optimize(assume_no_alias(a, b), assume_stride(a, 16), assume_stride(b, 16))
  void compute(float *a, float *b, int index)
  {
   // here the compiler can assume a and b do not alias,
   // and that it can always load 16 bytes at a time:
   // the programmer has made sure everything is aligned and padded
   // so that for any index there are always 16 bytes to load.
   // So go on, use any vectorized SIMD instruction you want.
  }
  #pop_optimize
  // back to sane C


That’s a lot uglier and clunkier than just using the ckd_add, ckd_mul, etc. safe checked arithmetic. Plus, if an overflow occurs you still get an incorrect result, which you probably don’t want.

Or maybe I’m wrong? Do people actually want overflows to occur and incorrect results? If they’re willing to tolerate incorrect results, why would they also want optimizations disabled?


The thing is, it's ugly only in the rare case where absolute performance is worth fighting for, and not ugly in the majority case where performance isn't among the top three concerns.


No, GP's proposal is ugly in the majority case. If you're going to make signed overflow defined behaviour then every time you write:

    int c = a + b;
you have to assume it may overflow and give an incorrect result. So now you need to check everything, everywhere, and you don't get any optimizations unless you explicitly ask for them with those ugly #push_optimize annotations. I completely fail to see how this is an advantage.

The way C works right now, the assumption is that you want optimization by default and safety is opt-in. The GP's proposal takes away the optimization by default. It then makes incorrect results the default, but it does not make safety the default. To make safety the default you would have to force people to write conditionals all over the place to check for the overflows with ckd_add, ckd_mul etc. Merely writing:

    int c = a + b;
does not give you any assurance that your answer will be correct.


"So now you need to check everything, everywhere"

If you want to write robust code in C, that is what you need to do. UB doesn't give you runtime checks nor compile-time checks for overflow.

"Does not give you any assurances that your answer will be correct."

Your problem is that you think C's int is a mathematical integer when it is not. It's an ordered set.


You misunderstood what I was saying. Those statements are from the perspective of the hypothetical language proposal I was critiquing. That proposal turns off all the optimizations by default and forces you to add annotations to turn them back on. At the same time, it does not actually give you anything useful for your trouble because it still doesn't solve the problem of signed overflow giving incorrect results.

The way C is now, you get the performance by default and safety is opt-in. That's the tradeoff C makes and it's a good one. Other languages give safety by default and make performance opt-in. The proposal I was responding to gives neither.


Yeah, but it's reversed: signed overflow shouldn't be UB by default. You should have to explicitly opt in to that.

The reason they refuse to do that, of course, is that if it were the case, most shops would up and ban unsafe signed arithmetic.


C++20 did that too.


Until LLVM, GCC, key game engines and GPGPU SDK get rewritten into something else, it is going to be Resident Evil day for a looong time.



