If you write a quiz like this, and "Don't Know" is an answer but "Undefined Behavior" and "Implementation-Dependent" are left off, you're doing it wrong.
The question is "what would the return value be," and without knowing which compiler is used you don't know what it would return. It doesn't make sense to say the program returns "undefined behavior" or "implementation-dependent."
You might be taking issue with the question itself, but I think it's a valid question because C compilers tend to be lenient with what they compile.
The thing is, I can know, if the necessary information is provided (i.e. CPU architecture, version of the standard). Portability is a feature of C, but C's portability is mostly syntactic and only partly semantic.
I don't really want to be a pedant here, but in this case, if the necessary information had been provided, I had the knowledge to find the correct answer. This is different from the case where, even if the information were provided, I still couldn't know, because I lack the necessary knowledge.
Indeed. The quiz doesn't start with "What does the ANSI C standard guarantee about..." It says "So you think you know C." C is a cross-platform assembly language. I use it when I need to code to the metal. I know exactly how it will behave in most of those scenarios.
I understand, in abstract, I might be on an EBCDIC system, and things will be different. But that's not the reality of C. C isn't just a formal specification.
I spent a while on 1, thinking about how my compiler would handle that. 2 I was 90% confident about. 3 and 4 I was 100% confident about.
What you are talking about is C on your particular compiler on your particular platform. If you are writing C code just to be used for that scenario then you can use all of the implementation understanding you want.
If, however, you want to write C that will have consistent behaviour across multiple compilers and multiple platforms, then you need to limit yourself to the behaviours that the standard guarantees. Otherwise the compiler behaviour may change and your program behaviour will change.
Even between different versions of the same compiler, implementation-defined behaviour can change (although assumptions about sizes probably won't).
A lot of the implementation defined behavior can be ensured with static assertions (either via new-fangled C11 _Static_assert or via the old fashioned negative length array hack), particularly WRT sizes.
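For the curious, both flavors look something like this (a minimal sketch; the macro name is my own, and the asserted sizes are just example assumptions):

```c
#include <limits.h>

/* C11 style: the build fails loudly if the assumption is false. */
_Static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");
_Static_assert(sizeof(int) == 4, "this code assumes 32-bit int");

/* Old-fashioned hack: an array with negative length is a compile
   error, so the typedef only compiles when the condition holds. */
#define MY_STATIC_ASSERT(cond, name) \
    typedef char static_assert_##name[(cond) ? 1 : -1]

MY_STATIC_ASSERT(sizeof(long) >= sizeof(int), long_at_least_int);
```

Either way, porting to a platform where the assumption fails breaks the build instead of letting the code silently misbehave.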
Absolutely, and that's a big step up from writing code that silently does bad things on a different platform. I use that technique on my current project to assert a load of things that I know are true.
It doesn't get you as far as the ideal goal, which is to write code that will simply work on another platform. Static asserts mean that you then have to go and write more code when they fire.
I know exactly how it will behave in most of those scenarios.
Right up until an optimizing compiler throws your assumptions under the bus.
Compilers are allowed to (and both GCC and Clang frequently do) assume that your program will never invoke undefined behavior, and optimize accordingly.
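The classic illustration (a sketch of my own, not from the quiz): an overflow check written with signed arithmetic can be deleted entirely, because the optimizer may assume the overflow never happens.

```c
#include <limits.h>
#include <stdbool.h>

/* Broken: signed overflow is UB, so an optimizer may assume
   x + 1 < x is always false and compile this to "return false". */
bool next_overflows_broken(int x) {
    return x + 1 < x;   /* UB when x == INT_MAX */
}

/* Correct: compare against INT_MAX *before* doing the addition. */
bool next_overflows(int x) {
    return x == INT_MAX;
}
```

GCC and Clang really do perform this kind of transformation at common optimization levels, which is exactly the "thrown under the bus" scenario.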
People would figure it out from the start if "Implementation-Dependent" was an option. Perhaps a "Something else" would be more appropriate so that the reader would still be surprised at the end and feel it was a decent multiple-choice test.
It is likely that the state of someone's knowledge is "don't know" regardless of the actual state of affairs.
If a person wanting to improve guesses correctly, did they actually learn anything? Can they use that next time? They definitely can if they get it wrong and have a wrong answer to start research with.
Edit - I see that I misread your comment, please disregard this. I thought the quiz had don't know and the other options you specified.
This is pretty silly. Being fluent in C has not much at all to do with language quirks. Since (I assume) the 1980s, the comp.lang.c crowd has been reminding us that compilers are fundamentally unknowable and that sizeof(char) could very well be a random walk averaging 1.35. Meanwhile, in the real world, network and kernel programmers have been using some variant of u_int32_t since forever to ensure that they can parse network packets and driver data structures by copying or even casting things into structs.
Also, why would anyone write ' ' * 13? For most of these questions, if your first instinct is "that doesn't look like good C code", your answer is better than the "right" answer.
> Meanwhile, in the real world, network and kernel programmers have been using some variant of u_int32_t since forever to ensure that they can parse network packets and driver data structures by copying or even casting things into structs.
And compiler developers in the real world are also increasingly nervous at this state of affairs. The subset of C that works in practice is unspecified. There are plenty of optimizations that compiler developers would like to do, but they don't because they're worried about breaking some code somewhere that relies on undefined behavior (like the code you're talking about). At the same time, performance competition pushes compilers to exploit undefined behavior more and more heavily. There is (IMHO, well-founded) worry across the board that this situation is not sustainable. Experience with this problem motivated the design of languages like Swift, which try to clamp down on UB drastically—which also has the happy side effect of making articles like this one inapplicable.
Isn't that code wrong anyway? If the bundle size were 16 bytes, the shift would be well defined but still give the wrong result, 64K alignment. Not a great example of undefined behavior causing a bug.
Actually, since C99, sizeof char is defined to be 1 (byte). The number of bits in a byte is implementation-defined.
However, "it's implementation specific and it's most probably this, or that if you use such and such a compiler flag" is different from "I don't know".
As a rule of thumb, don't assume things about C's abstractions, read the standard instead, or ask friendly humans who do if they check out.
Yes, but the term "byte" doesn't tell you much. Most people assume that a byte is 8 bits, which isn't guaranteed. If the definition of `char` were `1 octet`, then you could make safer assumptions about "What's the result of `' ' * 13`?".
From ISO/IEC 9899:2011:
> 3.6 byte
> addressable unit of data storage large enough to hold any member of the basic character set of the execution environment
> NOTE 1 It is possible to express the address of each individual byte of an object uniquely.
> NOTE 2 A byte is composed of a contiguous sequence of bits, the number of which is implementation-defined. The least significant bit is called the low-order bit; the most significant bit is called the high-order bit.
Edit: Reading further into the C11 spec, `5.2.4.2.1 Sizes of integer types <limits.h>`, it says that `CHAR_BIT` must be at least 8. A Google search suggests there exist some processors with 16-bit bytes and others with 32-bit bytes.
It’s not because of the signedness of char—not only do character literals have type int, but even if this weren’t a literal, the char would be promoted to (signed) int before the multiplication. But of course the multiplication of two ints can still overflow.
> Most people assume that a byte is 8-bits which isn't true.
It is worth remembering that POSIX (and probably Windows too) mandates 8-bit chars, so there is no point being defensive about it on these particular platforms. And I kid you not, I have seen people who are. Because "ISO C99 this and that".
The problem with this idea is that, if you're relying on undefined behavior and don't know it, your compiler's behavior might change depending (at a minimum) on the code you use in production but not in your test case and depending on the compiler version.
It's much better to avoid undefined behavior entirely. The problem with this idea is that it's hard to learn and remember all the different kinds of undefined behavior.
>> Casey Muratori, who's a better C programmer than me, is of the opinion that you should check your compiler's behavior rather than read standards.
In Casey Muratori's case, he's the type of person who cares to understand what the compiler is actually generating under the hood. He doesn't exactly trust the compiler to always do the right thing.
Casey Muratori also has been developing in Visual Studio for a very long time. Visual Studio has a long, tortuous history of being non-standards-compliant and buggy. And Microsoft, until very recently, never cared to fix anything. (It still doesn't conform to C99, let alone C11.)
Microsoft already stated multiple times that the focus of Visual C++ is C++ (hence the name).
Whatever C related updates you might see, are the ones required by the ANSI C++ for C compatibility.
For C99 and C11 support there is the clang frontend on top of C2, the Visual C++ backend, which is currently taking on an LLVM-like role at Microsoft, being shared between clang, VC++ and .NET Native.
The Visual C compiler is only for backwards compatibility. If you are looking for improved C compilation you'll be waiting a very long time. The only improvements they've made were those also required for C++.
So, how exactly do you check your compiler's behavior?
Given: 1) There are various optimizations that may kick in or not depending on UB. 2) The cleverness of the optimizer may and does(!) change between versions. 3) There's no "-warn-on-optimizer-changes" flag for any compiler.
Do you think it's realistic to compare compiler output whenever you upgrade your compiler? Do you think it's realistic for 1M SLoC projects? If not, do you take a statistical approach with random sampling? If so, I'd be very interested. If not... what exactly are you sure of and how?
> is of the opinion that you should check your compiler's behavior rather than read standards
Do both.
Disassemble your compiler's output to see what horrors the optimizer inflicted on your code to "break" it. Read the standard to understand why it thought that was an okay optimization to make in your case, to avoid running afoul of the same mistake again. Ask what other optimizations are enabled by this undefined behavior, the better to spot them when the code written by your coworker's former boss's roommate, who interned for a month or two, invokes undefined behavior.
Repeat.
(To be fair, I haven't read the standards cover to cover - perhaps I should, I'm still occasionally learning about new and exciting forms of UB...)
There is one thing I have noticed in gcc and clang which is done against the standards: The sign of bit sized int in a struct.
E.g. in a struct with a member 'int a:8', the standard says that 'a' can be signed or unsigned (following the machine's default signedness for char). But in gcc and clang this is always signed, regardless of the machine.
Similarly, MSVC guarantees wrapping on signed overflow; it's not UB when you target only that compiler. And gcc/clang have a compiler switch (-fwrapv, I believe) to do the same. Those things are probably what they meant when they said to look at what the compiler does.
> E.g. in a struct with a member 'int a:8', the standard says that 'a' can be signed or unsigned (following the machine's default signedness for char). But in gcc and clang this is always signed, regardless of the machine.
Wait, are you saying gcc/clang's behaviour is permitted by the standard, or not?
> Wait, are you saying gcc/clang's behaviour is permitted by the standard, or not?
It's permitted, the same way a char can be signed or unsigned: implementation-defined. But in this case it's always signed regardless of the architecture (unlike how char is handled).
From the C11 draft (N1570), J.3.9 (Implementation-defined behavior):
> Whether a ‘‘plain’’ int bit-field is treated as a signed int bit-field or as an unsigned int bit-field (6.7.2, 6.7.2.1).
But say, another compiler (the ARM compiler) treats this differently [0]: until version 5 of the compiler, such an int was unsigned by default. Later versions default to signed (as do gcc/clang).
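To make the implementation-defined part concrete, here's a small sketch; only the plain `int` bit-field's signedness varies between implementations:

```c
struct flags {
    int      a : 8;  /* plain: signed or unsigned, implementation-defined */
    signed   b : 8;  /* portable: always signed */
    unsigned c : 8;  /* portable: always unsigned */
};

/* With gcc/clang, (struct flags){ .a = -1 }.a reads back as -1;
   with armcc before v5 the same field would read back as 255.
   b and c behave the same everywhere. */
```

So if the sign of a small field matters, spell out `signed` or `unsigned` rather than relying on the platform default.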
For the record, I disagree with this. You only need to check the implementation if you're straying from the standard or the compiler is. Neither should be... standard.
Exactly. I immediately thought "implementation specific" on the first question but then didn't see that as an option. By the second question, I had caught on that 'D' was the intended answer throughout.
sizeof(char) has to be 1, but a char doesn't have to be 8 bits. It's more correct to say that sizeof(foo) returns a result in units of sizeof(char).
The difference may be 100% semantics, but when I am using a bit field, I am addressing the particular bits I am interested in via a name, and when I am shifting and masking bits, I am directly manipulating a piece of memory until its value represents the bits that I am interested in.
I prefer to think of "byte" as unsigned. Although I've never quite figured out why char is signed on most implementations, it would most naturally be unsigned too.
I suspect because signed has more undefined behaviour. C compilers will always choose the option that gives them most optimization freedom since that's what they're judged on.
Overflow a signed char and the compiler is free to assume the result is zero/false (because undefined behavior) OR to assume the value is always greater than 127 (because undefined behavior). Or, when it compiles it into machine code, the result may depend on register width, since the value may or may not be stored back into memory: the compiler isn't obligated to AND with 0xFF, so you can end up with either a value larger than 127 or one reduced mod 256, depending on register pressure.
>Actually, since C99, sizeof char is defined to be 1 (byte). The number of bits in a byte is implementation-defined.
(Edit: I misread, it's always 1.) To be pedantic, the size of a char in bits is defined by CHAR_BIT in limits.h, as are all of the min and max values and bit widths for the integer types.
To add to the discussion, many high-level languages don't have hard min and max values (Haskell, which is popular in these woods, has minBound/maxBound, I see).
On the "So you think you know C?" test I answered... probably correctly for the "standard" x86 implementation, while knowing that the answers are implementation-specific (for 4 I'm not sure). The author's "Don't know" is his own; I would have answered the questions with "It depends", but that wasn't offered.
In the end I agree with the general notion here that this is ugly C.
PS: Signed integer overflow is undefined as well, so "good" C code should work around it. (I miss well-defined wrapping from assembly.)
> To be pedantic, a size of char is defined by CHAR_BIT in limits.h. As well as all of the min and max values and lengths in bits for integers.
To be even more pedantic: it's the number of bits in a char that is defined by CHAR_BIT. However, sizeof returns the number of chars needed to store a type, where sizeof(char) is defined to be 1.
> Actually, since C99, sizeof char is defined to be 1 (byte). The number of bits in a byte is implementation-defined.
Ever since, reading introductions to C that say things like "a char is usually one byte in size" makes me cringe. I never bothered to check, though, what C89/C90 had to say on this.
I guess you don't spend too much time looking at the typical C code written on most average enterprise shops.
Back when I used to code C professionally on early 2000's, these type of issues were quite common.
Especially since, regardless of what one reads on HN or hears at most conferences, most of those companies, which tend to see IT as a cost center, usually don't employ any processes for code reviews, static analysis or unit tests.
In languages like C this means a heavy price when things blow up, especially the way things are going with IoT.
So given the increase in CVE entries, this still matters a lot.
I think this might have been true in the early 2000s, but I hope we have moved on significantly by now. I know that the automotive folks are getting a lot better at code review, static analysis and unit testing.
Yeah, and keep in mind that the developers who even attend CppCon 2015 are probably in the top 10% (at least) in terms of interest in these types of things.
It might be that they're in the top 10% of interest in ramping up in these types of things. I'm sure I'm not alone in having little interest in attending a technical conference that mostly covers things I already know - I'm there either to learn, in which case I want new content - not rehashed content - or I'm there for other reasons. Low rates of already using these tools, then, may make a certain amount of sense. If they were already putting them to practice, they may be less likely to attend.
Or I could be way off base - I've never attended CppCon. And I do love me some ASAN when I need it ;)
Indeed. If you contrast it against other engineering disciplines, even a closely related one like hardware engineering or FPGA engineering, the mindset is totally different. The amount of checks, reviews and many other practices is astounding.
... which is even more surprising given that many of these practices (in software) are almost-free. Granted, there aren't compile-time guarantees for most of these things, but with ASAN/UBSAN/etc. you'll certainly find the top 90% of your problems pretty fast.
In my experience diligent programming and going to conferences (or giving talks at conferences) are inversely correlated. Many conference pundits do little to no actual programming in the first place.
I'm not sure exactly what you're driving at, but for conferences like CppCon (and similar), the driving factor is usually technical interest and not just a way to waste time away from work (while drinking every night or whatever[1]).
So... could you explain what you're insinuating so I could respond to that instead of just guessing?
[1] Not sure about CppCon, but all "technical" conferences I've personally been to did have a lot of late-night drinking, but that was secondary to the technical stuff.
I'm intimately familiar with an in-production, business-critical code base, and there's not a single best practice it doesn't breach. If the compiler, on its most lenient setting, allows it, then it has been done. It's fortunately not in C, but it tries to compensate by being pretty big.
I'm a non-native English speaker/writer too. The qualifiers and disclaimers that I'm objecting to are not grammatically wrong. The point I'm making is that your statement loses all force due to the style.
The problem with all these language lawyers is that they present themselves as the keepers of the Truth, and if you can't cite the standard for every line of code you write, you "don't really know the language." In reality, language lawyering is an obscure hobby, almost orthogonal to real-world programming.
Oh, but I'm sure that someone somewhere once ran into each of these things - maybe that guy who once ported some SMS messaging platform from the IBM mainframe to Solaris and had to use some specific commercial compiler, or the other one who wrote some software that had to work on every version of Linux from the 1.0 kernels up and had to link to specific binary libraries so it had to work on versions of gcc spanning 15 years.
What I'm saying is that knowing these minute details doesn't make one a proficient developer with the language, nor does getting every one of them wrong make one a bad developer. Sure, with 10 years of experience most people would run into one or two on the list and know about them; that still doesn't support the premise that not knowing or caring about trivia competitions means one doesn't 'know' the language.
The amount of exploits caused by such issues is not as irrelevant as you hint; after all, if Apple developers get it wrong [1], what about all those who work 9-5 in boring enterprises and don't care one second about mundane stuff like static analyzers, warnings as errors, unit tests or code reviews?
The moment one needs to work with more than one compiler, including different versions of the same compiler, one needs to become a language lawyer to keep some sanity.
[1] I know the source of error wasn't related to these exact questions.
On second thought, who cares about reading? It's for squares and nerds. I would rather spend 2 hours Googling what an octet is and finding out how enums are represented... :p
Yes, and the corollary: you'll find people who have memorized the C standard and can answer the more subtle questions, yet are actually unable to use the language to create a decently sized program.
Is the slanted apostrophe symbol ( ‘ ) that he uses in #3 even treated the same as the normal single quote ( ' ) in C? Note that it's not even a backtick ( ` ), it's a third symbol that's not on a keyboard.
That may be a typo. Macs have a feature that automatically converts ' and " into ‘ and “ called "smart quotes." It's innocuous most of the time and maybe even welcome, but can be a real pain when trying to share a command with someone (who is probably going to copy and paste it, receive an error and not know why).
LaTeX (and probably TeX alone, but I never use it without LaTeX) supports something similar, but AFAICT only in appropriate contexts (matching pairs of `` and '' are converted to the appropriate quote, as are ` and ', but they don't seem to affect "typewriter text", so the smart quotes stay out of my code snippets). Or maybe I just haven't noticed LaTeX munging all over my document.
Fun fact: PowerShell recognizes smart quotes just like normal quotes, so you can delimit a string with “, ”, or " if you want. Same with en-dash and hyphen-minus.
One might argue, though, whether it's really a good thing to be resilient against crappy WordPress blogs, since it encourages running copy-pasted code from the web. But I guess that would happen anyway, regardless of language designer intentions and a little frustration with copy-pasting won't cause people to actually read the code they're running.
To be fair, as a Mac user I've only ever encountered that when using some rich text editors, but primarily word processors where your literal text isn't what you want (WYSIWYG editors). When dealing with plain text, even in browser comment blocks, it's always left my ' and " characters alone.
Right, and, again, it's likely what you want 99% of the time (unless you're copy/pasting more code examples than prose). I'd be surprised if this isn't typical behavior of many blogging platforms.
I did this test as a break from sitting in front of Keil C51. These questions highlight a very important issue: there are platforms and compilers out there that will bite you in the ass if you try to unconsciously apply what you learned while using gcc.
Under Keil C51, alignment is always 1. This is legal C.
Under Keil C51, int is 16-bit. This is legal C.
Under Keil C51, sizeof(pointer) is anything from 1 to 3. This is an extension, but a very popular one.
Don't assume that just because you can write crash-free code on x86, you "know" C.
> Under Keil C51, sizeof(pointer) is anything from 1 to 3. This is an extension, but a very popular one
I think this is the funniest thing in all of this. You might cross all your t's and dot your i's, and you still may trip into a pitfall when your compiler/platform vendor happens to actually diverge from the standard. Such is life with a decades-old language with at least as many implementations.
Such is the price of using C on Harvard machines. It's even more pronounced on the '51, as even the smallest $0.50 chip has four (!) different address spaces (direct/register RAM, indirect RAM, xram and code), and bigger chips have even more as they support memory banking. The real question is: why doesn't C support the Harvard architecture natively, without ugly hacks?
For x86 in 16-bit realmode (i.e. when you're targeting PCs on MS-DOS), that is also true. Likewise, pointers can be 2 or 4 bytes.
On the other hand, if you are working with such an "exotic" (i.e. not ILP32/LLP64/LP64) platform, you probably already realise that a lot of other things are also very different and the fact that the syntax looks much the same and it's still called "C" is the least of your worries.
Microchip's PIC microcontrollers also come to mind as being an architecture with a C compiler, yet with very different characteristics from the "usual" x86/ARM. (AVRs, Z80s, and the like are more similar to 16-bit x86 restricted to a single segment, a "SIP16" data model.)
With gcc on an AVR an int is 16 bits and alignment is 1 as well.
That said, I ported a bunch of stuff from the AVR to a 32-bit ARM Cortex and the mostly non-hardware-dependent stuff just worked. On the other hand, stdint.h is an old friend of mine.
And yet they are far better than the Modula-2 8051 compiler we used to use. At least Keil C51 does static analysis on function locals to overlay them and create a "stack."
I'm reminded of an old story about a man hiring a stagecoach driver. He asks each applicant how close they can drive to the edge of a cliff without going over. The first few applicants brag about being able to drive right up to the edge. The job goes to the first one to respond, "I don't know, I try to stay as far away as possible."
I sort of understand the point of this post, but I basically disagree with it. From a pedantic perspective, you aren't writing "good C" if your code isn't 100% standards-compliant, but if we applied this same rigor to dynamic languages, no one would get anything done in them. How do you know your Ruby code works? The community has settled on the following solution: because it passes thousands of tests on every aspect of every function. Extending that to C code, your code works if it passes your tests.
Test on every platform and every compiler you support, or it's not going to work. Avoid undefined behavior, sure, that's probably good practice. But there is no need to bend over backwards for compilers from the 90s targeting the AS/400 if you are writing a linux application that only needs to support Ubuntu 16.04. If your only means of avoiding broken behavior is "we're going to try really hard not to do anything undefined or implementation-defined" you will fail. But if you pass your functional tests, your user doesn't care whether you are invoking undefined behavior with every function call.
Well I would agree that undefined behaviors are unavoidable. Indeed, if you insert checks into your code to completely rule out UB, that will have a huge impact on performance.
On the other hand, I don't agree passing tests is enough. No matter how many tests you run, you can't guarantee there's no bug in your code. This is important especially when security is a critical requirement.
Well said. I'd also add that most of these questions seem aimed at people who've written C compilers before, at least to me. My guess is that even an expert at C like antirez wouldn't really care about such nitpicky issues while writing Redis. Porting Redis to some arcane architecture is a different story entirely however.
"Nobody in their right mind should use code like this in production".
You should avoid code like this at all costs. If you use code like this because it makes you feel smarter, you will be creating an enormous amount of problems in the future, and you are actually being dumb.
The way to win the game is not to play it. Especially if you are programming nuclear plants.
If you see this code what you have to do is replace it, not understand it. It works today but when you change the compiler or the architecture 10 years from now, it will be such a huge pain in the ass to find all the bugs and undefined behavior it creates.
And the key to not playing this undefined-behaviour game is using integers of known size. C has had them since C99, so using int, short, char etc. like in the article is reckless and inexcusable.
A note: 'int main()' and 'int main(void)' are different in C (but the same in C++).
Most online puzzles use the former (I have also seen this on HackerRank), but the standard says that main should be defined with a void parameter list if it accepts no arguments (or in some other implementation-defined manner).
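The distinction applies to all functions, not just main; a quick sketch (C23 finally makes empty parentheses mean `(void)`, as C++ always has):

```c
/* Empty parentheses: the parameters are *unspecified*, so before
   C23 a call like f(1, 2) still compiles (with undefined behavior). */
int f() { return 42; }

/* (void): definitely no parameters; g(1) is a compile error. */
int g(void) { return 7; }
```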
Yep, we write C++ at work and the technical director loves writing (void) in parameters due to years of habit after writing C. To me it looks redundant (I went straight to C++, skipped C).
I'm working right now on a system that mixes Aarch32 and Aarch64 cores. Communication between these cores has to be rock-solid.
Previously, I worked on a DSP architecture that could switch at runtime between a 16-bit mode and a 24-bit mode. I still sometimes have nightmares. A particular bug that took months to resolve involved app code that expected one mode and interrupts that ran in another. The compiler did the right thing most of the time. Most of the time.
I don't understand this critique of an outdated version of the language. Let's see questions written in C99 using stdint types. Dealing with integer widths is a non-issue.
Yes, the inc/dec operators require extra care, and structure packing requires platform awareness, similar to how endianness can matter for data manipulation.
Unfortunately licenses for outdated versions of outdated compilers for outdated versions of an ancient language still sell for 3995€ a seat - I know because I'm using one right now. Nothing beats good old 8051 in terms of price. :)
Yup, I deal with the unsigned char issue at work all the time, I've got a pile of code with casts to unsigned all over the place due to decoding some legacy data formats. I'm not sure what the cleanest way to fix it is, but I find myself writing nasty stuff like:
(((short)((unsigned char)a)) << 8)
Which works but goddamn is it ever hard to read. Bracket highlighting makes it just bearable. Fortunately I've abstracted this kind of stuff out as much as possible so I only have it in a few places, but it still bites me in the ass sometimes. Curious if anyone has a cleaner way of expressing this?
Definitely use the new stdint.h types[1]. That will make your intent a lot clearer. Also consider creating some macros for all common operations (like creating a uint32_t out of four uint8_t).
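Something along these lines, for example (helper names are mine): doing the arithmetic in unsigned fixed-width types avoids the sign-extension surprise that forces the `(unsigned char)` cast in the first place.

```c
#include <stdint.h>

/* Assemble big-endian values from raw bytes.  uint8_t can never
   sign-extend, so no defensive casts are needed at the call site. */
static inline uint16_t u16_be(uint8_t hi, uint8_t lo) {
    return (uint16_t)((unsigned)hi << 8 | lo);
}

static inline uint32_t u32_be(uint8_t b0, uint8_t b1,
                              uint8_t b2, uint8_t b3) {
    return (uint32_t)b0 << 24 | (uint32_t)b1 << 16
         | (uint32_t)b2 << 8  | (uint32_t)b3;
}
```

Then the snippet above becomes `u16_be(a, b)` instead of a pile of casts.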
How is integer zero better in that scenario, though? The `(void *)0` pointer can still be cast to the correct type (and will be for everything except variadic functions like printf).
We teach people to write proper code, then interview them on the dumbest, half-assed code that nobody writes in the real world, because we like mental gymnastics.
IMO, this is actually a fairly good test of thorough C knowledge. All of these issues are regularly described in articles, blogs, forums, HN posts, etc. Anyone who has programmed C for a few years and put in effort to master the language should know this stuff.
I was about to post something similar, hit Ctrl+F, and found this post. :)
"Deep C" illustrates well the difference between people who know C and people who *know* C. A bit of it is language lawyering, but some of it can be practical, although perhaps specific to certain projects or environments.
IMO this sort of knowledge is most practical if you're maintaining legacy software or in specific environments that you would otherwise be making a mistake in. People writing evergreen software or in specific targeted environments don't really need to worry about all these types of "Gotch'ya!".
The lack of specified type sizes is something that bit me recently. On pretty much every x86-64 C/C++ compiler, sizeof(long) is 8, except on MSVC where it's 4. Thank goodness for explicitly sized, now-standard types like int64_t.
I wish they were in universal use. I'm constantly cleaning up a colleagues code where he's made many (now incorrect) assumptions about what "int", "short", "long" would mean. Even pointing him to the standard didn't help, I just got back his stock answer, "It works on X so why do we need to worry?" To which I reply, "Because we've changed architectures and compilers since then and now your code is frequently broken." You'd think he'd learn after a while, but some people are too stubborn.
I started when sizeof(int) = 2, sizeof(long) = 4. I've seen int go from smaller than a long to equal to a long (both 4 bytes), and then back to not equal (sizeof(long) = 8).
And if that colleague won't learn, that sounds like an issue in need of a bit of active management...
Unfortunately he's got the ear of management. He's a good guy, and actually a good programmer - one of those folks who can churn out 1000 lines of (mostly) functioning code in a day. It's not well-written, but it does the job, and that's what management wants to see (bad management, but hard to fight). But he's lousy at learning new things and is stuck in his ways.
Once an idea enters his head it's hard to dislodge it. He learned programming from a mediocre CS school (despite the expensive price tag, which they deserve for their medical and law schools, not their engineering disciplines). And when something works, it's hard to convince him that it's wrong (that is, correct by coincidence) or unstable (in this case, sizeof(int) and related issues). He isn't technically (one of his favorite words) wrong in many instances, but problems will arise when we change OS, compiler, or CPU.
I'm slowly converting him to seeing things my way. My main contribution has been to mentor the junior developers. Things will change as their contributions worm their way into our projects.
I surely don't remember using them while creating code across MS-DOS, Xenix, DG-UX, Windows 3.1, just constraining myself to the compilation targets I used in 1994.
I didn't do cross-compiled WinAPI / DG-UX in '94, but as luck would have it I did do DG-UX (the first platform I wrote security code on), and I definitely had bit-width types to use.
§6.2.5p8 guarantees that integer types of increasing conversion rank obey subset relations (each type's range is a subrange of the next), and §6.3.1.1 lays out that short, int, and long have conversion ranks in that order.
Sharing data structures in RAM between 64-bit, 32-bit, and 16-bit architectures brings this home painfully. Data communications is equivalently painful, possibly more so because one has no knowledge of the receiver's architecture - sending multi-byte values between big- and little-endian machines, for example.
The test could also have included questions on things like packing bools and enums into structures. Again, critical to get right when sharing or transmitting structs, or anything bigger than a char.
Yep, unfortunately a universal 'network byte order'[1] still doesn't exist.
Things like this were especially a pain with industrial network protocols that we used in our applications during the transition from PowerPC to Intel on OS X. All the CFSwapInt16HostToBig() and CFSwapInt16BigToHost() stuff...
And earlier, before I learned about '#pragma pack', it took me a while to figure out where all these extra zeroes suddenly came from :-D
To all those kvetching about weird C (and by extension, C++) behaviors... what is your alternative for an unmanaged, system level language with broad compiler/library support?
THIS is why C is bad. It’s not because there are no classes, and you have to manage your own memory, or because it has unsafe pointers. It’s this. And it’s the same reason why C++ is awful even before you get to the "++" part. I wish that by now we had a proper ASM abstraction language that allows the programmer to work with the architecture they’re programming for, instead of all the “undefined behaviour” we have to deal with since the 70s.
I knew it was Undefined Behaviour, but I played along and tried to guess what it would return on GCC/Windows/x86/no optimisation, in the assumption that the author is clueless about UB and just ran some tests and did a quiz about it.
The C standard requires that the basic int type is at least 16 bits. The same is true of unsigned int. So, either way, 1 << 16 is too large a shift if the integer is 16 bits on your platform.
In section J.2 Undefined Behaviour, the standard mentions that it's undefined when "an expression is shifted by a negative number or by an amount greater than or equal to the width of the promoted expression"
Although it wouldn't affect the result in this case, 1 << 16 could actually result in 1. For a 16-bit number, a processor can consider only the least significant 4 bits of the second operand, which would be 0, resulting in 1 << 0.
> "If the value of the right operand is negative
> or is greater than or equal to the width of
> the promoted left operand, the behavior is
> undefined."
Yeah. It seems like something easy to define to me, at least in the case the compiler can recognize the shift is a constant. But that's the standard for you.
I'm no C expert, far from it; I only have an instinct that if it's about C and memory layout, I shouldn't rely on my understanding but should check it instead - on every architecture I'm targeting.
I enjoyed this post. I am curious do any of you have any C-specific blogs that you read regularly? Maybe not exclusively C but predominantly C or mostly C-related?
I realized these were trick questions at #5. My answers till then were "well, on gcc on i686, probably ____". The trick questions are in poor taste imo.
I would've thought the first question was a big red flag, but I guess on Windows and x86 Linux int is 32-bit even on 64-bit processors. That wasn't the case on MIPS64 (under IRIX) or on DEC Alpha.
I had my formative C experience in the DOS era with 16-bit compilers, and then on SGIs with 64-bit compilers (and I had the "fun" of porting grad-student-written code from SGI to Linux), so you get burned on expectations about int size. Packing makes it even worse, as many RISC machines (like MIPS) are word-addressed and non-aligned loads require a bunch of bit fiddling - or you get a Bus Error (SIGBUS).
While I get the overall message, I'm not sure the last question is that unpredictable:
i++ + ++i
If the postfix is executed first, it yields 0 but the prefix yields 2, and the result is 2. If the prefix is executed first, it yields 1 and then the postfix also yields 1; the result of the addition is again 2.
What else is ambiguous about this snippet? I can't think of a parse tree that'd evaluate addition before any of the increments - that'd be a syntax error.
If the code were (++i + i++) there could be alternative interpretations, but again ++ requires lvalue...
The order of evaluation in such an expression is entirely unspecified. There are points in C programs - sequence points - that separate a "before" and an "after". The semicolon is a typical such point. There are others. Now between two adjacent points anything goes.
In this particular example the evaluations and effects are allowed to be interleaved:
pre = i + 1; // evaluating the pre-increment
post = i; // evaluating the post-increment
i = i + 1; // effect of the pre-increment
i = i + 1; // effect of the post-increment
return pre + post; // evaluating the sum
Simply put, pre and post-increment are allowed to occur "simultaneously".
what you are saying may be correct, but you are "overloading" the term "order of evaluation" to encompass atomicity (atomicity of operations like i++); is that kosher?
Indeed I am. What I want to convey is that, between two sequence points, C is actually non-strict. Any effect is like unsafeInterleaveIO from Haskell. Which is why I rarely mix effects together, or with a computation, in a single expression.
In my case, I'd normally say that Betteridge's law applies. But I did surprisingly well on the quiz.
Though I'd agree with the comments on how this is kind of silly. The answer is not always "I don't know" but rather, in some of those cases, "I didn't define my data types well enough to know for sure, depending on CPU architecture and the compiler".