Byte order is one of the great unnecessary historical fuck ups in computing. A s...

joppy · on May 8, 2021

Why is it an issue any more than say, order of fields in a struct is an issue? In one case you read bytes off the disk by doing ((b[0] << 8) | b[1]) (or equivalent), with the order reversed the other way around. Any application-level (say, not a compiler, debugger, etc) program should not even need to know the native byte order, it should only need to know the encoding that the file it’s trying to read used.

zabzonk · on May 8, 2021

> order of fields in a struct

This is defined in C to be the order the fields are declared in.

occamrazor · on May 8, 2021

But the padding rules between fields are a mess.

amelius · on May 8, 2021

By the way, mathematicians also have their fuck ups:

https://tauday.com/tau-manifesto

8jy89hui · on May 8, 2021

For anyone curious or who is still attached to pi, here is a response to the tau manifesto:

https://blog.wolfram.com/2015/06/28/2-pi-or-not-2-pi/

pantalaimon · on May 8, 2021

The good thing is that Big Endian is pretty much irrelevant these days. Of all the historically Big Endian architectures, s390x is indeed the only one left that has not switched to little endian.

tssva · on May 8, 2021

Network byte order is big endian so it is far from being pretty much irrelevant these days.

BenoitEssiambre · on May 8, 2021

Also, this might be irrelevant at the cpu level, but within a byte, bits are usually displayed most significant bit first, so with little endian you end up with bit order:

7 6 5 4 3 2 1 0 15 14 13 12 11 10 9 8

instead of

15 to 0

This is because little endian is not how humans write numbers. For consistency with little endianness we would have to switch to writing "one hundred and twenty three" as

321

mafuy · on May 8, 2021

Correct me if I'm wrong, but were the now common numbers not imported in the same order from Arabic, which writes right to left? So numbers were invented in little endian, and we just forgot to translate their order.

dahart · on May 8, 2021

Good question, I just did a little digging to see if I could find out. It sounds like old Arabic did indeed use little endian in writing and speaking, but modern Arabic does not. However, place values weren’t invented in Arabic, Wikipedia says that occurred in Mesopotamia, which spoke primarily Sumerian and was written in Cuneiform - where the direction was left to right.

https://en.wikipedia.org/wiki/Number#First_use_of_numbers

https://en.wikipedia.org/wiki/Mesopotamia

https://en.wikipedia.org/wiki/Cuneiform

gpanders · on May 8, 2021

It might not be how humans write numbers but it is consistent with how we think about numbers in a base system.

123 = 3x10^0 + 2x10^1 + 1x10^2

So if you were to go and label each digit in 123 with the power of 10 it represents, you end up with little endian ordering (eg the 3 has index 0 and the 1 has index 2). This is why little endian has always made more sense to me, personally.

dahart · on May 8, 2021

I always think about values in big endian, largest digit first. Scientific notation, for example, since often we only care about the first few digits.

I sometimes think about arithmetic in little endian, since addition always starts with the least significant digit, due to the right-to-left dependency of carrying.

Except lately I’ve been doing large additions big-endian style left-to-right, allowing intermediate “digits” with a value greater than 9, and doing the carry pass separately after the digit addition pass. It feels easier to me to think about addition this way, even though it’s a less efficient notation.

Long division and modulus are also big-endian operations. My favorite CS trick was learning how you can compute any arbitrarily sized number mod 7 in your head as fast as people are reading the digits of the number, from left to right. If you did it little-endian you’d have to remember the entire number, but in big endian you can forget each digit as soon as you use it.

BenoitEssiambre · on May 8, 2021

I don't know, when we write in general, we tend to write the most significant stuff first so you lose less information if you stop early. Even numbers we truncate twelve millions instead of something like twelve millions, zero thousand zero hundreds and 0.

lanstin · on May 8, 2021

Next you are going to want little endian polynomials, and that is just too far. Also, the advantage of big endian is it naturally extends to decimals/negative exponents where the later on things are less important. X squared plus x plus three minus one over x plus one over x squared etc.

Loss of big endian chips saddens me like the loss of underscores in var names in Go Lang. The homogeneity is worth something, thanks intel and camelCase, but the old order that passes away and is no more had the beauty of a new world.

occamrazor · on May 8, 2021

In German _ein hundert drei und zwanzig_, literally _one hundred three and twenty_. The hardest part is are telephone numbers, that are usually given in blocks of two digits.

lanstin · on May 8, 2021

Well that would be hard for me to learn. I always find the small numbers between like 10 and 100 or 1000 the hardest for me to remember in languages I am trying to learn a bit of.

LightMachine · on May 8, 2021

Exactly. This is so infuriating. Whoever let little-endian win made a huge disfavor for humanity.

kstenerud · on May 8, 2021

The only benefit to big endian is that it's easier for humans to read in a hex dump. Little endian on the other hand has many tricks available to it for building encoding schemes that are efficient on the decoder side.

tom_mellior · on May 8, 2021

Could you elaborate on these tricks? This sounds interesting.

The only thing I'm aware of that's neat in little endian is that if you want the low byte (or word or whatever suffix) of a number stored at address a, then you can simply read a byte from exactly that address. Even if you don't know the size of the original number.

kstenerud · on May 8, 2021

I've posted in some other replies, but a few:

- Long addition is possible across very large integers by just adding the bytes and keeping track of the carry.

- Encoding variable sized integers is possible through an easy algorithm: set aside space in the encoded data for the size, then encode the low bits of the value, shift, repeat until value = 0. When done, store the number of bytes you wrote to the earlier length field. The length calculation comes for free.

- Decoding unaligned bits into big integers is easy because you just store the leftover bits in the next value of the bigint array and keep going. With big endian, you're going high bits to low bits, so once you pass to more than one element in the bigint array, you have to start shifting across multiple elements for every piece you decode from then on.

- Storing bit-encoded length fields into structs becomes trivial since it's always in the low bit, and you can just incrementally build the value low-to-high using the previously decoded length field. Super easy and quick decoding, without having to prepare specific sized destinations.

jart · on May 8, 2021

Blame the people who failed to localize the right-to-left convention when arabic numerals were adopted. It's one of those things like pi vs. tau or jacobin weights and measurements vs. planck units. Tradition isn't always correct. John von Neumann understood that when he designed modern architecture and muh hex dump is not an argument.

froh · on May 8, 2021

that's why little endian == broken endian

said a friend who also quips: "never trust a computer you can lift"

chrisseaton · on May 8, 2021

> The good thing is that Big Endian is pretty much irrelevant these days.

This is nonsense - many file formats are big endian.

diplomatpuppy · on May 8, 2021

With a bonus of some being EBCDIC too.

lanstin · on May 8, 2021

This is true.

akvadrako · on May 8, 2021

Network protocols still mostly use "Network Byte Order", i.e. big endian.

lanstin · on May 8, 2021

Or text. Or handled by generated code like protobuf.

erk__ · on May 8, 2021

As there was talk about in a subthread yesterday [0] so does arm support big endian though it is not used as much anymore is it still there.

POWER also still uses big endian though recently little endian POWER have gotten more popular

[0]: https://news.ycombinator.com/item?id=27075419

globular-toast · on May 8, 2021

Even if all CPUs were little-endian, big-endian would exist almost everywhere except CPUs, including in your head. Unless you're some odd person that actually thinks in little-endian.

mytailorisrich · on May 8, 2021

I don't think it's a fuck up, rather I think it was unavoidable: Both ways are equally valid and when the time came to make the decision, some people decided one way, some people decided the other way.

bregma · on May 8, 2021

And which is the correct byte ordering, pray tell?

wongarsu · on May 8, 2021

Big and little endian are named after the never-ending "holy" war in Gulliver's Travels over how to open eggs. So we were always of the opinion that it doesn't really matter. But I open my eggs on the little end

rwmj · on May 8, 2021

Big Endian of course :-) However the one which has won is Little Endian. Even IBM admitted this when it switched the default in POWER 7 to little endian. s390x is the only significant architecture that is still big endian.

bonzini · on May 8, 2021

Little endian has the advantage that you can read the low bits of data without having to adjust the address. So you can for example do long addition in memory order rather than having to go backwards, or (with an appropriate representation such as ULEB128) in one pass without knowing the size.

js8 · on May 8, 2021

Maybe I am biased working on mainframes, but I would personally take big endian over little endian. The reason is when reading a hex dump, I can easily read the binary integers from left to right.

bonzini · on May 8, 2021

That's the only thing that BE has over LE.

But for example bitmaps in BE are a huge source of bugs, as readers and writers need to agree on the size to use for memory operations.

"SIMD in a word" (e.g. doing strlen or strcmp with 32- or 64-bit memory accesses) might have mostly fallen out of fashion these days, but it's also more efficient in LE.

kstenerud · on May 8, 2021

Big endian is easier for humans to read when looking at a memory dump, but little endian has many useful features in binary encoding schemes due to the low byte being first.

I used to like big endian more, but after deep investigation I now prefer little endian for any encoding schemes.

bombcar · on May 8, 2021

Couldn’t encoding systems be redone with emphasis on the high-order bits? Or is the assumption that the values are clustered in the low bits?

amelius · on May 8, 2021

I think the fundamental problems is that if you start a computation using N most significant bits and then incrementally add more bits, e.g. N+M bits total, then your first N bits might change as a result.

E.g. decimal example:

    1.00/1.00 = 1.00
    1.000/1.001 = 0.999000999000...

(adding one more bit changes the first bits of the outcome)

kstenerud · on May 8, 2021

You can put emphasis on high order bits, but that makes decoding more complex. With little endian the decoder builds low to high, which is MUCH easier to deal with, especially on spillover.

For example, with ULEB128 [1], you just read 7 bits at a time, going higher and higher up the value you're reconstituting. If the value grows too big and you need to spill over to the next (such as with big integer implementations), you just fill the last bits of the old value, then put the remainder bits in the next value and continue on.

With a big endian encoding method (i.e. VLQ used in MIDI format), you start from the high bits and work your way down, which is fine until your value spills over. Because you only have the high bits decoded at the time of the spillover, you now have to start shifting bits along each of your already decoded big integer portions until you finally decode the lowest bit. This of course gets progressively slower as the bits and your big integer portions pile up.

Encoding is easier too, since you don't need to check if for example a uint64 integer value can be encoded in 1, 2, 3, 4, 5, 6, 7 or 8 bits. Just encode the low 8 bits, shift the source right by 8, repeat, until the source value is 0. Then backtrack to the as-yet-blank encoded length field in your message and stuff in how many bytes you encoded. You just got the length calculation for free. Use a scheme where you only encode up to 60 bit values, place the length field in the low 4 bits, and Robert's your father's brother!

For data that is right-heavy (i.e. the fully formed data always has real data on the right side and blank filler on the left - such as uint32 value 8 is actually 0x00000008), you want a little endian scheme. For data that is left-heavy, you want a big endian scheme. Since most of the data we deal with is right-heavy, little endian is the way to go.

You can see how this has influenced my encoding design in [2] [3] [4].

[1] https://en.wikipedia.org/wiki/LEB128

[2] https://github.com/kstenerud/concise-encoding/blob/master/cb...

[3] https://github.com/kstenerud/compact-float/blob/master/compa...

[4] https://github.com/kstenerud/compact-time/blob/master/compac...

CountHackulus · on May 8, 2021

Middle-endian is the only correct answer. It's a tradeoff between both little-endian and big-endian. The PDP-11 got it right.

ben509 · on May 8, 2021

Yup, we're all waiting for the rest of the world to catch up to MM/DD/YYYY.

gnufx · on May 9, 2021

I don't suppose it's being modified, but I wonder how much -11 code is still running, even on real hardware.

ta_ca · on May 8, 2021

the greatest of all is lisp not being the most mainstream language, and we can only blame the lisp companies for this fiasco. in an ideal world we all would be using a lisp with parametric polymorphism. from highest level abstractions to machine level, all in one language.