I prefer this, since unsigned underflow (an easy bug to introduce) produces a value that is still a valid size and is not detected by IOC or -ftrapv. It also requires you to use unsigned loop indexes, which will simply lead to more bugs.
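A minimal sketch of the classic unsigned-loop-index bug this refers to (function names are mine, just for illustration):

```c
#include <stdio.h>

/* With an unsigned index, `i >= 0` is always true, so a reverse loop
 * like this never terminates -- i wraps from 0 to SIZE_MAX instead:
 *     for (size_t i = n - 1; i >= 0; i--) ...
 * A signed index behaves as intended: */
int sum_reverse(const int *a, int n) {
    int sum = 0;
    for (int i = n - 1; i >= 0; i--)
        sum += a[i];
    return sum;
}
```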
The fact that your program contains no individual objects whose size is > INT_MAX should be a sign that you should use int for their size.
Smart programming languages have bignum integers and seamless promotion between fixnum and bignum versions.
Thus they avoid the issue completely, no stupid casts (especially casts which change both signedness and size), no overflows, no nothing, just an integer length.
It's a perfect world and everyone should try to achieve it.
C is not such a programming language; the closest approximation it offers is uintmax_t.
Even for fixnums, integer overflows and underflows of any kind should ideally result in an exception by default. I think it's a pity that C(++) doesn't have support for this. A lot of bugs and weaknesses could have been prevented (for example, CWE 680 http://cwe.mitre.org/data/definitions/680.html).
I know this decision (to simply wrap around in the case of an over/underflow) was probably performance-driven, but on the other hand, if the common languages had required it, CPUs would have better support for it...
Edit: Some googling shows that Microsoft has a SafeInt class in common use that reports under/overflow: http://safeint.codeplex.com/. Still, it feels like a kludge for this not to be part of the main language.
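In C there is no SafeInt, but GCC and Clang provide checked-arithmetic builtins that report overflow instead of silently wrapping. A minimal sketch (the wrapper name is mine):

```c
#include <limits.h>

/* __builtin_add_overflow (GCC/Clang) returns nonzero if the addition
 * overflowed; the true result is stored in *out either way (wrapped).
 * This gives roughly the behavior SafeInt provides as a C++ library. */
int checked_add(int a, int b, int *out) {
    return !__builtin_add_overflow(a, b, out);  /* 1 on success */
}
```

Note this is compiler-specific, not standard C.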
I don't think I'd necessarily want to trap on all (signed) integer overflows. As you say, it might break working code. There's just too much C/C++ code around to change that retroactively. And for modular arithmetic and such it is desirable for integers to wrap around.
But a "trap on overflow" signed and unsigned int type would be nice.
Recent gcc versions use the fact that signed overflow is undefined to do some unexpected optimizations (in particular, a + b < a will never be true if a, b are ints.) I don't think -ftrapv is going to cause many additional errors, but I haven't actually tried it. (Also, http://embed.cs.utah.edu/ioc/ looks interesting.)
> (in particular, a + b < a will never be true if a, b are ints.)
Code of exactly that form in the patch for this bug made me do a double take. Fortunately, they'd also changed the a & b from plain ints to size_t, so it was ok.
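To spell out why the size_t change makes it ok: the `a + b < a` idiom is only a valid overflow check for unsigned types, where wraparound is defined. A sketch:

```c
#include <stddef.h>
#include <stdint.h>

/* For signed ints, a + b overflowing is undefined behavior, so the
 * compiler may assume `a + b < a` is always false and delete the check.
 * For unsigned types like size_t, wraparound is well defined modulo
 * 2^N, so the same expression is a correct "would this wrap?" test. */
int would_wrap(size_t a, size_t b) {
    return a + b < a;
}
```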
That only avoids some problems. For example, this is vulnerable to s = SIZE_MAX/4 + 1 (assuming sizeof(int) == 4):
size_t s = get_size();
int *a = (int *)malloc(s * sizeof(int));
for (size_t i = 0; i < s; i++) {
    a[i] = get_int();
}
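The fix is to guard the multiplication before allocating. A sketch (the wrapper name is mine; calloc() performs the same check internally):

```c
#include <stdint.h>
#include <stdlib.h>

/* Reject counts where s * sizeof(int) would wrap around; otherwise a
 * huge s silently becomes a tiny allocation and the fill loop writes
 * far past the end of the buffer. */
int *alloc_ints(size_t s) {
    if (s > SIZE_MAX / sizeof(int))
        return NULL;
    return malloc(s * sizeof(int));
}
```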
I've gotten in the habit of, when reading array sizes over the wire, explicitly limiting to a reasonable value like 1 million. Occasionally things should be allowed to use all available memory, but it's rare.
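That habit can be sketched as a tiny validation step (the limit and function name here are hypothetical, just to show the shape):

```c
#include <stdint.h>
#include <stddef.h>

#define MAX_ELEMS 1000000u  /* application-specific sanity cap */

/* Refuse absurd lengths read off the wire instead of trusting them;
 * returns 1 and stores the length on success, 0 on rejection. */
int read_len_ok(uint64_t wire_len, size_t *out) {
    if (wire_len > MAX_ELEMS)
        return 0;
    *out = (size_t)wire_len;
    return 1;
}
```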
Indeed, consistent use of one type for lengths avoids issues because you need no casts. For in-memory data, the type to use is size_t since that is what e.g. memcpy(), strncmp() and read() accept; for a position in a file or a file length, off_t is a better choice, since that is what pread(), lseek() and stat() use. I'm a fan of unsigned data types in general, but using off_t for file lengths is the way to go. (It's not like you need defined overflow semantics for file positions, anyway.)
(You don't need off64_t if you compile with the proper #defines, which you should do on Linux.)
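For reference, the define in question is _FILE_OFFSET_BITS; a minimal fragment:

```c
/* Must appear before any system header is included: on 32-bit Linux
 * this makes off_t 64-bit, so plain lseek()/pread()/stat() handle
 * files larger than 2 GiB without the off64_t/lseek64 variants. */
#define _FILE_OFFSET_BITS 64
#include <sys/types.h>
#include <unistd.h>
```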
Always.
I'd even say they should always be uintmax_t in C.