I guess it's simply an off-by-one error. Most likely they read the file data into buffers of size 128, but then compare only the first 127 bytes of each buffer (e.g., because they use < instead of <= in their loop).
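To make that concrete, here's a minimal sketch of what such a loop might look like. This is purely hypothetical (nobody here has seen the FC.EXE source); the function names and structure are made up for illustration, only the 128-byte chunk size comes from the reports.

    /* Hypothetical sketch of the suspected off-by-one -- not the actual
     * FC.EXE code.  Files are compared in 128-byte chunks, but the loop
     * bound skips the last byte of every chunk. */
    #include <stdio.h>

    #define CHUNK 128

    static int chunks_differ(const unsigned char *a, const unsigned char *b,
                             size_t n)
    {
        /* BUG: with this 1-based loop the bound should be "i <= n";
         * using "<" means byte n (the 128th) is never compared. */
        for (size_t i = 1; i < n; i++)
            if (a[i - 1] != b[i - 1])
                return 1;
        return 0;
    }

    static int files_differ(FILE *f1, FILE *f2)
    {
        unsigned char buf1[CHUNK], buf2[CHUNK];
        size_t n1, n2;

        do {
            n1 = fread(buf1, 1, CHUNK, f1);
            n2 = fread(buf2, 1, CHUNK, f2);
            if (n1 != n2)
                return 1;                     /* lengths differ */
            if (n1 > 0 && chunks_differ(buf1, buf2, n1))
                return 1;                     /* contents differ */
        } while (n1 == CHUNK);
        return 0;                             /* reports "no differences" */
    }

With a loop like that, two files that differ only in the last byte of some 128-byte chunk come back as identical, which matches the reported symptom.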
Probably something like loading the 128 bytes into a byte array, then passing it to something that requires it to be \0-terminated, so the last data byte gets overwritten by the terminator.
EDIT: The suggested workaround is to run it in binary mode. Since binary mode has no concept of \0-terminated strings, that seems to back up my theory.
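That version of the theory would look something like this -- again a hypothetical sketch, not the real code; the string-based handling is the assumption being illustrated:

    /* Hypothetical sketch of the \0-termination theory.  A chunk is read
     * into a 128-byte buffer and then treated as a C string, so whenever a
     * full chunk is read the terminator clobbers the 128th data byte. */
    #include <stdio.h>

    #define CHUNK 128

    static size_t read_chunk_as_string(FILE *f, char buf[CHUNK])
    {
        size_t n = fread(buf, 1, CHUNK, f);
        /* BUG: when n == CHUNK there is no room left for the terminator,
         * so the last real byte of the chunk gets overwritten with '\0'. */
        buf[n == CHUNK ? CHUNK - 1 : n] = '\0';
        return n;
    }

Two such buffers compared with strcmp() can never disagree in the 128th byte of a chunk, and any embedded \0 in the data would cut the comparison short as well -- either way, a plain byte-wise comparison (which is what binary mode implies) wouldn't have the problem.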
If it's something that simple, then how on earth could MS ship it? What tests could they have run that would miss this? I'd think a test of a file-comparison program would include cycling a single-bit alteration through the file and checking that it was caught, up to some arbitrary number of bytes (and I'd think you'd let it run past 128 bytes too).
I'm not a coder; does this sort of assumption seem reasonable? If you can't get the basics right ...?
C is a fantastic language for failing to get the basics right. It takes only a moment's inattention from even a C master and you've got an error like this. In almost every more modern language I can think of, the language itself affords constructs that make this particular error much harder to write, unless you write C-in-(Python/Ruby/Haskell/etc.) in the first place. And of course those other languages all have their own problems, but they tend to be fewer, which is why we can write in them faster.
Your test suite seems broken (which shows why test suites are hard to write in the first place). What if there is a bug when you have two bits changed -- one in the 128th byte and one in the 256th byte? Here's a possible test suite that could catch those bugs.
Take two files. Generate all possible combinations of byte-differences between the two files up to a length of 256 bytes and flag invalid comparisons.
Now, what is the running time of this test suite? I'll give you a hint: it is unlikely to finish before the heat death of the universe.
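To put rough numbers on that: if each of the 256 byte positions can independently either match or differ, that's already 2^256 (about 1.2 x 10^77) difference patterns before you even choose the differing byte values; at a generous billion comparisons per second, the sweep takes on the order of 10^60 years.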
In general it's very hard to write comprehensive test suites, and the only thing this proves is that Microsoft developers are humans, not gods. In fact, this is exactly the type of weird corner-case bug that I would expect to find in code written by good developers.
>Your test suite seems broken (which shows why test suites are hard to write in the first place). What if there is a bug when you have two bits changed
I said "a test ... would include ...".
This just seems like the first, most obvious pattern to check against. You need to be sure the program compares every bit, and that just seems the simplest way to be sure it's looping through every bit of both files and making a proper comparison. Knowing the internals of the program and the functions used would give you the data length to check against.
You seem to be saying that because a comprehensive test is impossible no test should have been performed.
In any case, wouldn't you see it in the ASCII readout of a watch routine or some such - hey, look, [made up variable] currentChunk has string terminators at the end which aren't in the corresponding chunks of the test file.
You say good coders would miss this sort of bug; I'd think a good coder would realise the function they're using puts a string terminator in.
Without seeing the code I guess we wouldn't know, but direct comparison is surely one of the more basic operations to code (although I grant that doing it quickly maybe isn't). Surely the fundamental part is to XOR registers and look for 1s?
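For what it's worth, the core really is that simple. A minimal sketch of the "XOR and look for 1s" idea, assuming aligned 64-bit words and a length that's a multiple of eight bytes -- an illustration, not how fc actually does it:

    #include <stdint.h>
    #include <stddef.h>

    /* Returns nonzero if the two buffers differ anywhere.  Each XOR leaves
     * a 1 bit wherever the corresponding bits differ; OR-ing the results
     * together means one test at the end catches any difference. */
    static int buffers_differ(const uint64_t *a, const uint64_t *b,
                              size_t n_words)
    {
        uint64_t diff = 0;
        for (size_t i = 0; i < n_words; i++)
            diff |= a[i] ^ b[i];
        return diff != 0;
    }

The hard part is the bookkeeping around it -- chunking, buffer lengths, tail bytes -- which is exactly where an off-by-one like this tends to live.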
He didn't suggest that the test suite should account for all possible combinations of byte differences. He suggested that it should account for all possible SINGLE byte differences. I.e., the test suite should have checked two files with only the first byte different, with only the second byte different, etc. Such a test suite would be linear in complexity, not exponential, and could easily be run before the heat death of anything.
(However, to play my own devil's advocate, I'd have to say it's easy in retrospect to say "yes! there is an easy test for this that should have been written!" when in fact often the number of possible tests is astronomically large and it can be hard to pick the right ones. What if the bug was that FC.EXE didn't correctly register a difference when both the 127th and 128th bytes were the only differences? The proposed test suite would not have caught it.)
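Here's a sketch of that linear single-difference sweep, to show how cheap it is. buffers_report_difference is a hypothetical stand-in for whatever comparison routine is under test; the buggy chunked loop sketched earlier in the thread would trip the assert at i = 127, 255, and so on:

    #include <assert.h>
    #include <string.h>
    #include <stddef.h>

    /* Stand-in for the routine under test; swap in the real chunked
     * comparison to exercise it. */
    static int buffers_report_difference(const unsigned char *a,
                                         const unsigned char *b, size_t n)
    {
        return memcmp(a, b, n) != 0;
    }

    static void sweep_single_byte_differences(size_t len)
    {
        unsigned char a[512], b[512];
        memset(a, 'x', len);
        for (size_t i = 0; i < len; i++) {
            memcpy(b, a, len);
            b[i] ^= 0x01;                  /* flip one bit in byte i */
            assert(buffers_report_difference(a, b, len));
        }
    }

    int main(void)
    {
        sweep_single_byte_differences(384);   /* 384 cases, not 2^384 */
        return 0;
    }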
Your devil's advocate argument is pretty much just my post. I was trying to show that, while the parent's test suite has linear running time, the number of tests in a fully comprehensive suite is exponential, so the runtime of any completely comprehensive test suite is exponential too. Choosing the right tests to spend resources on is a very difficult problem.
This is an easy mistake to make, and miss when looking over your code. With hindsight - and having it explained to you - it is also a very easy mistake to understand.
That doesn't mean it was an easy mistake to find. I bet it went without being detected for a long time.
I don't know the root cause, but something this esoteric would most likely be discoverable only in a code review, or possibly a unit test. It's hard to imagine someone thinking of testing the case of two files that differ only at every 128th byte without reading the code.
The most common case: any situation in which only one character in a file is different. For one in every 128 pairs of these files, more or less, fc.exe will say "no differences found".
Are you asking why anyone would ever change only one character in a file? Typo corrections come to mind, or mild file corruption. I recovered a ton of files from a failing drive a few years ago, and quite a few of the text files had just a character or two corrupted.
I'm not sure what people generally use fc for -- I don't -- so it's hard to say how serious the impact might be when it fails.
This was from XP before they re-architected the DLL management. I'm not sure blaming Microsoft for a mistake made more than 9 years ago in OS design is relevant or helpful.
fc.exe doesn't use any DLLs. It's a trivial console application, deployed as a single .exe that can be overwritten as long as it's not currently running.
Did you look any of this information up before making your statements? It looks like ulib.dll is a DLL for file utilities, which would serve a purpose analogous to that of librt and libc on a UNIX system.
You can rest assured that the actual bug was not in a CRT library DLL. They may have had a good reason for updating ulib.dll which wasn't mentioned in the KB article, but fixing an off-by-one bug in an app-specific memory-compare loop wasn't it.
Sigh... 30+ years on, and MS still cannot figure out how to change a basic setting or update something without also requiring a reboot in the process.
Microsoft's single-purpose 'File Compare' utility actually fails to do the one thing it was designed for? It boggles the mind. (Sorry for the lack of content, but OMG and WTF!)