This is a case where the optimal strategy depends a lot on your input I think. F...

This is a case where the optimal strategy depends a lot on your input I think.

For example, long ago I wrote a small program to find file duplicates. My first step was to use the fact that files with different lengths can't be duplicates. Thus only files which had the same length had to be checked further.

For files on your average hard drive, that simple test screens the vast majority of them.