I would expect a small number of variables to fit in cache or registers, so the small extra OOO/data dependency penalty should be much smaller than the cost of a mispredict, no?
It still has to compare every pair of variables, then do the CMOV (which can't be predicted). Then do the same with the result of the first step, computing the minimum for each pair of pairs. Not sure how many of these the CPU can do in parallel!
I don't mean the cmov code, I mean the normal naive testl + sete code. The CPU can do all testl's in parallel and all sete's in parallel. I think this should be faster than branches if you ever have a 0.
I'm assuming your branch code looks something like:
if (a == 0) {
    return true;
}
if (b == 0) {
    return true;
}
// ...
return false;
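For comparison, here is roughly what I mean by the non-branching version. This is only a sketch: the function names and the choice of four `int` arguments are made up for illustration, and whether the compiler really emits a test + sete per variable (rather than branches, or something smarter) depends on the target and optimization flags, so check the generated assembly before drawing conclusions.

```c
#include <assert.h>
#include <stdbool.h>

/* Branchy version (hypothetical name): returns at the first zero found. */
static bool any_zero_branchy(int a, int b, int c, int d) {
    if (a == 0) return true;
    if (b == 0) return true;
    if (c == 0) return true;
    if (d == 0) return true;
    return false;
}

/* Branchless version (hypothetical name): each comparison is independent
   of the others, so they can all be evaluated in parallel and the 0/1
   results ORed together. Bitwise | is used instead of || so the source
   does not ask for short-circuit evaluation. */
static bool any_zero_branchless(int a, int b, int c, int d) {
    return (a == 0) | (b == 0) | (c == 0) | (d == 0);
}

/* Sanity check: both versions must agree on the given input. */
static void check_agree(int a, int b, int c, int d) {
    assert(any_zero_branchy(a, b, c, d) == any_zero_branchless(a, b, c, d));
}
```

The two functions return the same result for every input; the only question is which one the hardware runs faster, and that depends on how predictable the zeros are in your data.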