I would expect a small number of variables to fit in cache or registers, so the ...

rep_lodsb · on Oct 20, 2022

It still has to compare every pair of variables, then do the CMOV (which can't be predicted). Then do the same with the result of the first step, computing the minimum for each pair of pairs. Not sure how many of these the CPU can do in parallel!
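
A minimal sketch of that pairwise reduction, assuming four unsigned 32-bit values and assuming the goal is to test whether any of them is zero (the names `min_u32` and `any_zero_via_min` are made up for illustration). Whether the ternary actually compiles to a CMOV depends on the compiler and flags:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: minimum of two unsigned values.
 * Compilers often lower this to cmp + cmov, but are not required to. */
static inline uint32_t min_u32(uint32_t x, uint32_t y) {
    return x < y ? x : y;
}

/* Hypothetical tree reduction over four inputs. */
bool any_zero_via_min(uint32_t a, uint32_t b, uint32_t c, uint32_t d) {
    /* First level: these two minimums do not depend on each other. */
    uint32_t ab = min_u32(a, b);
    uint32_t cd = min_u32(c, d);
    /* Second level: must wait for both first-level results. */
    return min_u32(ab, cd) == 0;
}
```

The two first-level calls are independent, but the final step depends on both of them, so the dependency chain grows with the logarithm of the number of inputs.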

tylerhou · on Oct 20, 2022

I don't mean the cmov code, I mean the normal naive testl + sete code. The CPU can execute all the testl instructions in parallel, and likewise all the sete instructions, since none of them depends on another. I think this should be faster than branches if you ever have a 0.

I'm assuming your branch code looks something like:

  if (a == 0) {
    return true;
  }
  if (b == 0) {
    return true;
  }
  // ...
  return false;
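
For comparison, here is a sketch of the two variants side by side, assuming four `int` inputs (the function names and the input count are illustrative, not from the thread). The branchless version uses bitwise `|` rather than `||` so that nothing short-circuits; whether a compiler really emits test + sete for it, or keeps the branches in the other one, depends on the compiler and optimization level:

```c
#include <stdbool.h>

/* Branchy version: returns as soon as one input is zero. */
bool any_zero_branchy(int a, int b, int c, int d) {
    if (a == 0) return true;
    if (b == 0) return true;
    if (c == 0) return true;
    if (d == 0) return true;
    return false;
}

/* Branchless version: every comparison is evaluated independently,
 * and the 0/1 results are combined with bitwise OR (no short-circuit). */
bool any_zero_branchless(int a, int b, int c, int d) {
    return (a == 0) | (b == 0) | (c == 0) | (d == 0);
}
```

Both functions return the same result for every input; only the control flow differs.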