
Yeah, I think there exists a case where this is possible. Let me play around with it this weekend and I'll comment here if I find anything.


I know it can happen on some older ARM designs, microcontrollers, and odd DSP-like chips. But I can't think of any case on modern x86 chips, at least.

Some low-performance ARMv7/8 designs can split NEON SIMD instructions across multiple clock cycles, but I think even then NEON is going to perform better.


I started digging into it, but I don't have an Intel machine with Linux easily available, and that's a prerequisite for looking at micro-op performance. I think you win.


I don't need to win, I just want the truth to win. In other words, I'd consider it a win if I, you or anyone else learns something new.

SIMD is highly optimized at this point, with a sustained throughput of two instructions per clock. It's hard to imagine scalar code getting anywhere near that, even at 4 instructions per clock.
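
A back-of-the-envelope sketch of that comparison. The two throughput figures are the ones quoted above; the 256-bit vector width and 32-bit element size are assumptions picked for illustration, and the model ignores memory bandwidth, dependency chains, and the fact that not every scalar instruction does useful arithmetic.

```python
def elements_per_clock(instructions_per_clock, lanes_per_instruction):
    """Peak elements processed per clock, assuming every instruction does useful work."""
    return instructions_per_clock * lanes_per_instruction


SCALAR_IPC = 4      # scalar instructions per clock (figure from the comment above)
SIMD_IPC = 2        # SIMD instructions per clock (figure from the comment above)
VECTOR_BITS = 256   # assumption: 256-bit vector registers
ELEMENT_BITS = 32   # assumption: 32-bit elements

lanes = VECTOR_BITS // ELEMENT_BITS              # elements handled by one SIMD instruction
scalar_rate = elements_per_clock(SCALAR_IPC, 1)  # one element per scalar instruction
simd_rate = elements_per_clock(SIMD_IPC, lanes)
speedup = simd_rate / scalar_rate

print(f"scalar: {scalar_rate} elem/clk, SIMD: {simd_rate} elem/clk, ratio: {speedup:.1f}x")
```

Under these assumptions the SIMD path still comes out ahead at peak, even with half the instruction rate. Narrower vectors or wider elements shrink the gap, so the ratio is only as good as the assumed lane count.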



