`madvise(2)` doesn't matter _that_ much in my experience with [1] on modern Linux kernels. SSD just can't read _quite_ as quickly as memory in my testing. Sure, the SSD can re-read a lot into RAM, analogous to how memory reads can rapidly prefetch into L1.
I get ~30 GiB/s for threaded sequential memory reads, but ~4 GiB/s for SSD. However, I think the SSD number is single-threaded and not even with io_uring—so I need to regenerate those numbers. It's possible it could be 2-4x better.
I think the effects of madvise primarily crop up in extremely I/O-saturated scenarios, which are rare. Reads primarily incur latency; with a good SSD it's hard to actually run into IOPS limits, and in this scenario you're unlikely to run out of RAM for caching either. MADV_RANDOM is usually a pessimization; MADV_SEQUENTIAL may help if you are truly reading sequentially, but it can also hurt because pages don't linger as long.
But as I mentioned, there's caching upon caching, and also protocol level optimizations, and hardware-level considerations (physical block size may be quite large but is generally unknown).
It's nearly impossible to benchmark this stuff in a meaningful way. Or rather, it's nearly impossible to know what you are benchmarking, as there are a lot of nontrivially stateful parts all the way down that have real impact on your performance.
There are so many moving parts that I think the only meaningful disk benchmarks are against whatever application you want to make go faster. Make the change. Is it faster? Great. Is it not? Well, at least you learned something.
> I get ~30 GiB/s for threaded sequential memory reads, but ~4 GiB/s for SSD. However, I think the SSD number is single-threaded and not even with io_uring—so I need to regenerate those numbers. It's possible it could be 2-4x better.
Assuming you ran the experiments on an NVMe SSD attached over PCIe 3.0, where the theoretical maximum is roughly 1 GB/s per lane (~985 MB/s after 128b/130b encoding), a typical x4 drive tops out just under 4 GB/s. So I'm not sure how you expect to go faster than 4 GiB/s; isn't that already the theoretical maximum of what you can achieve?
[1]: https://github.com/sirupsen/napkin-math