Except I was wrong... sorry. I haven't been able to find documentation saying that lazy ELF symbol binding is disabled when you take the address of an extern function, but dumping the .plt section with objdump makes it clear that printf is no longer lazily bound once its address is taken. Instead, the entry is marked to be resolved at load time, and the global function entry point is read directly from the GOT entry.
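A small C file is enough to see this for yourself. This is a sketch under assumptions: the helper names `get_printf_address` and `call_through_pointer` are made up for illustration, and the assembly and relocation output depend on your compiler, its version and flags. If you build it as position-independent code on x86-64 (for example `gcc -O2 -fPIC -S`), you'd typically expect the address to be loaded with something like `movq printf@GOTPCREL(%rip), %rax`. Running `readelf -r` on a shared object built from it would typically show an `R_X86_64_GLOB_DAT` relocation for printf, which is resolved at load time, rather than only a lazily bound `R_X86_64_JUMP_SLOT`.

```c
#include <stdio.h>

/* Function-pointer type matching printf's signature. */
typedef int (*printf_fn)(const char *, ...);

/* Hypothetical helper: takes the address of an external function.
 * With -fPIC on x86-64 the compiler typically reads this address out of
 * the GOT (e.g. movq printf@GOTPCREL(%rip), %rax) instead of pointing
 * at a PLT stub; exact output varies by compiler and flags. */
printf_fn get_printf_address(void)
{
    return &printf;
}

/* Hypothetical helper: an ordinary indirect call through the pointer.
 * If the pointer was loaded from the GOT as above, it is printf's real
 * entry point, so this call does not pass through a PLT stub. */
int call_through_pointer(printf_fn fn, const char *msg)
{
    return fn("%s\n", msg);
}
```

Calling `call_through_pointer(get_printf_address(), "hi")` prints `hi` and returns 3, the number of characters written.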
Part of my assumption that the PLT entry was used as the function address came from trying to figure out why the AMD engineers didn't include an ip-relative indirect addressing mode for the call instruction when they designed x86_64. An ip-relative indirect call could directly call through the GOT and avoid wasting an instruction cache line on the PLT entry. The PLT entry would still be used once for lazy symbol binding, but after that wouldn't be used and wouldn't cause any more cache evictions. Such an indirect call would need to be broken down into several micro-ops internally, but would save instruction cache space.
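The calling scheme described above can be modeled in plain C with a function pointer standing in for the GOT entry. This is only an illustration of the idea, not how ld.so actually implements lazy binding, and every name in it (`got_slot`, `lazy_resolver`, `real_target`, `call_site`) is hypothetical. The slot starts out pointing at a resolver, much like a fresh GOT entry points back into the PLT. The first call goes through the resolver, which patches the slot, and every later call goes straight to the target with no stub in between.

```c
/* Hypothetical plain-C model of calling directly through a GOT slot
 * with lazy binding; not the real dynamic linker mechanism. */

typedef int (*target_fn)(int);

/* Stand-in for the function the dynamic linker would eventually find. */
static int real_target(int x)
{
    return x * 2;
}

static int resolver_calls = 0;
static int lazy_resolver(int x);

/* Model of a GOT slot: it initially points at the resolver, the way a
 * fresh GOT entry points back into the PLT before binding. */
static target_fn got_slot = lazy_resolver;

/* Runs only on the first call: "resolves" the symbol, patches the slot,
 * and forwards the original call to the real target. */
static int lazy_resolver(int x)
{
    resolver_calls++;
    got_slot = real_target;   /* later calls skip the resolver entirely */
    return got_slot(x);
}

/* A call site in the proposed scheme: a single memory-indirect call
 * through the slot, with no intermediate stub on the fast path. */
int call_site(int x)
{
    return got_slot(x);
}

/* Reports how many times the resolver ran (expected: once). */
int get_resolver_calls(void)
{
    return resolver_calls;
}
```

After `call_site(1)` and `call_site(5)`, the results are 2 and 10 and the resolver has run exactly once, which is the "used once, then never again" behavior described above.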
I did a bit of thinking and came to the conclusion that the PLT entry was still necessary for taking the function address of a lazily bound ELF symbol, so such an addressing mode would almost never be used.
Now that I see the PLT entry is never actually used for taking the function address, I'm a bit surprised that AMD, when designing long mode (x86_64), didn't include an ip-relative indirect addressing mode for the call instruction.
I'm just a mechanical engineer by training, and way too much of what I know about software is self-taught, so I'm sure the AMD hardware engineers (and maybe also the RISC-V folks... I'm less familiar with RISC-V addressing) had very good reasons for not having ip-relative indirect function calls. My best guess is that the added complexity wasn't worth it and/or there was some hidden performance cost that isn't obvious to me.