If Sun had picked a ‘real’ CPU, I doubt it would have been x86 (did they even sell any x86 machines at the time?)
Also, picking a real CPU is only worth it for running on that CPU, and only that particular generation of it. If they had picked x86 in 1996, they wouldn’t even have supported MMX.
In a platform-independent design, using n registers in the virtual CPU is only a good idea for n = 0 (i.e. a stack machine, as in the JVM, .NET CLR and Web Assembly) and n = ∞ (as in LLVM).
If you pick any other number, your platform-independent code needs to handle both the case where the number of real registers is smaller (so you need a register allocator) and the case where it is larger (there, you could just ignore the additional registers, but that means giving up a lot of performance, so you have to somehow figure out which data would have been in registers if the CPU had more of them).
Why implement and optimize solutions to both of these complex problems, when you can pick a number of registers at either end of the scale and spend all your resources on one of them?
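To make the n = 0 case concrete, here’s a sketch of what a stack machine looks like in practice. The bytecode in the comment is what `javap -c` shows for this method: every operation works on an implicit operand stack, so the format never has to commit to a register count.

```java
public class StackBytecodeDemo {
    // javap -c shows this compiles to pure stack operations, no registers:
    //   iload_0   // push the first int argument
    //   iload_1   // push the second int argument
    //   iadd      // pop two ints, push their sum
    //   ireturn   // pop the top of the stack and return it
    static int add(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(add(1, 2)); // prints 3
    }
}
```

The JIT is then free to map those stack slots onto however many registers the real CPU happens to have.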
And that’s even more true for x86-64, which doesn’t have a fully orthogonal instruction set; has more vector-instruction extensions than I can count, none of which let you clearly express how long the vectors you’re iterating over are (making it hard to map them optimally onto other CPUs or newer vector instructions); has 80-bit floats; etc.
Also, the high-level nature of Java bytecode enables/simplifies many optimisations in the JIT.
For example, dynamic dispatch is handled by the JIT rather than encoded into low-level machine instructions, so if the JIT can see that only one loaded class implements a certain interface, it can generate code that directly invokes methods of that implementing class, without going through a vtable. It can do this even across library boundaries. That wouldn't be possible (or at least, would be greatly complicated) if Java bytecode provided pre-baked machine code.
Modern JVMs also have pretty deep integration of the GC and the JIT, if I understand correctly. The Java bytecode format is high level, so the JIT is quite free to implement memory management however it likes. If the JVM took a truly low-level approach to its IR, we'd presumably be stuck with '90s GC technology.
I imagine it would also have implications for the way the JVM handles concurrency. It seems right that it defines its own memory model with its own safety guarantees, rather than defining its model as whatever x86 does.
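A small sketch of what that buys you, using `volatile` (the names here are illustrative, not from any particular codebase). The Java Memory Model guarantees that the plain write to `data` happens-before the volatile write to `ready`, so the reader is guaranteed to see 42 — on x86, ARM, or anything else — without the programmer reasoning about any specific CPU's reordering rules:

```java
public class VisibilityDemo {
    static volatile boolean ready = false;
    static int data = 0; // plain field; its visibility piggybacks on the volatile flag

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(() -> {
            data = 42;     // happens-before the volatile write below
            ready = true;  // volatile write: safely publishes data
        });
        Thread reader = new Thread(() -> {
            while (!ready) { /* spin until the volatile read observes true */ }
            System.out.println(data); // the JMM guarantees this prints 42
        });
        reader.start();
        writer.start();
        reader.join();
        writer.join();
    }
}
```

If the bytecode were defined as "whatever x86 does", this program would be correct on x86's strong memory ordering but subtly broken when translated to weaker architectures like ARM or POWER.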
It's telling that .NET took the same high-level approach that Java bytecode did.
There’s a reason Google stopped work on https://en.wikipedia.org/wiki/Google_Native_Client in favor of Web Assembly.