Fast-paced networked games typically solve that by running a local simulation ahead of the server. The button you click looks depressed the instant you click it, not once the server knows about it. In FPS-style games your character typically starts walking forwards the instant you press the forwards key, and you shoot the instant you click, not when the server finds out about it.
This has weird effects. Each player is actually playing in a slightly different world. You might see yourself hitting something and they might see themselves blocking the shot, and only one of you can be right. The different worlds will retroactively correct themselves to be consistent in some form or another (depending on the game it might be that the person shooting is always correct, or it might be that the person blocking is always correct, or it might be that whoever's packets reached the server first is correct, or really some complex combination of all of the above). The weird effects are worthwhile because people are really sensitive to latency in response to their inputs.
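To make the "local simulation that gets retroactively corrected" idea concrete, here is a minimal sketch of one common way to do it: apply inputs locally right away, then, when the server's authoritative state arrives, snap to it and replay whatever inputs the server hasn't seen yet. Everything here is made up for illustration (the `PredictingClient` class, a one-dimensional position, per-input sequence numbers); real games differ a lot in the details.

```python
from dataclasses import dataclass, field


@dataclass
class PredictingClient:
    """Hypothetical client that predicts its own movement locally."""
    position: float = 0.0
    next_seq: int = 0
    # Inputs sent to the server but not yet acknowledged: (sequence number, move).
    pending: list = field(default_factory=list)

    def press(self, move: float) -> None:
        # Apply the input immediately so the player sees no latency,
        # and remember it in case we have to replay it later.
        self.position += move
        self.pending.append((self.next_seq, move))
        self.next_seq += 1

    def on_server_state(self, last_acked_seq: int, server_position: float) -> None:
        # The server's state is authoritative up to last_acked_seq.
        # Drop the inputs it has already processed...
        self.pending = [(s, m) for s, m in self.pending if s > last_acked_seq]
        # ...reset to what the server says happened...
        self.position = server_position
        # ...and replay the inputs it hasn't seen yet on top of that.
        for _, move in self.pending:
            self.position += move
```

If the client presses forwards twice (position 2.0 locally) and the server then reports that after the first input the position was really 0.5 (say something was in the way), the client ends up at 1.5: the server's version of the past plus the still-unacknowledged second input.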
Even in slow-paced games that use simpler networking models, I'm pretty sure the UI is basically always local. For example you might press the button that says "do the thing" and see the button switch into its "pressed" state, but the server decides that the thing-doer is dead before that button press reaches the server, so it ignores that button press.
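A rough sketch of that button example, assuming a server that simply validates each press against its own state when the message arrives. The names (`ClientButton`, `Server`, `handle_press`, the `"doer"` unit id) are hypothetical and not taken from any particular engine.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    alive: bool = True
    actions_done: int = 0


class Server:
    """Hypothetical authoritative server holding the real game state."""

    def __init__(self) -> None:
        self.units = {"doer": Unit()}

    def handle_press(self, unit_id: str) -> bool:
        # Validate against the server's current state, not the state
        # the client saw when the button was clicked.
        unit = self.units.get(unit_id)
        if unit is None or not unit.alive:
            return False  # press is silently ignored
        unit.actions_done += 1
        return True


class ClientButton:
    """Hypothetical UI button whose visual state is purely local."""

    def __init__(self) -> None:
        self.pressed = False

    def click(self) -> dict:
        # The visual change happens immediately, before any network traffic.
        self.pressed = True
        # This message would be sent to the server and arrive some time later.
        return {"type": "do_the_thing", "unit_id": "doer"}
```

If the unit dies on the server between `click()` and `handle_press()`, the button still shows as pressed on the client while `handle_press` returns `False` and nothing happens in the game.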