Costs to Google are not magically different. You can make estimates without insider knowledge, but as in the window cleaner example, your estimates will only be as good as your assumptions.
You can also make estimates for compute, network, and storage costs based on the prices Google charged its Cloud customers for the same.
You don't. The exercise is in estimation. This is specifically not a case of the interviewer looking for you to get the "right" answer. The interviewer likely doesn't even know what the right answer is. They want to see if you can make back-of-the-envelope calculations and if you're capable of making sane (if inaccurate) assumptions.
Make a guess at the total cost of an hour of compute time and how long it might take to transcode the average video. Guess at how many videos are uploaded on a typical day. Guess at how much the typical SRE costs Google and how many SREs YouTube employs. Do the same for software engineers, or explicitly exclude R&D. Guess at networking, storage, etc. Then roll all of that together: hours of video * (cost to transcode + cost to store + cost to upload + cost to playback * average viewers) + SRE cost + ... Bonus points if you can account for elasticity and peak load instead of just averages.
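To make the roll-up concrete, here is a minimal sketch of that formula in Python. Every default number is a made-up placeholder, not a real Google or YouTube figure, and the `Assumptions` class and `estimate_daily_cost` function are hypothetical names for illustration. Software engineers are left out (the "explicitly exclude R&D" option), and peak load is approximated with a single peak-to-average multiplier on the infrastructure side.

```python
from dataclasses import dataclass


@dataclass
class Assumptions:
    # All defaults are hypothetical placeholders; swap in your own guesses.
    hours_uploaded_per_day: float = 500_000.0  # hours of video uploaded per day
    transcode_cost_per_hour: float = 1.00      # $ to transcode one hour of video
    storage_cost_per_hour: float = 0.05        # $ to store one hour of video (flat, as in the formula)
    upload_cost_per_hour: float = 0.02         # $ ingress for one uploaded hour
    playback_cost_per_hour: float = 0.01       # $ egress for one viewing of one hour
    average_viewers: float = 100.0             # views per uploaded hour
    sre_count: int = 1_000                     # SREs working on YouTube
    sre_cost_per_year: float = 400_000.0       # fully loaded $ per SRE per year
    peak_to_average: float = 1.0               # > 1.0 to provision for peak instead of average


def estimate_daily_cost(a: Assumptions) -> float:
    """hours * (transcode + store + upload + playback * viewers) + SRE cost, per day."""
    per_hour = (
        a.transcode_cost_per_hour
        + a.storage_cost_per_hour
        + a.upload_cost_per_hour
        + a.playback_cost_per_hour * a.average_viewers
    )
    # Capacity is sized for peak, so scale the infrastructure bill by the peak factor.
    infra = a.hours_uploaded_per_day * per_hour * a.peak_to_average
    # Spread annual SRE cost evenly across the days of the year.
    people = a.sre_count * a.sre_cost_per_year / 365
    return infra + people


if __name__ == "__main__":
    print(f"${estimate_daily_cost(Assumptions()):,.0f} per day")
```

The flat per-hour storage term follows the formula above; a fuller model would let storage accumulate, since uploaded video is kept indefinitely.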
The point is to show that you can think through the problem. If all you can say is "I don't know what your networking costs are", then you come across as useless.
He's not a new grad, he was a director of PM. He should have a feel for ballpark figures regarding infrastructure and personnel costs, which don't vary by that big a factor from company to company.
The question is perfectly reasonable (and it sounds like the interviewee was providing a reasonable answer). The issue is the way the interviewer ran the interview, not with the particular question itself.