If some of the edges are infinitely sharp, and you can tell which ones they are just by looking, as in my example, then the image has unbounded bandwidth: it exceeds what any finite resolution can represent.
That's true in the 1D case as well. Recovering those edges would require upsampling with information generation before downsampling. Using priors to guess at missing information is an interesting, open-ended problem, but it isn't necessary for a satisfactory downsampling result.
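To make that concrete, here's a minimal 1D sketch (my own toy example, not from anyone's library): a low-pass pre-filter followed by decimation gives a clean, alias-free downsample of a perfectly sharp edge, with no information generation or priors involved.

```python
# Toy 1D downsampling: blur with a [1, 2, 1]/4 kernel, then keep
# every other sample. Edge handling clamps to the nearest sample.

def lowpass(signal):
    """Low-pass filter with a [1, 2, 1]/4 kernel (clamped at the ends)."""
    n = len(signal)
    out = []
    for i in range(n):
        a = signal[max(i - 1, 0)]
        b = signal[i]
        c = signal[min(i + 1, n - 1)]
        out.append((a + 2 * b + c) / 4)
    return out

def downsample2(signal):
    """Filter first, then decimate by 2 to avoid aliasing."""
    return lowpass(signal)[::2]

edge = [0.0] * 8 + [1.0] * 8   # an "infinitely sharp" step edge
small = downsample2(edge)       # the edge becomes a short, smooth ramp
```

The filtered result just spreads the step over a couple of samples, which is exactly what a band-limited version of the edge should look like at the lower resolution.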
One interesting complication for many photos is that the bandwidth of the green channel is twice that of the red and blue channels, due to the Bayer filter mosaic.
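The factor of two falls straight out of the mosaic geometry: in the common RGGB tile, green is sampled twice per 2x2 block while red and blue get one sample each. A trivial sketch of the counting:

```python
# Sample counts in one 2x2 RGGB Bayer tile. Green appears twice,
# so the sensor captures twice as many green samples as red or
# blue, giving green roughly double the bandwidth after demosaicing.

BAYER_TILE = [["R", "G"],
              ["G", "B"]]

def sample_counts(tile):
    """Count how many photosites each color channel gets per tile."""
    counts = {}
    for row in tile:
        for channel in row:
            counts[channel] = counts.get(channel, 0) + 1
    return counts

counts = sample_counts(BAYER_TILE)
```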
That's why a vector image rendered directly at 128x128 can look better/sharper than one rendered at 256x256 and scaled down: the rasterizer can antialias edges against the target pixel grid, instead of resampling an already-rendered image.