Specifically, the "universal function approximator" claim means no more and no less than the relatively trivial fact that you can approximate any (1D, suitably well-behaved) function as closely as you want by drawing a bunch of straight line segments and making the segments short enough. Translating that to N dimensions, casting it into exactly the form that applies to neural networks, and making the proof rigorous isn't even that tough; it's mostly mechanical once you write down the right definitions.
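
A minimal sketch of the 1D version of that idea: interpolate a smooth function with straight segments and watch the worst-case error shrink as the segments get shorter. (The function names here are illustrative, not from any library; note that a piecewise-linear function like this is exactly what a one-hidden-layer ReLU network computes.)

```python
import math

def piecewise_linear(f, a, b, n):
    """Return a function that linearly interpolates f over n equal segments of [a, b]."""
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    ys = [f(x) for x in xs]

    def approx(x):
        # Locate the segment containing x, then interpolate linearly within it.
        i = min(int((x - a) / (b - a) * n), n - 1)
        t = (x - xs[i]) / (xs[i + 1] - xs[i])
        return ys[i] + t * (ys[i + 1] - ys[i])

    return approx

def max_error(f, g, a, b, samples=1000):
    """Worst-case |f - g| over a dense grid on [a, b]."""
    return max(abs(f(a + (b - a) * k / samples) - g(a + (b - a) * k / samples))
               for k in range(samples + 1))

# Doubling the number of segments roughly quarters the max error
# (for a twice-differentiable f, the error is O(1/n^2)).
for n in (4, 8, 16, 32):
    approx = piecewise_linear(math.sin, 0.0, math.pi, n)
    print(f"{n:3d} segments: max error {max_error(math.sin, approx, 0.0, math.pi):.5f}")
```

Making the lines "really short" is just cranking up `n`; the N-dimensional and neural-network versions of the proof are the same idea with more bookkeeping.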