The amount of code needed to wire a well-written application into iOS and Android is pretty minor. I'd rather do that wherever I can't use GLFW (I mean, GLEW does most of the hard work anyway, and that is portable). GLFW epitomizes a good library for this sort of thing: it's minimal, and it minimizes how much you have to wrestle with when writing C++. SDL, on the other hand, forces you to come to grips with its weight and its own dubious design decisions (plus a lot of plumbing code to make it sane in C++).
When wrapping OpenGL code myself, at least the dubious design decisions are my own and I understand them intuitively.
There are definitely improvements being made to LLVM to automatically vectorize code (especially by unrolling loops) into SIMD instructions.
I haven't personally tried it, but I would love for it to match the code quality of hand-cranked assembly... writing that by hand is tedious and error-prone, but you do get control over when you preload the cache and the stack, and you can do really cool things with the C preprocessor and macros to "manually inline" things. :-)
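For reference, this is roughly the kind of loop an auto-vectorizer is aimed at. It's only a sketch: the `saxpy` name and the use of the `__restrict` compiler extension (accepted by Clang, GCC and MSVC) are my own illustrative choices, and whether it actually turns into SIMD depends on the compiler version, flags and target. With Clang, building at `-O2` with `-Rpass=loop-vectorize` should print a remark if the loop was vectorized.

```cpp
#include <cstddef>

// Illustrative example (not from any particular codebase):
// computes y[i] = a * x[i] + y[i] for i in [0, n).
// __restrict is a compiler extension promising that x and y do not overlap,
// which lets the optimizer skip runtime aliasing checks before vectorizing.
void saxpy(float a, const float* __restrict x, float* __restrict y, std::size_t n) {
    // No loop-carried dependency and unit stride: each iteration is
    // independent, so several can be packed into one SIMD operation.
    for (std::size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The hand-written equivalent would spell out the loads, multiply-adds and stores per vector width, which is exactly the tedious part.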
And who doesn't like writing a good ol' fashioned jump table?!
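In case anyone hasn't had the pleasure, here's a minimal sketch of one using function pointers instead of assembly. Everything in it (the opcodes, `dispatch`, the handler names) is made up for illustration, and it assumes C++17 for `std::optional`.

```cpp
#include <cstddef>
#include <optional>

// Hypothetical opcodes; the numeric value doubles as the table index.
constexpr std::size_t kOpAdd = 0;
constexpr std::size_t kOpSub = 1;
constexpr std::size_t kOpMul = 2;
constexpr std::size_t kOpCount = 3;

using Handler = int (*)(int, int);

static int op_add(int a, int b) { return a + b; }
static int op_sub(int a, int b) { return a - b; }
static int op_mul(int a, int b) { return a * b; }

// The jump table: one entry per opcode, in opcode order.
static constexpr Handler kJumpTable[kOpCount] = {op_add, op_sub, op_mul};

// Looks up the handler by index and calls it; no if/else chain or switch.
// Returns std::nullopt for an opcode outside the table.
std::optional<int> dispatch(std::size_t op, int a, int b) {
    if (op >= kOpCount) {
        return std::nullopt;  // bounds check: never index past the table
    }
    return kJumpTable[op](a, b);
}
```

Calling `dispatch(kOpMul, 6, 3)` goes straight through the table to `op_mul`. A compiler will often build something similar out of a dense `switch`, but writing the table yourself makes the layout explicit.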