The general problem is that no profiler really holds a candle to VTune. I love the new AMD chips, but uProf isn't in the same class as VTune, and that's sad. I'm certain that with better tools the AMD chips could be demolishing Intel by an even greater margin.
Writing a separate path for NEON is what would be needed. It's not like there are magical SIMD functions (intrinsics) that work across architectures.
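To make that concrete, here's a rough sketch of what "a separate path" tends to look like. The function `add_f32` is a made-up example of mine, not anything from the code being discussed; the point is just that the SSE and NEON versions use different headers, types, and intrinsic names, so each one has to be written by hand behind a preprocessor check, with a scalar loop for the tail and for any other target.

```c
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64)
  #include <xmmintrin.h>   /* x86 SSE intrinsics */
#elif defined(__ARM_NEON)
  #include <arm_neon.h>    /* Arm NEON intrinsics */
#endif

/* Hypothetical helper for illustration: out[i] = a[i] + b[i] for n floats. */
void add_f32(const float *a, const float *b, float *out, size_t n) {
    size_t i = 0;
#if defined(__SSE__) || defined(_M_X64)
    /* x86 path: four floats per iteration, unaligned loads and stores. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#elif defined(__ARM_NEON)
    /* Arm path: same shape, but entirely different types and intrinsics. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(out + i, vaddq_f32(va, vb));
    }
#endif
    /* Scalar tail; also the whole loop on targets with neither extension. */
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

For something this trivial the two paths map one-to-one, but that stops being true once shuffles, horizontal operations, or wider registers get involved, which is where the real porting work is.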