In principle you're right, but at least for computer vision the number of layers you mention is a bit off. VGG16 worked well with 16 layers without any special handling. ResNet went to >150 layers by using shortcuts, which kind of cracked the problem already. This paper gives us more insight and maybe a more elegant solution.
edit: Just realized you said 2/3 _fully connected layers_, which is right. But for convolutions we needed skip connections, too, to get them to work. Any reason you single out fully connected layers?
Regarding your edit, the authors of the paper in question focus on FNNs and note the reason in the paper:
> Both RNNs and CNNs can stabilize learning via weight sharing, therefore they are less prone to these perturbations. In contrast, FNNs trained with normalization techniques suffer from these perturbations and have high variance in the training error (see Figure 1).
Essentially FNNs stand to benefit more from this work than CNNs or RNNs.
> And I hate when people use efficiency when they mean speed.
Except they're usually right. "Efficiency" is not just a measure of algorithmic complexity. Something is more efficient if you are able to do more of a desired activity at the cost of fewer resources. The resource in question varies.
Yes, in computer science one often focuses on algorithmic complexity as the resource to be economized on. But time and power are also perfectly reasonable resources to have in mind. In fact, they are the resources most audiences will naturally assume when they see the word "efficient" standing alone, without further explanation.
I think GP is probably wrong to criticize as well -- making code more efficient [or, if you insist, faster] makes it easier to support more users than it could otherwise. Is the ease with which a system can support an additional user not the very definition of the word "scalability"?
Of course, there are more powerful, and more specialized techniques for improving scalability, but that doesn't mean the author is wrong to suggest that speed improves scalability.
Sometimes it does. For example, if you write a simple O(n) for loop in Python, convert it to Cython, and compile it, the compiler may replace the loop with an equivalent O(1) computation. I use this to surprise students when benchmarking Cython code for teaching purposes in my SageMath course.
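A sketch of the kind of transformation meant here, in plain Python (the actual replacement would be done by the C compiler underneath Cython, e.g. GCC's loop-idiom recognition at -O2, not by Python itself): a linear-time summation loop next to the closed form the compiler can substitute for it.

```python
def sum_loop(n):
    # O(n): what the source code literally says
    total = 0
    for i in range(1, n + 1):
        total += i
    return total

def sum_closed_form(n):
    # O(1): the equivalent Gauss formula n(n+1)/2
    # that an optimizing compiler can derive
    return n * (n + 1) // 2

assert sum_loop(10_000) == sum_closed_form(10_000)  # 50005000
```

Benchmarked naively, the compiled version then appears "unfairly" fast, since it no longer scales with n at all.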
If it's just a matter of speeding up a tight loop, I find numba performs admirably and it's a lot more convenient than Cython.
When you need to wrap some C code, or use high-level data structures (which Cython handles beautifully with its STL integration), that's when it makes sense to drop to Cython.
I'll grant that the complexity of your algorithm likely doesn't change, but how that algorithm is executed is usually more efficient when compiled to C than when run through the Python interpreter, if only because of how it's running.
I agree it is faster by running it through compiled C code.
Saying "my code is more efficient than your code" might be acceptable, but saying "my code is efficient" drives me mad when it is used as a synonym for "my code is fast". Take a look at [1] and the note that this is not about optimization. Efficient algorithms are usually algorithms with close to optimal time or space complexity.
Having zero directed interaction might be a problem, but seeing only 5 thoughts for 24 hours makes you think about those thoughts a lot more and gives you a lot of time to process and answer them. Maybe there are interesting answers to your thoughts and questions you will never know of.
k-means is not a mode-seeking algorithm, I think. You are clustering the color space, but you're not even guaranteed to end up with colors that are very close to those in your image. With a high k you do get actual colors from the image, but they're not really dominant anymore.
What about mean shift? It's one approach based on non-parametric density estimation. Some sort of histogramming is another non-parametric method that would work here.
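A minimal sketch of the histogramming idea (names and bin size are my own choices, not from the thread): quantize RGB space into coarse bins and take the fullest bin as an approximation of the mode of the color distribution, which is exactly the "dominant color" that k-means centroids don't guarantee.

```python
from collections import Counter

def dominant_color(pixels, bins=8):
    """Coarse 3D histogram over RGB space; the fullest bin
    approximates the mode of the color distribution."""
    step = 256 // bins
    counts = Counter(
        (r // step, g // step, b // step) for r, g, b in pixels
    )
    (br, bg, bb), _ = counts.most_common(1)[0]
    # report the center of the winning bin
    return (br * step + step // 2,
            bg * step + step // 2,
            bb * step + step // 2)

# mostly red pixels with a green minority
pixels = [(250, 10, 10)] * 90 + [(10, 250, 10)] * 10
print(dominant_color(pixels))  # (240, 16, 16), inside the red bin
```

Unlike a k-means centroid, the answer is always the center of a bin that actually contains many image pixels, so it can't drift to an averaged color that appears nowhere in the image.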
My point was that I don't want to use the password. I don't want it in my cronjob file, and I don't want it in my logfiles. It's really bad practice for programs to rely on the password. On the bright side, Gmail also allows you to generate application-specific passwords.