
Well you're trading one kind of B.S. for another kind of B.S.

There's a lot of B.S. that comes with C++, and there's an entirely different kind of B.S. involved with writing things in Java + Hadoop.

Personally I stay out of the C/C++ ecosystem as much as I can. Threads are never really going to work in the GNU world, because you can't trust the standard libraries, never mind all the other libraries.

The LMAX Disruptor shows that if you write code carefully in Java it can scream. They estimate that they could maybe get 10% better throughput in C++ at great cost, but the average C++ programmer would probably screw up the threading and make something buggy, and a C++ programmer that's 2 SD better than the mean would still struggle with cache line and other detailed CPU issues.

The difference between the LMAX Disruptor and the "genius grade" C++ I've seen is that the code for the Disruptor is simple and beautiful, whereas you might spend a week and a half just figuring out how to build a "genius grade" C++ program, taking half an hour each pop.



Really, you're trading execution speed for productivity, not "BS for BS", when you use these so-called "web languages". In some cases, there are other concerns such as memory usage or software environment (e.g. try installing a Java program on a system that doesn't allow JIT compilation).

Some problems can scale out, but only if latency between nodes is low enough and bandwidth is high enough. For example, an MMO server would not function as well if there was a 50 msec ping between nodes. You may or may not have control over that depending on what cloud service you use.

These are real concerns and should not be trivialized as "BS for BS" or "throw more virtualized CPU cores at it". Every problem is different; it should be studied and the best solution for the problem applied.


I'm talking about parallel programming, in general, as a competitor to high-speed serial programming.

In that case it is a matter of one kind of BS (wondering why you don't get the same answer with the GPU that you do with the CPU, waiting 1.5 hours for your C++ program to build, etc.) vs another kind of BS (figuring out problems in parallel systems.)

Not all problems scale out like that, but you can pick the problems you work on.


Java performs well as long as you're CPU bound. But memory is becoming cheap enough to keep substantial parts of a database in memory. Avoiding all that IO translates into enormous performance gains. Unfortunately, in Java (using the Oracle VM) you can't keep a lot of data in memory without getting killed by garbage collector pauses.


The genius of the Disruptor was in the data structure and access mechanisms, plus the fact that it worked for single producer / single consumer circumstances. It is certainly not an example you can tout for how Java is as fast as C/C++ under all circumstances if you are 'careful'. I think you are just falling prey to confirmation bias w.r.t. 'beauty' of code.

Having said that, C++ can be ugly as hell.
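
For what it's worth, the single producer / single consumer idea can be sketched in a few lines of Java. This is only an illustration under assumptions: the class name SpscRing and its methods are made up here, it carries plain long values, and it is not the LMAX Disruptor's code or API.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical single-producer/single-consumer ring buffer (illustration only).
final class SpscRing {
    private final long[] slots;
    private final int mask;
    private final AtomicLong head = new AtomicLong(); // next sequence to read, advanced by the consumer
    private final AtomicLong tail = new AtomicLong(); // next sequence to write, advanced by the producer

    SpscRing(int capacity) {
        if (capacity <= 0 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a positive power of two");
        }
        slots = new long[capacity];
        mask = capacity - 1;
    }

    // Call from the producer thread only. Returns false when the ring is full.
    boolean offer(long value) {
        long t = tail.get();
        if (t - head.get() == slots.length) {
            return false;
        }
        slots[(int) (t & mask)] = value;
        tail.lazySet(t + 1); // ordered store: the slot write becomes visible before the new tail
        return true;
    }

    // Call from the consumer thread only. Returns ifEmpty when nothing is available.
    long poll(long ifEmpty) {
        long h = head.get();
        if (h == tail.get()) {
            return ifEmpty;
        }
        long value = slots[(int) (h & mask)];
        head.lazySet(h + 1); // hands the slot back to the producer
        return value;
    }
}
```

There are no locks: each counter has exactly one writer, which is why this only works with one producer thread and one consumer thread. A real implementation would also pad the counters to keep them on separate cache lines, which this sketch does not attempt.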


I'd say in some real life situations the gap is less than people think.

Back in the 1990's, when JIT compilation was new, I wrote a very crude implementation of Monte Carlo integration in Java that wasn't quite fast enough to do the parameter scan I wanted. I rewrote the program in C and switched to a more efficient sampling scheme.

When it was all said and done, I was disappointed with the performance delta of the C code. Writing the more complex algorithm in Java would have been a better use of my time.


But there are several things Java insists on that are going to cost you performance and that are very, very difficult to fix.

1) UTF16 strings. Ever notice how sticking to byte[] arrays (which is a pain in the ass) can double performance in Java? C++ supports everything by default. Latin1, UTF-8, UTF-16, UTF-32, ... with sane defaults, and supports the full set of string operations on all of them. I have a program that caches a lot of string data. The Java version is complete, but uses >10G of memory, where the C++ version storing the same data uses <3G.

2) Pointers everywhere. Pointers, pointers and yet more pointers, and more than that still. So datastructures in Java will never match their equivalent in C++ in lookup speeds. Plus, in C++ you can do intrusive datastructures (not pretty, but works), which really wipe the floor with Java's structures. If you intend to store objects with lots of subobjects, this will bite you. As if this wasn't bad enough, Java objects feel the need to store metadata, whereas C++ objects pretty much are what you declared them to be (the overhead comes from malloc, not from the language), unless you declared virtual member functions, in which case there's one pointer in there. In Java, it may (sadly) be worth it to not have one object contain another, but rather copy all fields from the contained object into the parent object. You lose the benefits of typing (esp. since using an interface for this will eliminate your gains), but it does accelerate things by keeping both things together in memory.

3) Startup time. It's much improved in java 6, and again in java 7, but it's nowhere near C++ startup time.

4) Getting in and out of java is expensive. (Whereas in C++, jumping from one application into a .dll or a .so is about as expensive as a virtual method call)

5) Bounds checks. On every single non-primitive memory access at least one bounds check is done. This is insane. "int a[5]; a[3] = 2;" is 2 assembly instructions in C++, almost 20 in Java. More importantly, it's one memory access in C++, it's 2 in Java (and that's ignoring the fact that Java writes type information into the object too, if that were counted, it'd be far worse). Java still hasn't picked up on Coq's tricks (you prove, mathematically, what the bounds of a loop variable are, then you try to prove the array is at least that big. If that succeeds -> no bounds checks).

6) Memory usage, in general. I believe this is mostly a consequence of 1) and 2), but in general java apps use a crapload more memory than their C++ equivalents (normal programs, written by normal programmers)

7) You can't do things like "mmap this file and return me an array of ComplicatedObject[]" instances.
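
To make point 1 concrete, here is a minimal Java sketch of the byte[] approach. CompactText is a made-up name, not a class from any library mentioned in this thread, and it assumes the text fits in Latin-1. It only compares encoded payload sizes; it does not measure object headers or real heap usage.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: keeps text as one byte per character instead of a String.
final class CompactText {
    private final byte[] latin1;

    CompactText(String s) {
        // Assumption: every character fits in Latin-1; anything else is replaced with '?'.
        this.latin1 = s.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Bytes actually stored for the text.
    int storedBytes() {
        return latin1.length;
    }

    // Bytes the same text needs as UTF-16 code units (no byte-order mark).
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16LE).length;
    }

    // Decode on demand, which is the "pain" part: a String is rebuilt whenever one is needed.
    @Override
    public String toString() {
        return new String(latin1, StandardCharsets.ISO_8859_1);
    }
}
```

For the five-character string "cache", storedBytes() is 5 while utf16Bytes("cache") is 10, so the payload is half the size before any per-object overhead is counted.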

But yes, in raw number performance, avoiding all the above problems, java does match C++. There actually are (contrived) cases where java will beat C++. Normal C++ that is. In C++ you can write self-modifying code that can do the same optimizations a JIT can do, and can ignore safety (after proving to yourself what you're doing is actually safe, of course).

Of course Java has the big advantage of having fewer surprises. But over time I tend to work on programs that go through this evolution: python/perl/matlab/mathematica -> java -> C++. Each transition will yield at least a factor 2 difference in performance, often more. Surprisingly the "java" phase tends to be the phase where new features are implemented, because you can't beat Java's refactoring tools.

Python/Mathematica have the advantage that you can express many algorithms as an expression chain, which is really, really fast to change. "Get the results from database query X, get out fields x, y, and z, compare with other array this-and-that, sort the result, get me the grouped counts of field b, and graph me a histogram of the result" -> 1 or 2 lines of (not very readable) code. When designing a new program from scratch, you wouldn't believe how much time this saves. IPython notebook FTW!
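
For comparison, a rough Java approximation of that kind of chain, using streams. Everything here is a stand-in: GroupedCounts and countsOfB are hypothetical names, the rows are plain {a, b} int pairs rather than a real query result, and the histogram step is left out.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical example: filter rows, pull out one field, and count how often each value occurs.
final class GroupedCounts {
    // Each row is an {a, b} pair; rows with a < minA are dropped.
    static Map<Integer, Long> countsOfB(List<int[]> rows, int minA) {
        return rows.stream()
                .filter(r -> r[0] >= minA)   // "compare with this-and-that"
                .map(r -> r[1])              // "get out field b"
                .collect(Collectors.groupingBy(
                        b -> b,
                        TreeMap::new,        // sorted by key
                        Collectors.counting()));
    }
}
```

Calling countsOfB on the rows {1,5}, {2,5}, {3,7}, {0,7} with minA = 1 gives {5=2, 7=1}. It is one expression, but still noticeably more ceremony than the notebook version.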


Hadoop and the latest version of Lucene come with alternative implementations of strings that avoid the UTF16 tax.

Second, I've seen companies fall behind the competition because they had a tangled-up C++ codebase with 1.5-hour compiles and code nobody really understood.

The trouble I see with Python, Mathematica and such is that people end up with a bunch of twisty little scripts that all look alike, you get no code reuse, nobody can figure out how to use each other's scripts, etc.

I've been working on making my Java frameworks more fluent because I can write maintainable code in Java and skip the 80% of the work to get the last 20% of the way there with scripts.


Try C++11.



