Why are Oracle and DB2 still on top of TPC-C? A roadmap to end their dominance.

peapicker · on May 16, 2012

"How is technology designed decades ago still dominating TPC-C?" -- this assumption does not take into account that the engine underneath the technology interface design can change and evolve for new technologies to some extent.

Additionally, the reason these two are at the top of the heap is that Oracle and IBM both put a great deal of money not only into the software, but into designing and producing the machines that run these tests, the filesystems and operating systems, etc -- it is a total package with a driving focus related to database performance; these tests are run on multi-million dollar machines made by those companies.

Until you have a complete end-to-end ecosystem, with a focus on everything from the hardware, through the OS, all the way to the DB, you probably aren't going to beat them with any 'new' technology...

If a 'new technology' does beat out the traditional, I will expect the products that will be at the top of the performance curve, for the near future (10yrs) to also come from IBM and Oracle... Especially since Oracle has already release an ACID noSQL db engine that is also tightly wedded to their full ecosystem. (IBM may have as well, but I didn't notice if they did)

jhugg · on May 16, 2012

It is, surely, difficult to create a product that is better than Oracle/DB2 in every dimension. These systems have thousands of engineer-years of work baked into them.

On the other hand, it's pretty easy to beat these systems by specializing. Take some part of what they do and do it better. [Insert sports car / minivan / mack truck analogy here]

The interesting question is whether the pain of using the new system is less than the pain of using the old system. New system pain often comes from having fewer features or a less robust implementation. Legacy pain often comes from managing software that has to maintain compatibility with 20 year old apps and scaling workloads on software designed when 8mb of ram was a lot and clusters weren't practical.

I will also say from experience, if you take a copy of Oracle and the same hardware from the TPC-C leader board, you will have a hell of a time replicating their results. They use every trick in the book and spend huge sums of money tuning for these benchmarks (n.b. I don't fault them). In practice getting throughput close to what they claim on actual real-world apps is not realistic.

Now, you should take all benchmarks with a grain of salt (even mine). Ask a vendor how fast the system will run on a real workload that you understand with the configuration you plan to deploy with. If their number is attractive, build a POC and check for yourself.

stephen · on May 16, 2012

Sounds like event sourcing [1] applied to databases.

The server-side transactions reminded me a lot of VoltDB [2], which also has server-side transactions and turns out was a previous system Abadi was involved in.

My naive impression is that VoltDB is more about being in-memory, single-threaded, and lock-free, where as this Calvin approach is more about deterministic ordering of events.

Sounds really cool, but there is a reason people don't like writing stored procedures. Besides (historically) meaning you're forced into a non-general programming language, it means you can't do anything else while executing the logic (call out to a 3rd party, read a file from somewhere else, etc.).

While SQL has its flaws, the semantics of writing your application logic freely over multiple queries ("make a query, do some work, make a query, do some work, ... commit!") and having the database just make that magically work for you is pretty darn nice and not something I'd easily give up.

Maybe a heavy dose of optimistic locking and a pleasant/mostly automated process of coding/integrating/deploying the stored procedures would make it possible to get all the performance benefits of their approach. It does sound pretty smart.

[1]: http://martinfowler.com/eaaDev/EventSourcing.html

[2]: http://voltdb.com/

zzzeek · on May 16, 2012

It does say it supports ad-hoc Python blocks. I can see the potential for Python frameworks/ORMs being able to "export" a series of instructions over to such a system, or alternatively allow the object-relational/database abstraction layer to just run on the server side to start with. If you can produce a DBAPI-like API on the server, then ORMs like SQLAlchemy and maybe others should be fully usable within that environment. Client-server communication might use some kind of RPC-like system. Depending on how flexible the Python environment is, the client system could send over blocks of Python code on the fly which gets cached, making the system almost transparent.

SpikeGronim · on May 16, 2012

Vendors rig these benchmarks to death. An ex-coworker of mine worked on a major proprietary SQL db. He told me that they spent 6 months with both software and hardware vendor engineers optimizing for the benchmark. They put TPC specific code in the query planner!

tlogan · on May 16, 2012

This might useful for database as a service offerings. If that trend pick up the steam.

BTW, I think Oracle and DB2 are still on top of TPC-C is because you really don't need any faster database for OLTP than they are now. The maximum number of customers any online shop will not exceed 4 billion... The need for faster OLTP is just not there.

Warehousing and big data is a different beast... Oracle did not invest to make faster OLTP but faster OLAP...

jhugg · on May 16, 2012

If you count OLTP as "a system that places orders created by people", then you might be right.

As a VoltDB eng, we see lots of OLTP workloads that require much more throughput than legacy RDBMs can offer on a single system.

How many transactions do you think it requires to pick which ads to show you on a webpage?

How many updates to your Farmville-style game state are required per minute? How many concurrent users? What about MMOs?

Are financial exchange orders OLTP? If so, you have a six-figure TPS problem.

What about sensor data? Network packet monitoring? Call data record dashboards?

tlogan · on May 16, 2012

Yes there is a market but it is not very big. There is only one Zynga.

Also note that some of examples you mention are very well solved by streaming databases (Streambase) and event processing systems (which are sold as add-on to databases).

jhugg · on May 16, 2012

A big part of my job is talking to people who have scale pain with legacy systems. I don't know what percentage of the DB market this is, but it's nontrivial and growing fast.

Most of the markets I mentioned in the previous comment are nearly impossible to be successful in with a single node of legacy RDBMS sitting behind your app. Zynga is far from alone in social gaming scale pain.

Consider digital ad-tech. How many ads do you have to show before someone clicks on one? How many clicks do you need to earn $1? That can translate into: The cost of all those DB operations needs to cost way less than $1 or I'm hosed. Enter systems that can scale with less pain.

Streambase is a good example of a specialized system that can outperform legacy RDBMSs. Still, it's not like you can say, "All financial problems in scale pain can fix everything with Streambase." It's too specialized. What if you need 100gb of state? There are lots of problems in finance and some of them can be solved with Oracle/DB2 while others can't.

spitfire · on May 16, 2012

I think that's the point. The idea that you need a different database for OLTP vs OLAP, vs whatever is a technological one. Not a conceptual one. Operationally, if you had the freedom you'd ideally like to have just one database that can handle everything, storing data in appropriately accessible formats (relational, time-series. key-value, document, etc).

This is getting commodity open source databases up to that level of "don't care, it's fast enough" for OLTP. Then on to all the other workloads. Think of it like the move from C/C++ to dynamic languages like python or lisp as cpu's got faster.

Cool beans.

gouranga · on May 16, 2012

Two reasons:

1. TPC-C is a vendor sponsored cock measuring competition that noone other than marketing people take seriously.

2. Most people scale out rather than scale up so it's pretty much irrelevant as it considers monolithic computing only.

TPC-C is from the dark ages...

gaius · on May 16, 2012

He mentions that "scale-out" is much harder when you have atomic transactions spanning more than one "shard".

gouranga · on May 16, 2012

Harder yes, but not impossible.

gaius · on May 16, 2012

No, not impossible, but all those they tested performed worse than classic RDBMS. Locking turns out to be a hard problem...

gouranga · on May 17, 2012

Don't lock then. You can have transactions through the reservation pattern and operation ordering without locking.

huggyface · on May 16, 2012

that noone other than marketing people take seriously

Plenty of people take it seriously. I take it seriously.

Most people scale out rather than scale up so it's pretty much irrelevant as it considers monolithic computing only.

Yet the top results are clusters.

You're 0 out of 2. Any more wisdom about TPC?

It's also worth considering audiences: If you're a web scale company holding recipes for millions of free accounts, you're a bit different from a hedge fund churning performance results for your investor results statements. Judging the latter from the perspective of the former is asinine.

jhugg · on May 16, 2012

People take TPC-C seriously because not much has come along that's more useful as a transactional benchmark. There are lots of transactional benchmarks, but they're often flawed in some annoying way and don't serve well as a baseline as TPC-C.

That said, TPC-C is horribly out-of-date. For example, you have to simulate human data entry time within transactions. How many systems have that problem today? It also only ever adds data, so if you run it fast enough, you have a petabyte problem, and most OLTP isn't a petabyte problem.

As for the second point, those clusters are clusters in name only. They use fancy and expensive interconnects and caches that effectively give them shared memory. Also, they won't tolerate failure of any individual component well. Finally, individual nodes still act as gatekeepers and transaction monitors. Most of the cluster is simply there to apply predicates on data coming off of disk really fast.

huggyface · on May 17, 2012

For example, you have to simulate human data entry time within transactions. How many systems have that problem today?

How many people have problems with concurrency? A shitload of people, that's who. This isn't a problem with no locking models because...no locking. It is when you care about consistency.

They use fancy and expensive interconnects and caches that effectively give them shared memory.

They often use high speed interconnects because the sort of customers who care about such build-outs would naturally use high speed interconnects. They are, however, clusters in every meaning of the word, regardless of no true scotsman fallacy's.

jhugg · on May 17, 2012

I'm very familiar with the TPC-C spec and the problem can't be brushed off with "concurrency!". There are multi-second waits in a large percentage of transactions. Nobody does this in any performant OLTP system today, but sure, concurrency! Except the benchmark limits how many of these transactions can be concurrently operating on a warehouse, one of its core models. So the only way to scale throughput is to add warehouses. You end up simulating a company with a million warehouses, each with a fairly small load. Furthermore, to run a million transactions a second, you'll need several million open transactions. That's why you see armies of client nodes in the spec of the systems on the leaderboards. The append-only data model is also difficult. If you removed waits and allowed old data to be pruned, the benchmark would be much more useful.

So yes, technically Oracle Exadata OLTP Clustering has multiple CPUs connected by an high speed interconnect. Cluster.

My dual socket commodity Dell server also has multiple CPUs connected by a high speed interconnect. Cluster?

My point was not to argue about a word, just that Exadata is not what some typically think of as a cluster in modern distributed systems. They've moved some smart filtering into a SAN and plugged that into RAC and shipped it all in a big hot tower-thingy. It's not bad in any way, it's just closer to SMP than other kinds of network-based parallelism.

huggyface · on May 17, 2012

My dual socket commodity Dell server also has multiple CPUs connected by a high speed interconnect. Cluster?

The top TPC-C cluster has 27 database servers, each with 4 processors and 512GB of RAM. Can your Dell server host 108 processors and 13.8TB of RAM?

The Sun cluster is exactly what most people think of as a "cluster" -- a scalable set of servers that vastly exceeds single server performance.

jhugg · on May 17, 2012

Yes, it's a cluster. I totally concede.

I just meant it's not the same kind of clustering as Hadoop/HBase, Vertica, VoltDB, Cassandra, Riak, Greenplum, Netezza, Teradata or even DB2-Cluster.

If clustering is a spectrum where Dynamo-style systems like Riak are on one side and my Dell SMP system is the other extreme, the Sun cluster is probably closer to the SMP system than to the Riak cluster.

gouranga · on May 16, 2012

you really bought the marketing then. The code is always frigged for the benchmarks and you're not going to get anywhere if you try and replicate it yourself without half the vendor ops team being in site.

Basically, its packaged bullshit.

let me clarify monolithic: its a single component in the architecture, not the full stack. I mean if I put an aggressive caching layer over the top I don't even need to hit the cluster for consistent reads.

Regarding my utilisation and experience, we're high finance (private insurance, product sales, portfolio management), 5k concurrent users and have been around 20 years. We've also been screwed by every vendor under the sun on bullshit like that over the years.

End game: CQRS. Google it.

huggyface · on May 16, 2012

I mean if I put an aggressive caching layer over the top I don't even need to hit the cluster for consistent reads.

So the database vendors would likely do well to implement aggressive caching right at the source, right? That would be why our database instances use the 192GB of the machine that they're on.

There is no silver bullet to platforms. Further the database is usually the most restrictive layer of the stack, hence why the focus on that (but consult Redis or nginx or tomcat benchmarks if you are looking at the other layers).

gouranga · on May 16, 2012

Nope - databases cache chunks of ready to serve but not yet processed data, not precalculated business logic outputs and resolved rules.

Every time you hit them, they cost cycles, IO (the real killer) and RAM.

The database is very restrictive which is why our domain model lives outside it.

We put stuff in the OLTP store when we're done with it and take it out when we need it, or most likely from our ORM L2 cache and/or our service layer cache.

DB does f-all apart from keep our caches hydrated.

Its bad architecture to put all your eggs in a black box, particularly an expensive one.

gaius · on May 17, 2012

What you say is true... In the 90s. Let's flip this around and imagine I was lecturing you on why forking CGI scripts "don't scale".

huggyface · on May 17, 2012

Business logic outputs? Resolved rules?

Are you, perchance, a disciple at the church of ORM? That is a religion that makes its own problems, and then celebrates the victory when they solve their own problems.

The database is very restrictive

No it isn't. The claim is ridiculous.

Its bad architecture to put all your eggs in a black box

Nonsensical.

gouranga · on May 17, 2012

It's not a church or religion. It's where you find technology's athiests (engineers) solving business domain problems rather than working out how best to represent them in a relational way and bend them to get what they want.

I feel you are heavily politicised by the vendors rather than the solutions required.

gaius · on May 16, 2012

Have you not just reinvented CICS?

JoachimSchipper · on May 16, 2012

If you mean http://en.wikipedia.org/wiki/CICS, that does look similar. From a quick look at the Wikipedia page, it does not appear to be a very distributed system, though.

noselasd · on May 16, 2012

You can scale it pretty amazingly, though only on homogeneous IBM mainframes.

ExpiredLink · on May 16, 2012

sort of

> Calvin is not a database system itself, but rather a transaction scheduling and replication coordination service. We designed the system to integrate with any data storage layer, relational or otherwise.

spitfire · on May 16, 2012

I haven't read the full paper yet. But this sounds like a winner to me. It's actually not a database, but a transaction scheduling system sitting in front of the databases.

Hopefully these sort of ideas make it into Postgres for scalability.

sixbrx · on May 16, 2012

Looks very interesting.

The "limitations" are especially interesting:

"Calvin’s primary limitation compared to other systems is that transactions must be executed entirely server-side. Calvin has to know in advance what code will be executed for a given transaction."

Also the author mentions in the comments that certain non-deterministic functions such as fetching random numbers of current date/time will not be allowed within the server-side transactions, the client will have to pass such values to the server.

jacquesm · on May 16, 2012

Could someone please explain the sentence at the bottom of page3 of the pdf? It reads:

"This decoupling makes it impossible to implement certain popular recovery and concurrency control techniques such as the physiological logging in ARIES and next-key locking to handle phantoms (i.e. using physical surrogates for logical properties in concurrency control)."

Also: the pre-fetch trick where the read request is sent to the storage layer pre-emptively with an artificial delay for all dependent operations is clever but could be derailed quite a bit when a drive re-calibrates. That can take a large multiple of the time a seek typically takes (which is the case you'd be optimizing for here).

kyberias · on May 16, 2012

Can someone explain why the IBM machines on the list are running AIX operating system and Microsoft's COM+? I thought COM+ was Windows thing.

DrJokepu · on May 17, 2012

This answer on Stackoverflow explains it: http://stackoverflow.com/a/401999/8954

kyberias · on May 17, 2012

Sorry, no, it doesn't explain it at all! I already know COM+ is delivered on Windows 2000 and above. But the TPC benchmark table talks about AIX, which is a Unix, not Windows! Why do they say AIX, not Windows if they're running COM+?

DrJokepu · on May 17, 2012

"In fact, it's usually used as the TP monitor in TPC-C benchmark systems because it's more efficient than .Net or Java and much cheaper than Tuxedo or Encina (which reduces the $/TPM)."

kyberias · on May 18, 2012

So are they running Windows boxes but it's nowhere to be seen? Only AIX is mentioned.