that no one other than marketing people takes seriously
Plenty of people take it seriously. I take it seriously.
Most people scale out rather than scale up, so it's pretty much irrelevant, as it considers monolithic computing only.
Yet the top results are clusters.
You're 0 out of 2. Any more wisdom about TPC?
It's also worth considering audiences: if you're a web-scale company holding recipes for millions of free accounts, you're a bit different from a hedge fund churning out performance results for its investor statements. Judging the latter from the perspective of the former is asinine.
People take TPC-C seriously because not much has come along that's more useful as a transactional benchmark. There are lots of transactional benchmarks, but they're often flawed in some annoying way and don't serve as well as a baseline as TPC-C does.
That said, TPC-C is horribly out of date. For example, you have to simulate human data entry time within transactions. How many systems have that problem today? It also only ever adds data, so if you run it fast enough, you have a petabyte problem, and most OLTP isn't a petabyte problem.
As for the second point, those clusters are clusters in name only. They use fancy and expensive interconnects and caches that effectively give them shared memory. Also, they won't tolerate failure of any individual component well. Finally, individual nodes still act as gatekeepers and transaction monitors. Most of the cluster is simply there to apply predicates on data coming off of disk really fast.
For example, you have to simulate human data entry time within transactions. How many systems have that problem today?
How many people have problems with concurrency? A shitload of people, that's who. This isn't a problem with no-locking models because... no locking. It is when you care about consistency.
They use fancy and expensive interconnects and caches that effectively give them shared memory.
They often use high-speed interconnects because the sort of customers who care about such build-outs would naturally use high-speed interconnects. They are, however, clusters in every meaning of the word, regardless of any no-true-Scotsman fallacies.
I'm very familiar with the TPC-C spec, and the problem can't be brushed off with "concurrency!". There are multi-second waits in a large percentage of transactions. Nobody does this in any performant OLTP system today, but sure, concurrency! Except the benchmark limits how many of these transactions can concurrently operate on a warehouse, one of its core models. So the only way to scale throughput is to add warehouses. You end up simulating a company with a million warehouses, each with a fairly small load. Furthermore, to run a million transactions a second, you'll need several million open transactions. That's why you see armies of client nodes in the specs of the systems on the leaderboards. The append-only data model is also difficult. If you removed the waits and allowed old data to be pruned, the benchmark would be much more useful.
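The "several million open transactions" figure falls out of Little's law: concurrency equals throughput times time-in-system. A minimal back-of-envelope sketch (the wait times below are illustrative round numbers, not the exact per-transaction values from the TPC-C spec):

```python
# Little's law arithmetic: why high TPC-C throughput forces armies of
# simulated terminals. Keying/think times here are rough stand-ins for
# the spec's mandated human delays, not exact spec values.

def required_open_sessions(target_tps: float,
                           keying_s: float,
                           think_s: float,
                           execute_s: float) -> float:
    """Concurrent in-flight transactions = throughput * cycle time."""
    cycle = keying_s + think_s + execute_s
    return target_tps * cycle

# Roughly one million transactions/second, with several seconds of
# mandated human delay per transaction and near-instant execution:
sessions = required_open_sessions(target_tps=1_000_000,
                                  keying_s=5.0, think_s=5.0,
                                  execute_s=0.01)
print(f"{sessions:,.0f} concurrent sessions")  # on the order of 10 million
```

Even with sub-10ms execution, the mandated waits dominate the cycle, so the open-transaction count is driven almost entirely by the simulated humans, not the database.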
So yes, technically Oracle Exadata OLTP clustering has multiple CPUs connected by a high-speed interconnect. Cluster.
My dual socket commodity Dell server also has multiple CPUs connected by a high speed interconnect. Cluster?
My point was not to argue about a word, just that Exadata is not what some typically think of as a cluster in modern distributed systems. They've moved some smart filtering into a SAN and plugged that into RAC and shipped it all in a big hot tower-thingy. It's not bad in any way, it's just closer to SMP than to other kinds of network-based parallelism.
I just meant it's not the same kind of clustering as Hadoop/HBase, Vertica, VoltDB, Cassandra, Riak, Greenplum, Netezza, Teradata or even DB2-Cluster.
If clustering is a spectrum where Dynamo-style systems like Riak are on one side and my Dell SMP system is the other extreme, the Sun cluster is probably closer to the SMP system than to the Riak cluster.
You really bought the marketing, then. The code is always frigged for the benchmarks, and you're not going to get anywhere if you try to replicate it yourself without half the vendor's ops team being on site.
Basically, it's packaged bullshit.
Let me clarify "monolithic": it's a single component in the architecture, not the full stack. I mean if I put an aggressive caching layer over the top I don't even need to hit the cluster for consistent reads.
Regarding my utilisation and experience: we're high finance (private insurance, product sales, portfolio management), with 5k concurrent users, and we've been around 20 years. We've also been screwed by every vendor under the sun on bullshit like that over the years.
I mean if I put an aggressive caching layer over the top I don't even need to hit the cluster for consistent reads.
So the database vendors would likely do well to implement aggressive caching right at the source, right? That would be why our database instances use the full 192GB of the machines they're on.
There is no silver bullet for platforms. Further, the database is usually the most restrictive layer of the stack, hence the focus on it (but consult Redis or nginx or Tomcat benchmarks if you're looking at the other layers).
Nope - databases cache chunks of ready-to-serve but not-yet-processed data, not precalculated business logic outputs and resolved rules.
Every time you hit them, they cost cycles, IO (the real killer) and RAM.
The database is very restrictive which is why our domain model lives outside it.
We put stuff in the OLTP store when we're done with it and take it out when we need it, or most likely from our ORM L2 cache and/or our service layer cache.
The DB does f-all apart from keeping our caches hydrated.
It's bad architecture to put all your eggs in a black box, particularly an expensive one.
Are you, perchance, a disciple at the church of ORM? That is a religion that makes its own problems, and then celebrates victory when it solves them.
The database is very restrictive
No it isn't. The claim is ridiculous.
It's bad architecture to put all your eggs in a black box
It's not a church or religion. It's where you find technology's atheists (engineers) solving business domain problems rather than working out how best to represent them in a relational way and bending them to get what they want.
I feel you've been heavily politicised by the vendors rather than by the solutions required.