
Never heard of CedarDB.

Seems to be another commercial cloud-hosted thing offering a Postgres API? https://dbdb.io/db/cedardb

https://cedardb.com/blog/ode_to_postgres/





I was evaluating it recently but it's not FOSS, so buyer beware. I'm totally fine with commercialization, but I hesitate to build on top of data stores with no escape hatches or maintenance plans–especially when they're venture backed. It is self-hostable, but not OSS.

It's a startup founded by -- and built with tech coming out of research by -- some well known people in the DB research community.

Successor to Umbra, I believe.

I know somebody (quite talented) working there. It's likely to kick ass in terms of performance.

But it's hard to get people to pay for a DB these days.


It's probably going to be acquired. The last effort to commercialize the TUM (Technical University of Munich) database group's work was acquired by Snowflake and disappeared into that stack.

CedarDB is the commercialization of Umbra, the TUM group's in-memory database led by Professor Thomas Neumann. Umbra is a successor to HyPer, so this is the third generation of the system Neumann came up with.

Umbra/CedarDB isn't a completely new way of doing database stuff, but basically a combination of several things that rearchitect the query engine from the ground up for modern systems: A query compiler that generates native code, a buffer pool manager optimized for multi core, push-based DAG execution that divides work into batches ("morsels"), and in-memory Adaptive Radix Tries (never used in a database before, I think).
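To make the "morsels" idea concrete, here is a minimal Python sketch of morsel-style scheduling: the input is cut into small fixed-size ranges, and worker threads keep pulling the next range from a shared dispenser until none are left. This is only an illustration of the general technique, not Umbra's or CedarDB's code. The names `run_pipeline` and `MORSEL_SIZE` are made up for the example, the filter-and-sum "pipeline" is a stand-in for compiled query code, and Python threads will not give real multi-core speedup here.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MORSEL_SIZE = 4  # hypothetical; real engines use much larger batches


def morsels(n_rows, size):
    """Yield (start, end) row ranges that cover the input in small batches."""
    for start in range(0, n_rows, size):
        yield start, min(start + size, n_rows)


def run_pipeline(rows, n_workers=3, morsel_size=MORSEL_SIZE):
    """Sum the even values in rows, with workers pulling one morsel at a time."""
    work = morsels(len(rows), morsel_size)
    lock = threading.Lock()  # guards the shared morsel dispenser

    def worker():
        local_sum = 0  # per-worker partial aggregate, merged at the end
        while True:
            with lock:
                rng = next(work, None)
            if rng is None:
                return local_sum
            lo, hi = rng
            for v in rows[lo:hi]:
                if v % 2 == 0:
                    local_sum += v

    with ThreadPoolExecutor(n_workers) as pool:
        futures = [pool.submit(worker) for _ in range(n_workers)]
        return sum(f.result() for f in futures)
```

Because a worker only claims a new morsel when it finishes the previous one, a slow worker simply processes fewer morsels, which is the load-balancing property the approach is known for.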

It also has an advanced query planner that embraces the latest theoretical advances in query optimization, especially some techniques to unnest complex multi-join query plans, especially with queries that have a ton of joins. The TUM group has published some great papers on this.


> It also has an advanced query planner that embraces the latest theoretical advances in query optimization, especially some techniques to unnest complex multi-join query plans, especially with queries that have a ton of joins. The TUM group has published some great papers on this.

I always wondered how good these planners are in practice. The Neumann/Moerkotte papers are top notch (I've implemented several of them myself), but a planner is much more than its theoretical capabilities; you need so much tweaking and tuning to make anything work well, especially in the cost model. Does anyone have any Umbra experience and can say how well it works for things that are not DBT-3?


Umbra is not an in-memory database (Hyper was). TUM gave up on the feasibility of in-memory databases several years ago (when the price of RAM relative to storage stopped falling).

Thanks for the correction. My understanding was that it was still in-memory but "fell back on" disk. ART indexes were touted as one of the novel aspects of Umbra, and my understanding is that ART doesn't work well as an on-disk data structure, so I guess I need to read up on the architecture now.

No, again, ART was Hyper's specialty. And you're right: because ART specializes in in-memory workloads, it is not amenable to paging.

I believe Umbra is heavily BTree based, just like its cousin LeanStore.

One of its specific innovations is its buffer pool which uses virtual memory overcommit and multiple possible buffer sizes to squeeze better performance out of page management.
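A rough Python sketch of the "multiple buffer sizes" part, as I understand it: each page is rounded up to a power-of-two size class, and all classes share one memory budget. Everything concrete here is an assumption for illustration: the 64 KiB base size, the `ToyPool` name, and the oldest-first eviction. It only models the accounting. A real implementation would reserve a separate virtual address range per size class and hand physical memory back to the OS on eviction, which this toy does not attempt.

```python
BASE_PAGE = 64 * 1024  # assumed smallest page size, for illustration only


def size_class(n_bytes):
    """Return k for the smallest class with BASE_PAGE * 2**k >= n_bytes."""
    k = 0
    while (BASE_PAGE << k) < n_bytes:
        k += 1
    return k


class ToyPool:
    """Tracks which pages are resident under one shared byte budget."""

    def __init__(self, budget):
        self.budget = budget
        self.resident = {}  # page_id -> page size in bytes, oldest first

    def pin(self, page_id, n_bytes):
        """Make a page resident, evicting the oldest pages if needed."""
        if page_id in self.resident:
            return self.resident[page_id]
        size = BASE_PAGE << size_class(n_bytes)
        if size > self.budget:
            raise ValueError("page larger than the whole pool budget")
        # Evict oldest-first until the new page fits (a stand-in policy).
        while sum(self.resident.values()) + size > self.budget:
            oldest = next(iter(self.resident))
            del self.resident[oldest]
        self.resident[page_id] = size
        return size
```

For example, with a 256 KiB budget, pinning a 1-byte page and then two 100 KiB pages leaves only the two 128 KiB pages resident, since the small page is evicted to make room.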

The talk at https://www.youtube.com/watch?v=pS2_AJNIxzU is delightful.

My understanding is the research projects LeanStore & Umbra -- and now I assume the product CedarDB based on the people involved, etc. -- are systems based on the observation that a) existing on-disk systems aren't built well with the characteristics of NVMe/SSD drives in mind, and b) RAM prices up to this year were not dropping at the same rate as they were early in the 2010s, meaning that pure in-memory databases were not so competitive, so it's important to look at how we can squeeze performance out of systems that perform paging. And of course in the last 6 months this has become extremely relevant with the massive spike in RAM prices.

That and the query compilation stuff, I guess, which I know less about.


Thanks for the corrections and info. Will check out that video. I could have sworn I had read these things about Umbra, but I suppose it was HyPer. Both interesting designs!

I am currently writing an on-disk B+tree implementation inspired by LMDB's blazing fast memory-mapped CoW approach, so I'm quite keen to learn what makes TUM's stuff fast.


Take a look at the opensource leanstore repository.

https://github.com/leanstore/leanstore

Very different approach from lmdb's mmap though. Not limited to single writer. Explicit buffer pool. Big difference from Umbra is fixed sized pages.


Yeah I think the way Umbra was pitched when I watched the talks and read the paper was more as a "hybrid" in the sense that it aimed for something close to in-memory performance while optimizing the page-in/page-out performance profile.

The part of Umbra I found interesting was the buffer pool, so that's where I focused most of my attention when reading though.


Are you thinking of Hyper being acquired by Tableau?

My bad. HyPer was acquired by Tableau, which was acquired by Salesforce.


