Workers Durable Objects Beta: A New Approach to Stateful Serverless (cloudflare.com)
209 points by ispivey on Sept 28, 2020 | 91 comments


Demo chat application from the article using Durable Objects:

- https://edge-chat-demo.cloudflareworkers.com

- A public room (to join and test):

hackernews

- Source: https://github.com/cloudflare/workers-chat-demo

Insanely awesome feature (much needed for truly “serverless” application development). The power to scale here without insane infrastructure headaches is amazing.

One day some kid is totally going to build a single-person billion dollar company from his mom’s basement.


That already happened, see plentyoffish.com, created by Markus Frind in Vancouver, the first (or first popular anyway) free online dating website. Sold for $575M. No outside funding, and for a long time it was just him and his girlfriend running it, working about 10 hours a week. When he sold it to match.com, they had 75 employees and he was working normal hours.


*or her’s


Wow, very cool! I didn't see the string "mobile code" in this press release, but that's essentially what this is, right? Automatically moving objects to be near the computation that needs it, is a long-standing dream. It's awesome to see that Cloudflare is giving it a try! Plus, the persistence is clever - I'm guessing that makes the semantics of mobility much easier to deal with.

I love the migration to nearby edge nodes, but here's a question to any Cloudflare employees around: Have you given any thought to automatically migrating Durable Objects to end user devices?

That has security implications of course, so if you've dismissed the idea previously because the security issues are too hard to surface to the developer, that's reasonable.


> Have you given any thought to automatically migrating Durable Objects to end user devices?

We don't have any current plans, but... I was the co-founder of Sandstorm.io before going to Cloudflare, and Durable Objects are very much inspired by parts of Sandstorm's design. So yeah, I've absolutely thought about it. ;)

It would definitely have to be an opt-in thing on the developer's part, due to the security considerations as you mention. But I think the possibilities for solving tricky compliance problems are pretty interesting.

Protip: "Compliance" is how you say "privacy" while sounding like a shrewd business person instead of an activist. ;)


>Automatically moving objects to be near the computation that needs it, is a long-standing dream. It's awesome to see that Cloudflare is giving it a try!

I'm not sure I see many real-world applications for this. It seems to sit in the unhappy middle ground between local device storage and central storage. Local storage gives the best performance because you eliminate network issues, but then you have to deal with sync/consistency issues. Central storage & processing eliminates sync/consistency issues but can have poor performance due to the network. Workers Durable Objects sit in the middle: you trade complexity for performance, but instead of eliminating the network you're only shaving some tens of milliseconds off the RTT. It's a level of performance improvement that essentially no one will notice.

To use their examples:

>Shopping cart: An online storefront could track a user's shopping cart in an object. The rest of the storefront could be served as a fully static web site. Cloudflare will automatically host the cart object close to the end user, minimizing latency.

>Game server: A multiplayer game could track the state of a match in an object, hosted on the edge close to the players.

>IoT coordination: Devices within a family's house could coordinate through an object, avoiding the need to talk to distant servers.

>Social feeds: Each user could have a Durable Object that aggregates their subscriptions.

>Comment/chat widgets: A web site that is otherwise static content can add a comment widget or even a live chat widget on individual articles. Each article would use a separate Durable Object to coordinate. This way the origin server can focus on static content only.

The performance benefits for the cart, social feed, and chat are irrelevant. Nobody cares if it takes 50 ms longer for any of those things.

IoT coordination is more promising because you want things to happen instantly. Maybe it's worth it here, but people usually have a device on their local network to coordinate these things.

Game server would definitely be an improvement. But these things are more complex than some JS functions and it would be a large effort to make them work with Durable Objects.


> The performance benefits for the cart, social feed, and chat are irrelevant. Nobody cares if it takes 50 ms longer for any of those things.

I think this is missing a few points:

1. Yeah they do. If your shopping cart responds 50ms faster when someone clicks "add to cart", you will see a measurable benefit in revenue.

2. It's actually a lot more than 50ms. A chat app built on a traditional database -- in which a message arriving from one user is stored to the database, and other users have to poll for that message -- will have, at best, seconds of latency, and even that comes at great expense (from polling). The benefit from Durable Objects is not just being at the edge but also being a live coordination point at which messages can be rebroadcast without going through a storage layer.

3. Yes, some databases have built-in pub/sub that avoids this problem and may even be reasonably fast, but using Durable Objects is actually much easier and more flexible than using those databases.


(1) If 50 ms is that important then the cart should be stored locally and synced in the background. That's my broader point. Performance sensitive things should use local storage. Things that are not should use the convenience of a central server.

(2) Nobody builds chat apps that way. The apples to apples comparison would be something using websockets and Redis. The only savings I see there are the time saved by the server being physically closer.


> Nobody builds chat apps that way.

Of course they don't.

> The apples to apples comparison would be something using websockets and Redis.

Which would be way more complicated to write, deploy, maintain, and scale than this little 200-line Durable Objects chat demo...

The real point here isn't performance, it's simplicity. But also not having to trade away performance to get simplicity is nice.
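
To make the simplicity claim concrete, here is a platform-independent sketch of the coordination pattern the demo relies on: one in-memory room object that every connected client talks to, which rebroadcasts each message without going through a storage layer first. Class and method names here are illustrative, not the demo's actual code.

```javascript
// A "room" that fans each message out to every connected client.
// In the real demo this lives inside a Durable Object; here it is a
// plain in-memory simulation of the same pattern.
class Room {
  constructor() {
    this.clients = new Set();
  }

  join(client) {
    this.clients.add(client);
  }

  leave(client) {
    this.clients.delete(client);
  }

  // A message from one client is immediately rebroadcast to everyone,
  // including the sender, with no database write in the hot path.
  broadcast(from, text) {
    for (const client of this.clients) {
      client.receive({ from, text });
    }
  }
}

// Stand-in for a WebSocket-connected user.
class Client {
  constructor(name) {
    this.name = name;
    this.inbox = [];
  }
  receive(msg) {
    this.inbox.push(msg);
  }
}

const room = new Room();
const alice = new Client("alice");
const bob = new Client("bob");
room.join(alice);
room.join(bob);
room.broadcast("alice", "hi");
```

Because the room is a single live object, there is no polling and no fan-out through storage; persistence (chat history, say) can happen off the critical path.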


> The only savings I see there are the time saved by the server being physically closer.

Here in Australia, with ping times to US West Coast (where lots of companies host by default) of 170ms this is a real issue.


>It seems to sit in the unhappy middle ground between local device storage and central storage.

No, it's strictly better than both. You get the performance of local storage and the ease of programming of central storage.

Pretty much everyone chooses central storage at the moment, so the advantage of mobile objects manifests as performance.

I don't know about you, but I would find it a much nicer experience if, when I clicked "Add Item" in a shopping cart on some website, it happened in <5ms regardless of the quality of my network connection, while still being shared between different machines and never encountering consistency issues. The current "wait somewhere from half a second to several seconds for each click" is bad UX, even if users have gotten used to it.


>You get the performance of local storage

Durable Objects are still a network call over the internet. That is orders of magnitude slower than using local storage in the browser.

>when I clicked "Add Item" in a shopping cart on some website, it happened <5ms

You're not going to get that level of performance. All edge storage does is save you the time it takes a packet to go from the edge to the origin server. For example, I'm in Atlanta and the Hacker News server is in San Diego. This is my traceroute:

    1  LEDE.lan (192.168.1.1)  12.652 ms  12.596 ms  12.561 ms
    2  96.120.5.9 (96.120.5.9)  32.606 ms  32.604 ms  32.581 ms
    3  68.85.68.85 (68.85.68.85)  34.831 ms  34.821 ms  34.792 ms
    4  96.108.116.41 (96.108.116.41)  34.762 ms  34.730 ms  34.706 ms
    5  ae-9.edge4.Atlanta2.Level3.net (4.68.38.113)  39.613 ms  64.603 ms  57.149 ms
    6  ae-0-11.bar1.SanDiego1.Level3.net (4.69.146.65)  89.310 ms  66.298 ms  76.021 ms
    7  M5-HOSTING.bar1.SanDiego1.Level3.net (4.16.110.170)  75.982 ms  69.094 ms  79.199 ms

The last hop in Atlanta before hitting the backbone has a round-trip time of ~35ms and the first in San Diego is ~75ms. So, everything else being equal, if HN were served from an edge location I'd save 40ms on a page load. Ultimately 40ms doesn't matter because it's not something that the end user can perceive.


> Durable workers are still a network call over the internet.

Getting the request to the object involves traversing the internet. Once there, actually talking to storage is extremely fast compared to classical monolithic databases. The key is that application code gets to run directly at the storage location.

Most applications need to do multiple round trips to storage to serve any particular request, which is where the costs add up.

> Ultimately 40 ms doesn't matter because it's not something that the end user can perceive.

In ~2005 when I started at Google, I learned that they had done a study that found that every millisecond of latency shaved off the time it took to serve search results would add $1M in annual revenue.

Users may not perceive 40ms in isolation, but they do perceive a web site "feeling" slower if every request takes 40ms longer.


Regulatory compliance is the killer feature.


Maybe, but not as they are. If I understand it correctly, there are no limits by geography. If the Durable Object is created in country A, then when a worker from country B accesses it, that data will be replicated to the worker in country B.


No, that's not correct. The worker in country B will end up sending a message to the Durable Object which will still be located in Country A. An object only exists in one place at a time.

We are working on automatic migration, where if we noticed the object was more-frequently accessed in country B than in country A, then it gets moved to country B. But, that's a performance optimization, and it will be straightforward to implement policies on top of that to restrict certain objects to migrate only within certain political regions.


> I'm going to be honest: naming this product was hard, because it's not quite like any other cloud technology that is widely-used today.

On a superficial skim it looks like a tuple space; they were heavily researched in the 80s and 90s. JavaSpaces emerged in the late 90s but never took off.

Scala folks are keen on Actor models (Lightbend have been using the term "Stateful Serverless" for a while now), as are Erlang and Elixir folks.

I guess the key here is "widely-used".

Edit: this sounds even more arrogant than I intended. Sorry. I just feel bad for tuple space researchers (including my Honours supervisor). They laboured mightily in the 80s and 90s and their reward was to be largely ignored by industry.


It sounds fairly Actor-like to me. There's a bunch of different entities; each is a singular entity that lives somewhere and has its own state that only it can directly access. These Actors happen to be mobile in Durable Objects. And they are presented as more object-like than actor-like, but that seems like a difference in name more than a difference in nature to me.

Edit: oh, here's @kentonv, Cap'n Proto author & Cloudflare employee, elsewhere in this discussion:

> Each object is essentially an Actor in the Actor Model sense. It can send messages (fetches, and responses to fetches) to other objects and regular workers. Incoming requests are not blocked while waiting for previous events to complete.

https://news.ycombinator.com/item?id=24617172


I have been trying to remember the term "tuple space" for several months, after running across it once and then trying to describe it to a friend.

Thank you, it was bugging me so much.


What are the size limits for a durable object?

The read/write limit per second?

That's usually the first thing I want to know about my cloud primitives...

(Credit for at least being clear about consistency, which is always my very first question)


Well, this is kind of like asking the throughput of an individual Worker instance. It doesn't really matter, because the system automatically spins up as many as you need, and so the overall throughput is effectively unlimited.

For Durable Objects, applications should aim to make their objects as fine-grained as they reasonably can, so that the limits on one object are not likely to matter. Meanwhile, the total capacity across all objects is effectively unlimited.

Anyway, we don't have numbers for these questions yet. This is an early beta and we still have a lot of low-hanging fruit optimization to do.


But isn't each (say) chat room a single object, and isn't each single-threaded (per an answer elsewhere on this page)?

It's nice to know that if "N" chat rooms get started, "N" instances are built, but if 100k people join one chat room, it's going to bog down at best, and flame out at worst. Or am I guessing wrong?


Consider that a human can probably only keep up with, at most, a couple chat messages per second (and that's generous). But a Durable Object can handle orders of magnitude more than that. So the scalability bottleneck for the chat use case is actually humans, not the limits of a Durable Object.

Many use cases end up being this way.

With that said, if you really wanted to support a chat room that has more traffic than a single object can handle, the way to do it would be to shard the object. E.g. have 10 chat room "shards", have each client connect to all 10, and randomly choose a shard to handle each message sent. Or if it's the number of clients (not the frequency of messages) that is a limiting factor, have each client connect to only one shard, but have the shards rebroadcast to each other. (This is two possible designs, but there are many other options depending on what you're trying to accomplish.)

If you're getting near the limit of what one Durable Object can handle, you probably should introduce such sharding into your app. Once you have sharding, you can scale easily again, by adding shards.

We'll likely add tools so that the system can handle various sharding approaches automatically for you.
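
The two sharding designs described above can be sketched as plain routing functions. Shard names like `room-3` are illustrative; in a real Workers app you would turn them into object IDs.

```javascript
// Sketch of the two sharding designs for a chat room that exceeds
// what one single-threaded object can handle. Names are illustrative.

const NUM_SHARDS = 10;

// Design 1: clients connect to all shards, and each message sent is
// routed to a randomly chosen shard, spreading message-handling load.
function randomShard() {
  return `room-${Math.floor(Math.random() * NUM_SHARDS)}`;
}

// Design 2: each client sticks to one shard (chosen by hashing its ID),
// spreading connection load; shards then rebroadcast to each other.
function shardForClient(clientId) {
  let hash = 0;
  for (const ch of clientId) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0; // simple string hash
  }
  return `room-${hash % NUM_SHARDS}`;
}
```

Design 1 spreads per-message CPU cost; design 2 spreads per-connection cost. Which one fits depends on whether message rate or client count is the bottleneck, as the comment notes.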


In practice splitting things into fine-grained objects with async messaging is hard. Well, it depends...

How fine-grained? That is essentially the question I'm asking.

Would I need a hierarchy of durable objects to store a 100kb collaborative text document? What about 10MB?

You can make things very fine-grained if you want to... but at some point you're essentially implementing distributed consensus algorithms, and then it might be better to just use a single point of failure like a database.


> Would I need a hierarchy of durable objects to store a 100kb collaborative text document? What about 10MB?

The bottleneck is going to be more CPU than memory or storage size. So the question is, how many users are editing, what rate of events does each generate, and how complex is the event handler?

Let's say it's actually a plaintext editor, with the 10MB text file represented in memory using a reasonable data structure allowing O(1) insertions and deletions. Clients send a stream of keystrokes to the server. Server writes document out to disk periodically, not on every keystroke. Then I would expect each keystroke to take less than 1ms of CPU time to process, therefore at least 1000 per second could be processed by one thread. Let's say people type 10 keystrokes per second, then you could have 100 users actively typing at once? This is just my intuition, though.
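
Written out, that back-of-envelope estimate looks like this. Both inputs are the assumed numbers from the comment above, not measurements:

```javascript
// Back-of-envelope capacity of one single-threaded object handling
// a collaborative plaintext editor. Inputs are assumptions from the text.
const cpuMsPerKeystroke = 1;        // assumed worst-case handler cost per event
const keystrokesPerUserPerSec = 10; // assumed typing rate

const eventsPerSecPerThread = 1000 / cpuMsPerKeystroke;
const concurrentTypists = eventsPerSecPerThread / keystrokesPerUserPerSec;
// eventsPerSecPerThread = 1000, concurrentTypists = 100
```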

> and then it might be better to just use a single point of failure like a database..

Incidentally databases will also hit scaling bottlenecks if you have too many requests hitting the same row. Under the hood, the database has to do exactly what Durable Objects do -- the row will be owned by one "chunk" which has to serialize all changes (making it effectively single-threaded).

So "use a database" doesn't necessarily solve your scaling bottleneck. In fact, it's likely to be worse, since the database chunk is not running app-specific optimized code.


> So "use a database" doesn't necessarily solve your scaling bottleneck.

Absolutely, it won't :D

I'm just saying that if I do a postgres database on say heroku, I have a clue what it can handle. I get the specs, it has this much RAM, this many CPUs and this much storage.

I also know that I'll be the only user of said server.

With other hosted services like S3, dynamodb, azure tables, etc, the documentation features "scalability targets".

S3 I can see how may reads / s a bucket can handle, and how it scales up. On azure tables I know a table can handle 100k writes / s (or so).

If I send a lot of messages to a single durable object, will it scale to the point where it gets a dedicated node? (What is the approximate definition of a node?) Will it migrate automatically? (What temporary degradation might I see?)

Can I have a 1GB durable object with 1 transaction every 1 hour? What about a 100 GB durable object with a 1 transaction every day?

Can I have a 1MB durable object with 10 messages per minute?

Or is it measured in compute seconds, clock cycles?

Like, can I have a 10kb durable object with 10 messages/s, each using 0.5ms of CPU time?

Similarly, how does it look when I increase CPU time per message, number of messages or size of the object, where are the limits?

From reading the documentation one might think one could store a 100GB git repository as a durable object. Sure each commit takes time, but I have few commits / hour.

(I assume a 100GB durable object won't work, but I can't see any limits or scalability targets in the documentation)

Just to clarify, I don't expect an answer to the question above, I expect documentation to feature some "scalability targets" and "limits". Something that gives me an intuition about what a durable object can handle. - So I know whether to shard my use-case or not :D

On topic: I think durable objects is really cool, message passing seems like the right model for scalable cloud computing.


Interesting. I didn't see how security works? Is there backpressure on message senders? Any ordering guarantees? Are messages queued so activated objects can reconstruct state? Can passivation warmth be controlled? Can objects support multiple threads? Can objects move? Failover?


Great questions.

> how security works?

Messages can only be sent to Durable Objects from other Workers. To send a message, you must configure the sending Worker with a "Durable Object Namespace Binding". Currently, we only permit workers on the same account to bind to a namespace. Without the binding, there's no way to talk to Durable Objects in that namespace.

> Is there backpressure on message senders?

Currently, the only message type is HTTP (including WebSocket). There is indeed backpressure on the HTTP request/response bodies and WebSocket streams.

In fact, this is exactly why we added streaming flow control to Cap'n Proto: https://capnproto.org/news/2020-04-23-capnproto-0.8.html

We plan to support other formats for messaging in the future.

> Any ordering guarantees?

Since each object is single-threaded, any block of code that doesn't contain an `await` statement is guaranteed to execute atomically. Any put()s to durable storage will be ordered according to when put() was invoked (even though it's an async method that you have to `await`.)

When sending messages to a Durable Object, two messages sent with the same stub will be delivered in order, i.e.:

    let stub = OBJECT_NAMESPACE.get(id);
    let promise1 = stub.fetch(request1);
    let promise2 = stub.fetch(request2);
    await promise1;
    await promise2;
If you have heard of a concept called "E-order" (from capability-based security and the E programming language designed by Mark Miller), we try to follow that wherever possible.
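
That delivery guarantee can be pictured as a FIFO mailbox in front of a single-threaded handler. Here is a toy synchronous simulation of the idea (real stubs are asynchronous, but the queueing structure is the same):

```javascript
// Toy illustration of per-stub ordering: messages sent through the same
// stub are queued and handled strictly in send order by a handler that
// processes one message at a time.
class Mailbox {
  constructor(handler) {
    this.handler = handler;
    this.queue = [];
    this.busy = false;
  }
  send(msg) {
    this.queue.push(msg); // enqueue in send order
    if (!this.busy) this.drain();
  }
  drain() {
    this.busy = true; // "single-threaded": one message at a time
    while (this.queue.length > 0) {
      this.handler(this.queue.shift());
    }
    this.busy = false;
  }
}

const log = [];
const stub = new Mailbox((m) => log.push(m));
stub.send("request1");
stub.send("request2");
// log ends up as ["request1", "request2"], regardless of scheduling
```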

> Are messages queued so activated objects can reconstruct state?

No. The only state that is durable is what you explicitly store using the storage interface that is passed to the object's constructor. We don't attempt to reconstruct live object state. We thought about it, but there's a lot of tricky problems with that... maybe someday.

If the machine hosting an object randomly dies mid-request, the client will get an exception thrown from `stub.fetch()` and will have to retry (with a new stub; the existing stub is permanently disconnected per e-order). In capability-based terms, this is CapTP-style, not Ken-style.

> Can passivation warmth be controlled?

Sorry, I don't know what that means.

> Can objects support multiple threads?

No, each object is intentionally single-threaded. It's up to the app to replicate objects if needed, though we might add built-in features to simplify this in the future.

> Can objects move?

This is a big part of the plan -- objects will transparently migrate between datacenters to be close to whatever is talking to them. It's not fully implemented yet, but the pieces are there, we just need to write some more code. This will be done before coming out of beta.

> Failover?

If a machine goes down, we automatically move the object to a different machine. If a colo goes down, we will automatically move to another colo. We still have a little bit of missing code for colo failover -- the data is replicated already, but we haven't fully implemented the live failover just yet. Again, that'll happen before we exit beta.


Around the ordering guarantee, are there tricky edge cases if objects are moving between machines in the face of network partitions or some other system instability? Where request1 and request2 hit two different instances of the object, but request1 persists at the end instead of request2?


Those are tricky edge cases for us to solve, yes, but that's our job, not yours. :)


:) More of a question of what tradeoffs you're using to solve them.


> Can passivation warmth be controlled?

A variant of the cold start problem. How long after all the messages for an object drain is it passivated? Can you keep it pinned in memory?

Another question. Can objects contain relationships that are themselves references to other Durable Objects? Say a simple reference, or a list, or DAGs?


The system will evict the live object when it has been idle for some period. If there are connections still open to the object, it won't be evicted (unless it exceeds resource limits, etc.).

As always, "cold starts" with Workers are very fast, usually imperceptible to a human. Also, multiple objects may be hosted in the same isolate; when instantiating in an existing isolate, the only "cold start" overhead is whatever you write in your class constructor.

> Can objects contain relationships that are themselves references to other Durable Objects?

Yes, by storing the object IDs.


Are they garbage collected? Do you use tombstones?


No, no GC, at least at present. If you store any durable state, you have to delete it explicitly, or it stays forever.

I am interested in the idea of objects whose IDs are never revealed to the app, but to which references can be stored inside other objects. Then we could do GC... and it would be a true capability system.


> I'm going to be honest: naming this product was hard, because it's not quite like any other cloud technology that is widely-used today.

Perhaps I'm missing something important, but isn't this quite similar to Orleans grains and other distributed actors?


"Actors" was actually one of the names we used internally for a long time (it's still all over the code), but eventually decided against because we found that people familiar with the Actor Model actually expected something a bit different, so it confused them.

But yes, the basic idea is not entirely new. For me, Durable Objects derive from my previous work on Sandstorm.io, which in turn really derives from past work in Capability-based Security (many implementations of which are Actor-oriented). But while the idea is not entirely new, the approach is not very common in web infrastructure today.

(I'm not familiar with Orleans.)


We (Orleans team) also stopped referring to them as actors some time ago for the same reason. The Orleans papers call them Virtual Actors or Grains. Usually, I describe grains as Distributed Objects.


Hah, "Grains" is also what we called them in Sandstorm.io.


> ...we found that people familiar with the Actor Model actually expected something a bit different

I haven't done any programming with Actors per se but after skimming over its Wikipedia entry and other blog posts, Durable Objects does sound a lot like Actors to me.

Genuinely curious: What were some glaring differences that you were made aware of that led to not naming it Actors?

Thanks.


So, this would be a better question for someone who has actually worked with other actor-model frameworks. But, one sense that I get is that in Erlang, there is a design philosophy where most actors are intentionally stateless so that if anything goes wrong they can die and be replaced easily.

So ironically it seems like many people expected statelessness to be a property, which is the opposite of what we were going for!

Disclaimer: I haven't worked with Erlang myself and I'm probably missing some nuance here. My background is in object-capability systems, which also commonly claim to be actor systems, and match what we're doing very closely.


Actors can be stateful or stateless, so this is a subset, made serverless. Pretty cool! I get it, naming is hard and “serverless stateful actors” might have been too long of a name. Excited to check out this product.


Some wanky theory about computing and the design of programs follows. (Not out of scope considering the philosophical underpinnings of this product and the "edge", etc.)

The chat demo says:

> With the introduction of modules, we're experimenting with allowing text/data blobs to be uploaded and exposed as synthetic modules. We uploaded `chat.html` as a module of type `application/octet-stream`, i.e. just a byte blob. So when we import it as `HTML` here, we get the HTML content as an `ArrayBuffer`[...]

    import HTML from "chat.html";
I've thought a lot about this for the work that I've been doing. From an ergonomics standpoint, it's really attractive, and the only other viable alternatives are (a) dynamically reading the asset, or (b) settling on using some wrapper pattern so the original asset can be represented in the host language, e.g.:

    export const IMAGE_DATA =
      "iVBORw0KGgoAAAANSUhEUgAAAD8AAAA/..." +
      "..."

    export const HTML = `
      <!-- totally the HTML I wanted to use -->
    `;
... which is much less attractive than the "import" way.

Ultimately I ended up going with something closer to the latter, and there wasn't even any reluctance about it on my part by the time I made the decision—I was pretty enthusiastic after having an insight verging on a minor epiphany.

I'd been conflicted around the same time about representing "aliens" (cf. Bracha) from other languages and integrating with them. I slapped my head after realizing that the entire reason for my uneasiness about the latter "data islands" approach was that I wasn't truly embracing objects, and that these two problems (foreign integration and foreign representation) were very closely related.

Usually you don't actually want `HTML`, for example, and focusing on it is missing the forest for the trees. I.e., forget whatever you were planning with your intention to leave it to the caller/importer to define procedures for operating on this inert data. Make it a class that can be instantiated as an object that knows things about itself (e.g. the mimetype) and that you can send messages to, because that's what your program really wants it to be anyway. Once you're at that point, the "wrapper" approach is much more palatable, because it's really not even a wrapper anymore.
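
As a sketch of that "not even a wrapper anymore" idea, the exported value can be an object that knows things about itself rather than inert text. All names here are illustrative:

```javascript
// Instead of exporting a raw HTML string, export an object that carries
// its own metadata and answers messages about itself.
class HtmlAsset {
  constructor(text) {
    this.text = text;
  }
  get mimeType() {
    return "text/html;charset=UTF-8";
  }
  byteLength() {
    // what an ArrayBuffer-style consumer would ask for
    return new TextEncoder().encode(this.text).length;
  }
  responseInit() {
    // ready to hand to a Response constructor in a Worker
    return { headers: { "Content-Type": this.mimeType } };
  }
}

const CHAT_PAGE = new HtmlAsset("<!doctype html><title>chat</title>");
```

The importer no longer needs to know how to operate on the bytes; it just sends the asset messages.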


If I'm getting this right, it's essentially immediately consistent distributed state for workers. They could have called it simply "Workers State" :)

Now in all seriousness, this is super impressive. Congrats to the CF team!


Heh, that is actually a name we considered, and as a name on its own, I like it a lot.

But we also needed a name for the individual instances. We also found that the people who "got" the product were the ones who thought of it in terms of object-oriented programming (an object is an instance of a class). So we ended up gravitating towards "objects".

But I dunno, naming is hard. "Workers State" may in fact have been a better name!


Definitely the hardest product to name I can remember.


"Workers State" is more memorable, mainly because of the double entendre.


From a pragmatic point of view, the idea seems to be allowing stateful workers. Objects seem more like an implementation detail.

Anyway, thanks again for working on this. I'm sure I'm going to use those once they come out of beta!


Is it possible to query across objects? Like if you wanted to find every object instance which had “string” written to it?

Can the data store only store alphanumeric or can you write blobs? Could a chat app store uploads inside the object?


Now this is one of the times I wish I had at least one idea for an application, because this is the kind of thing I'd like to try out.

Oh, well, I'll wait until it's an open beta or generally available.


I cannot find any word on pricing. Is it included in the regular Workers $5/mo plan?


Hey, I'm Greg, the PM working on Durable Objects at Cloudflare. As part of the private beta, we're looking to get feedback on the best way to price Durable Objects so they're accessible for all applications, small or large.

While we're in beta, storage access will be free. As we're thinking about it now, once we're out of beta this wouldn't be included in the base $5/mo plan.

Since there's both a compute component (a Durable Object runs code, like a Worker) and a storage component (for storage operations) to the product, we want the long-term pricing model to mesh those two in a transparent, competitive way.

While we're not finalized on price yet, you can expect that costs for storage will be cheaper than existing services like AWS DynamoDB or Google Cloud Firestore when we move out of beta.


"you can expect that costs for storage will be cheaper than existing services like AWS DynamoDB or Google Cloud Firestore"

That's a great lead for pricing.


excellent response also.


Do you expect data transfer (bandwidth) to still be free?


For those in the beta, it's currently free. We are still working out what pricing will look like post-beta. We realized we need to see how people actually use it and get some feedback before we could settle on the right pricing structure... that's what betas are for.


Thank you for answering. In retrospect my comment looks a bit dry in comparison with the monumental achievement so:

Congrats on launching! It's awesome to see Cap'n Proto and sandstorm's legacy living on :)


FWIW, as a community project, Sandstorm also continues to live on! ;) There's been some pretty substantial refactorings going on and a bunch of quality of life fixes, sometimes closing out 5+ year old feature requests.


Would something like a tinyurl clone also be a good use case for this? At first look, it does seem like a good fit.


Although if the object is single-threaded, too many reads might overload it. Or is the object then replicated?


I would probably recommend using Workers KV for a tinyurl clone. Consistency is not important for this use case.


As others said, we’re figuring out pricing during beta but hope to keep it in-line with pricing for Workers KV. And it may be possible for us to get pricing even lower than that.


How would you debug and upgrade your durable objects?


This is a great question, and explains why we decided that the durable storage API needed to be explicit, rather than automatically serializing the in-memory object. Nothing is stored unless you explicitly use storage.put(key, value).

Since the storage is explicit, it's easy to upgrade the class definition. The in-memory object will be reconstructed and will need to refresh its state from storage.
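A rough sketch of that pattern, with the Durable Object's storage mocked so it can run anywhere. In a real Worker, the storage would be provided by the runtime; the `Counter` class shape here is illustrative, not the actual API.

```javascript
// Mock of a Durable Object's storage so the pattern runs anywhere;
// in a real Worker this would be supplied by the runtime.
const durableStore = new Map();
const storage = {
  async get(key) { return durableStore.get(key); },
  async put(key, value) { durableStore.set(key, value); },
};

class Counter {
  constructor(storage) {
    this.storage = storage;
    this.value = undefined; // in-memory state, lazily loaded
  }
  async increment() {
    if (this.value === undefined) {
      // Refresh in-memory state from durable storage (e.g. after an
      // upgrade recreated the in-memory object).
      this.value = (await this.storage.get("value")) ?? 0;
    }
    this.value += 1;
    // Nothing is persisted unless we explicitly put() it.
    await this.storage.put("value", this.value);
    return this.value;
  }
}

(async () => {
  const a = new Counter(storage);
  await a.increment();
  await a.increment();
  // Simulate upgrading the class: a fresh instance reloads from storage.
  const b = new Counter(storage);
  console.log(await b.increment()); // 3
})();
```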


Are updates to Durable Objects guaranteed to be exactly once? If an update is sent but the connection between client and object is dropped, how is that handled?


Yes, updates are guaranteed to happen exactly once or not at all.

If the connection drops, the Worker will receive an error and can re-establish its connection to the Durable Object. The update may or may not have been successfully persisted by the Durable Object - just like any other remote database operation where the connection drops before you receive the result back.
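One way a caller might handle that ambiguity is to retry with a client-supplied request id so the object can deduplicate. This is not an API Cloudflare provides; everything below (the stub, `sendUpdate`, the request-id scheme) is a hypothetical sketch of the pattern.

```javascript
// Stub standing in for a Durable Object that deduplicates updates by
// request id. The first call applies the update but "loses" the response,
// simulating a dropped connection after the write persisted.
function makeObjectStub() {
  const seen = new Set();
  let counter = 0;
  let failedOnce = false;
  return {
    async sendUpdate(requestId) {
      if (!failedOnce) {
        failedOnce = true;
        seen.add(requestId);
        counter += 1;
        // Update applied, but the response is lost in transit.
        throw new Error("connection dropped");
      }
      if (!seen.has(requestId)) { seen.add(requestId); counter += 1; }
      return counter;
    },
  };
}

// Retry with the SAME request id, so a retry of an update that did
// persist is not applied twice.
async function updateWithRetry(stub, requestId, attempts = 3) {
  for (let i = 0; i < attempts; i++) {
    try { return await stub.sendUpdate(requestId); }
    catch (e) { /* reconnect and retry with the same id */ }
  }
  throw new Error("giving up");
}

(async () => {
  const stub = makeObjectStub();
  const value = await updateWithRetry(stub, "req-1");
  console.log(value); // 1: the retry was deduplicated, not double-applied
})();
```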


Is this similar to Azure Durable Functions?

https://docs.microsoft.com/en-us/azure/azure-functions/durab...

From what I understand, these features are a nice way to implement a serverless Actor Model. I was surprised to see no reference to it on the Cloudflare page.


I was thinking this is a lot more like Microsoft Orleans:

https://dotnet.github.io/orleans/Documentation/index.html


very close to Azure Functions Durable Entities: https://docs.microsoft.com/en-us/azure/azure-functions/durab...


Possibly. We did not base the design (or name) of Durable Objects on any other product we were aware of (except arguably Sandstorm.io, which was my startup before joining Cloudflare). I haven't looked closely at Azure Durable Functions.

We actually did call this product "Actors" internally for a long time, but we found that people who had done previous Actor Model work (e.g. in Erlang) ended up more confused than enlightened by this name, so we ditched it.


Interestingly, Azure's Durable Entities has a similar feature set and similar story of origin (internally called it actors, then switched to "entities" to avoid confusion): https://medium.com/@cgillum/azure-functions-durable-entities...


Is there only a single instance of the example Counter object globally, and since there are no additional awaited calls between the get and put operations, is atomicity guaranteed? Is the object then prevented from getting instantiated on any other worker?

Can this result in a deadlock if I access DurableClass(1), then delayed DurableClass(2) in one worker and DurableClass(2) and delayed DurableClass(1) in another worker?


Each object is essentially an Actor in the Actor Model sense. It can send messages (fetches, and responses to fetches) to other objects and regular workers. Incoming requests are not blocked while waiting for previous events to complete.

Hence, a block is only atomic if it contains no "await" statements.

In the counter example, the only thing we "await" (after initialization) is the storage put()s. Technically, then, you could imagine that the put()s could be carried out in the wrong order. But, we're able to guarantee that even though put() is an async method, the actual writes will happen in the order in which put()s were called.

(For those with a background in capability-based security: Our system is based on Cap'n Proto RPC which implements something called E-order, which makes a lot of this possible.)

* Disclaimer: At this very moment, there are some known bugs where put()s could theoretically happen out-of-order, but we'll be fixing that during the beta.
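A runnable illustration of the "atomic only without awaits" rule, using a mocked async storage (not the real Durable Objects API): two concurrent increments that await a read before the write lose an update, while updating in-memory state synchronously and only awaiting the put does not.

```javascript
// Mocked async storage standing in for a Durable Object's storage.
function makeStorage() {
  let stored = 0;
  return {
    async get() { return stored; },
    async put(v) { stored = v; },
    current() { return stored; }, // test hook, not part of any real API
  };
}

// Non-atomic: an await separates the read from the write, so two
// concurrent calls can both read the same value.
async function incrementReadThenWrite(storage) {
  const v = await storage.get();
  await storage.put(v + 1);
}

// Atomic: the in-memory update happens synchronously; only the
// persistence is awaited (and puts are applied in call order).
function makeAtomicCounter(storage) {
  let value = 0;
  return async function increment() {
    value += 1;               // no await before this point: atomic
    await storage.put(value); // write-behind, in call order
  };
}

(async () => {
  const s1 = makeStorage();
  await Promise.all([incrementReadThenWrite(s1), incrementReadThenWrite(s1)]);
  console.log(s1.current()); // 1: one increment was lost

  const s2 = makeStorage();
  const increment = makeAtomicCounter(s2);
  await Promise.all([increment(), increment()]);
  console.log(s2.current()); // 2: no lost update
})();
```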


The actor model doesn't prevent "semantic" deadlocks caused by circular dependencies. It's kinda like reference counting, which also doesn't handle cycles. In practice it doesn't matter, and when it does, you've already saved enough brain cells to think about the tricky parts in isolation.

However, memory corruption via manual memory management and deadlocks via manual locking are commonly caused by simple, innocent programming mistakes, and are basically something one has to live with on a day-to-day basis.


All hail the Cloudflare Gods! They are benevolent gods, say I!

And today they have given into us a new, powerful bounty of storage with a delicious API!


This is awesome, and I'm so excited to read through the chat.mjs code. I might consider trying this out for a project. Does it mean I need to use Cloudflare? I wonder if, in the future, this could become more standard, and one could do something similar on their own infrastructure (maybe such a solution already exists, open sourced somewhere?)


I’m wondering what sort of durability guarantees there are in case of an outage? It seems like replicating durable storage would add latency?

Is there going to be Jepsen testing for this?


Storage is replicated across a handful of nearby sites. It does add some latency to writes, but that's preferable to Objects being offline or lost in the case of hardware or network failures.

There's no Jepsen testing in the works at the moment, but we'll see if it makes sense in the future.


Really interesting, seems like it has some unique abilities.

Signed up for beta invite -- does anyone happen to know whether all interested parties are admitted?


Thanks for the interest!

We're keeping access limited at first so we can get experience operating the system. We'll be expanding continually over the next few weeks.


That's super reasonable, appreciate the reply and hopefully will get a chance to experiment with it soon =D


One thing I'm not sure about (maybe I missed something in the blog post): will there always be only one instance of this object, or can there be multiple?


Durable Objects are globally unique, so it's guaranteed that there's just one instance with a given id.


Right, that's what I got from the Unique explanation in the doc. What I am confused about is this part:

> With Durable Objects, you instead design your storage model to match your application's logical data model. For example, a document editor would have an object for each document, while a chat app would have an object for each chat. There is no problem creating millions or billions of objects, as each object has minimal overhead.

What does it mean that a document editor will have an object for each document? Will I have to create a new UDO each time a new document is created?


It means that you'd use a different ID to access each document. Each document's Durable Object would run the same code as part of the same namespace of Durable Objects, but have their own in-memory and durable state. Check out the docs for a bit more context: https://developers.cloudflare.com/workers/learning/using-dur...
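A sketch of that per-document routing, with the namespace binding mocked so it runs anywhere. In a real Worker, the namespace binding (called `DOCUMENTS` here) and its `idFromName`/`get` methods would come from the Durable Objects runtime; here they're stand-ins just to show the shape.

```javascript
// Mock of a Durable Object namespace binding. In a real Worker this
// would come from the environment; the binding name and document class
// are assumptions for illustration.
function makeNamespace(ObjectClass) {
  const instances = new Map();
  return {
    // Deterministic: the same name always maps to the same id.
    idFromName(name) { return `doc:${name}`; },
    get(id) {
      if (!instances.has(id)) instances.set(id, new ObjectClass(id));
      return instances.get(id);
    },
  };
}

class DocumentObject {
  constructor(id) { this.id = id; this.content = ""; }
  append(text) { this.content += text; }
}

const DOCUMENTS = makeNamespace(DocumentObject);

// Same class and code, separate state per document id.
const a = DOCUMENTS.get(DOCUMENTS.idFromName("design-doc"));
const b = DOCUMENTS.get(DOCUMENTS.idFromName("meeting-notes"));
a.append("hello");
console.log(a === DOCUMENTS.get(DOCUMENTS.idFromName("design-doc"))); // true
console.log(b.content === "");                                        // true
```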


I am still a bit confused. Will this ID be different from the object ID for accessing the Durable Object? Essentially, in that case we would be using the Durable Object as key-value storage. Or is the Namespace separate from the Durable Object, with each Namespace able to have multiple objects of the same class under it?

Edit: I think I get it now. Sorry, I misunderstood and thought each Durable Object was like a singleton for the class you define. It's the other way around: you have the class definition and namespace, and then you can create a new object from those whenever you need it, and that object will be unique and accessible across all workers.


What's the maximum QPS that a single Durable Object can handle?


An object is limited to one thread. How many qps that is depends entirely on what your app does, since the app can run arbitrary code in the request handler...


Sorry for wording the original question poorly, let me rephrase: what are the CPU and RAM limits that back that single object thread, and will those be something developers have control over?


Is this not sharding?


In every datacenter near you and your users, without requiring you to own or rent a single server.



