Hacker News | alexrbarlow's comments

Hi, I'm the Author of this. Feel free to ask any questions!


Thanks a lot for this post.

Could you expand on the techniques you use to implement idempotency in your workers/queues and in your rpcs?

I have seen a mix of doing nothing if there is nothing to do, locking, using a idempotency key and so on. But I am always curious to see what others do.


Yeah, basically all of the above. We rely heavily on database row locking and then checking whether the work is already done. Sometimes you don't need to check at all (for example, when just updating a timestamp).
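For the curious, a minimal sketch of the lock-then-check pattern. Everything here is hypothetical (a `jobs` table with a `done` flag; SQLite's `BEGIN IMMEDIATE` standing in for a real row lock like Postgres's `SELECT ... FOR UPDATE`), not our actual schema:

```python
import sqlite3

def process_once(conn, job_id, work):
    """Run `work` for job_id at most once, even across retries.

    Assumes `conn` was opened with isolation_level=None so we control
    transactions explicitly.
    """
    conn.execute("BEGIN IMMEDIATE")  # take the write lock up front
    row = conn.execute(
        "SELECT done FROM jobs WHERE id = ?", (job_id,)
    ).fetchone()
    if row is None or row[0]:
        conn.execute("COMMIT")       # unknown job or already done: no-op
        return False
    work(job_id)                     # the actual side effect
    conn.execute("UPDATE jobs SET done = 1 WHERE id = ?", (job_id,))
    conn.execute("COMMIT")
    return True
```

A redelivery of the same job id falls into the no-op branch, which is what makes the worker safe to retry.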


Does this take into account timezones?


4th paragraph.


I'm not sure what they've been left with is a monolith after all. I would say they just have a new service, which is the size of what they should have originally attempted before splitting.

In particular, as to their original problem, the shared library seems to be the main source of pain and that isn't technically solved by a monolith, along with not following the basic rule of services "put together first, split later".

I feel prematurely splitting services like that is bound to have issues unless they have 100 developers for 100 services.

The claim of "1 superstar" is misleading too, this service doesn't include the logic for their API, Admin, Billing, User storage etc etc, it's still a service, one of a few that make up Segment in totality.


Agreed.

Reading about their setup and comparing with some truly large scale services I work with, I'm left with the idea that Segment's service is roughly the size of one microservice on our end.

Perhaps the takeaway is don't go overboard with fragmenting services when they conceptually fulfill the same business role. And regardless of the architecture of the system, there are hard state problems to deal with in association with service availability.


The most telling fact is that it "took milliseconds to complete running the tests for all 140+ of our destinations". I've never worked on a single service whose tests ran that fast, given that the time spent by the overhead of the test framework and any other one-time initialization can take a few seconds just itself. It's great to have tests that run fast, but that's a bit ridiculous.

Some rules of thumb I just came up with:

Number of repos should not exceed number of developers.

Number of tests divided by number of developers should be at least 100.

Number of lines of code divided by number of repos should be at least 5000.

Your tests should not run faster than the time it takes to read this sentence.

A single person should not be able to memorize the entire contents of a single repo, unless that person is Rain Man.


> never worked on a single service whose tests ran that fast

I'd say you've never had good tests.

I have a test-suite for a bunch of my frameworks that dates to the mid 90s, with tests added regularly with new functionality.

It currently takes 4 seconds total for 6 separate frameworks and 1000 individual tests. Which is actually a bit slower than it should be, it used to take around 1-2 seconds, so might have to dig a little to see what's up.

With tests this fast, they become a fixed part of the build-process, so every build runs the tests, and a test failure is essentially treated the same as a compiler error: the project fails to build.

The difference goes beyond quantitative to qualitative, and is hard to communicate. Testing becomes much less of a distinct activity and simply an inextricable part of writing code.

So I would posit:

Your tests should not run slower than the time it takes to read this sentence.


Unit tests that don't read or write to disk and don't try thousands of repetitions of things should be blazingly fast. But the most useful integration tests, the ones that actually help find faults (usually with your assumptions about the associated APIs), often need interaction with your disk, database, or an external service, and tend to take more than a few seconds. I find you need both.


I have tests which verify DNA analysis. The test data vectors are large -- a few hundred MB here, a couple GB there. The hundreds of tests that use these test vectors still run in a few seconds.

If you're using a tape drive or SD cards, sure. But even a 10-year-old 5400 RPM drive on an IDE connection should be able to satisfy your tests' requirements in a few seconds or less.

I suspect your tests are just as monolithic as you think microservices shouldn't be. Break them down into smaller pieces. If it's hard to do that, then redesign your software to be more easily testable. Learn when and how to provide static data with abstractions that don't let your software know that the data is static. Or, if you're too busy, then hire a dedicated test engineer. No, not the manual testing kind of engineer. The kind of engineer who actually writes tests all day, has written thousands (or hundreds of thousands) of individual tests during their career. And listen to them about any sort of design decisions.


Sounds like you have tests that need to read (probably cached) data files while the parent poster has tests that need to write to disks (probably in a database transaction). Those are different enough that run times won't ever be comparable.


I have tests that need to read. I have tests that need to write. All data written must also be read and verified. You're right, the data is probably cached.

If you need to access a database in your tests, you're probably doing it wrong. Build a mock-up of your database accessor API that provides static data, or build a local database dedicated to testing.
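A sketch of the first option, with hypothetical names: the code under test depends on an accessor interface, and the test supplies an implementation backed by static data.

```python
class UserStore:
    """Production implementation would query the real database."""
    def get_email(self, user_id):
        raise NotImplementedError

class StaticUserStore(UserStore):
    """Test double: serves canned rows, never touches a database."""
    def __init__(self, rows):
        self._rows = rows

    def get_email(self, user_id):
        return self._rows[user_id]

def build_notification(store, user_id):
    # Code under test has no idea the data behind `store` is static.
    return "to: " + store.get_email(user_id)
```

The key is that `build_notification` only ever sees the abstraction, so swapping in the static store changes nothing about the code path being exercised.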


Sure. I'd venture to say that integration tests should be fewer than unit tests; see hexagonal architecture, etc. Hopefully those external interfaces are also more stable, so they don't need to be run as often.

I tend to use my integration tests also as characterization tests that verify the simulator/test-double I use for any external systems within my unit tests.

See also: the testing pyramid[1] and "integrated tests are a scam"[2], which is a tad click-bait, but actually quite good.

[1] https://martinfowler.com/bliki/TestPyramid.html

[2] https://www.youtube.com/watch?v=VDfX44fZoMc


Not all domains have libraries which can run tests that fast?


Sure. I'd suggest that the domains that limit testing speed are exceedingly rare. Much, much, much more common are technical issues.

Either way, the idea of considering slow tests a feature was novel to me.


> I've never worked on a single service whose tests ran that fast, given that the time spent by the overhead of the test framework and any other one-time initialization can take a few seconds just itself. It's great to have tests that run fast, but that's a bit ridiculous.

It's not ridiculous. It's good.

I work on an analysis pipeline with thousands of individual tests across a half dozen software programs. Running all of the tests takes just a few seconds. They run in under a second if I run tests in parallel.

If your tests don't run that fast then I suggest you start making them that fast.

I'd be willing to bet that if you learned (or hired someone with the knowledge of) how to optimize your code, you could get some astounding performance increases in your product.


+1

I felt this article is more about how to use microservices the right way vs. butchering the idea. It is not right to characterize this as microservices vs. monolith. The initial version of their attempt went too far by spinning up a service for each destination. This takes microservices to an extreme, which caused organizational and maintenance issues once the number of destinations increased. I am surprised they did not foresee this.

The final solution is also a microservice architecture, with a better separation of concerns/functionalities: one service for managing the inbound queue of events and another for interacting with all destinations.


Age-old truth: "Use the right tool for the right job".


    unless they have 100 developers for 100 services.
That cure is worse than the disease. Every service works differently and 80% of them are just wrong, and there’s nothing you can do because Tim owns that bit.


Not to mention you have 2^100 states of on/off (~1.2676506e+30)


I work on that project. Every time some idiot starts talking about 'code coverage' my face turns red. Our code coverage is 1e-10%. Don't talk to me about this 70% bullshit.


Said with less vitriol:

It's not just code coverage that matters. It's the code path selection that matters. If you have a ton of branches and you've evaluated all of them once, then yeah, you sure might have 100% "coverage". But you have 0% path coverage, since a single invocation of your API might take the true branch on one statement and the false branch on another, while a second invocation might take false on the first and true on the second.

While the code was 100% tested, the scenarios were not. What happens if you have true/true or false/false? That's not tested.

There's a term for this but I forgot what it is and don't care to go spelunking to find it.
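To make the distinction concrete, here's a tiny hypothetical function where two test cases execute every line, yet cover only two of the four possible paths:

```python
def price(amount, member, coupon):
    if member:            # branch A
        amount -= 10
    if coupon:            # branch B
        amount -= 5
    return amount

# These two cases execute every line, so statement coverage is 100%:
#   price(100, True, False)  -> branch A taken, B skipped
#   price(100, False, True)  -> branch A skipped, B taken
# But the True/True and False/False paths were never exercised.
```

Whether the member discount and the coupon interact correctly is exactly the kind of bug the "100% covered" suite above would miss.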


> There's a term for this but I forgot what it is and don't care to go spelunking to find it.

SQLite calls this "branch coverage"

https://www.sqlite.org/testing.html#statement_versus_branch_...


> There's a term for this but I forgot what it is and don't care to go spelunking to find it.

Happy path?


Not quite what I meant, but that's another good description


At this point you're over-granularizing your services into nanoservices, which is an anti-pattern.


How does the service architecture affect that? Tim could be as protective of a code file as he is of a service. At least with a service you could work around it.


One way is that different services are more likely to have a different language, framework, and paradigms, which perhaps only Tim is familiar with (that's been my experience). It's definitely got a different repo, perhaps with different permissions.


But if you can explain to the team or the CTO why Tim is doing it wrong and how it is impacting X, Y and Z, then Tim will fix it or be sent elsewhere, no?


Tim accuses everyone else of being lazy or stupid.

Tom (real guy) was too busy all the time to do anything other than the 80/20 rule. He was too busy because he didn't share. So of course he was a fixture of the company...


but the CTO is a nice guy who just had management training, so instead, both you and Tim are sent to a mandatory conflict resolution class


Each developer works in their own little silo and doesn't bother to learn the code outside it. Each team member develops their own idiosyncratic style. If they have to work with someone else's code, it's unfamiliar, so they make slow progress and get cranky.

Now all the developers are going to the CTO or CEO and undermining the other developers, trying to persuade the CTO that so-and-so's code is shit.


No, not unless you’re somehow able to make the team, the CTO and Tim feel good at the same time. If you figure that part out let me know.


But if Tim's work is so bad that you can prove it..... How can a boss dodge the proof? I guess Tim should be fired second, the boss should go first...


> How can a boss dodge the proof?

This is a business decision, reality has no influence here.

That's why they invented the term "Business reality".


It looks to me that the shared library issue got solved by the monorepo approach. They could have gone the monorepo way and still had microservices. Managing a lot of repos and keeping them consistent with regard to dependencies is not easy. In reality you do not want everyone using a different version of a dependency. You might allow deviations, but ultimately you want to minimize them.

They also just might have had too many repos.


Besides a monorepo, they would also need a deployment strategy that lets them deploy multiple services simultaneously (e.g. every service affected by the library change), so that after deploying they can still talk to one another. For a single service this is doable enough (start up, wait for green health, route requests to the new instance), but complexity grows when there's more than one service. I'm sure the process can be repeated and automated, etc., but it will be more complex. Doing zero-downtime deployments for a single service is hard enough.


> along with not following the basic rule of services "put together first, split later".

Agreed. I treat services like an amoeba. Let your monolith grow until you see the obvious split points. The first one I typically see is authentication, but YMMV.

Notice I also do not say 'microservices'. I don't care about micro as much as functional grouping.


> the basic rule of services "put together first, split later"

Is this rule mentioned or discussed somewhere? A quick google search links to a bunch of dating suggestions about splitting the bill. Searching for the basic rule of services "put together first, split later" reveals nothing useful.


They never said the entire company was left with 1, only that 100s were condensed to "1 superstar".


+1

Changing one "shared library" shouldn't mean deploying 140 services immediately.

They had one service to begin with forked for each destination. Of course that was a nightmare to maintain!


I can see their reasoning though; most of those services are pretty straightforward I think (common data model in -> transform -> specific outbound API data out -> convert result back to common data model). The challenge they had is that a lot of the logic in each of those services could be reused (http, data transformation, probably logging / monitoring / etc), so shared libraries and such.
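A sketch of that shape (field names are hypothetical, not Segment's actual model): each destination is mostly a pure transform in and a pure transform back out.

```python
def to_destination(event):
    """Common event model in -> hypothetical destination payload out."""
    return {
        "event_name": event["type"],
        "sent_at": event["timestamp"],
        # many destination APIs want string-valued properties
        "props": {k: str(v) for k, v in event.get("properties", {}).items()},
    }

def from_destination(response):
    """Hypothetical destination API result back to a common delivery status."""
    status = response.get("status", 0)
    return {"ok": status == 200, "retryable": status >= 500}
```

When 140 services all look like this, the real logic lives in the shared transform/transport code, which is why the shared library kept dominating the pain.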


Echo | Senior/Mid Golang/Node/K8S Engineer | London, UK

We're revolutionising healthcare with medication delivery and management and we’re looking for Golang/Node/K8S devs to work on services based architecture and cloud infra here at Echo.co.uk! We’re based in London and love Kubernetes, Prometheus, Go, GraphQL, Istio and good Coffee. We have just raised a series A round of 7m funding and will be integrating with the NHS soon along with a larger roadmap.

Email alex.barlow[at]echo.co.uk


Echo | Senior Developer, Developer | London, UK | Full Time | On Site | VISA transfer OK | www.echo.co.uk

ABOUT US: We’re looking for Golang devs to work on (micro)services and cloud infra here at Echo.co.uk!

We’re based in London and love Kubernetes, Prometheus, Go, GraphQL and GCP and good Coffee. We have just raised a series A round of funding and will be integrating with the NHS soon.

Kubernetes, Prometheus, Go, GraphQL, GCP, Node, Docker are all good knowledge areas to have. But generalists are welcome and new Go coders are welcome too!

Please contact alex.barlow@echo.co.uk https://www.echo.co.uk/careers


Interesting. I've collected quite a few from friends + colleagues already and only one has such a header.

Also, that doesn't help with my having fun and learning more machine learning!


Looks really nice. Business-wide dashboards are often a pain to build, and it's nice for everyone to be able to see sign-ups etc. MySQL as a data source is good, but I'm hoping they'll include things like Prometheus or others later. For now this would work with some Segment.io or similar batch jobs.



I totally agree and yeah, I used Martini to make it a little easier to read


Understood, the community really needs more idiomatic examples for building web apps. Someone should take this up.


The door functions as normal because this is just an add-on! The keypad etc. outside works fine.

