The advantage of a monorepo in this particular case is that it makes easy things easy: if you want to remove a parameter of a function in some library and that function has just a few callers in dependent executables, you can just do that in a single commit. Without a monorepo, you have to do the full-blown iterative rollout described in the OP even for small changes, if they cross VCS boundaries.
It's not about migrating APIs or coordinating deployments; no repo layout solves those problems. It's about updating libraries and shared code uniformly and patching dependencies (e.g., for vulnerabilities) all in one go.
Imagine updating Guava 1.0 -> 2.0. Either you require each team to do this independently over the course of several months with no coordination, or in a monorepo, one person can update every single project and service with relative ease.
Let's say there's an npm vuln in leftpad 5.0. You can update everything to leftpad 5.0.1 at once and know that everything has been updated. Then you just tell teams to deploy. (Caveat: this doesn't work as cleanly for a dynamically typed language like JavaScript, but it's a world wonder in a language like Java.)
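At a big shop this sweep is usually driven by internal codemod tooling, but as a rough sketch of the idea, here's a hypothetical Python script that bumps one npm dependency in every package.json under a monorepo root (the leftpad name and versions come from the example above; everything else is assumed):

```python
import json
from pathlib import Path

def bump_dependency(root: str, package: str, new_version: str) -> list[str]:
    """Rewrite `package` to `new_version` in every package.json under `root`.

    Returns the list of changed manifest paths, so the sweep is auditable.
    """
    changed = []
    for manifest in Path(root).rglob("package.json"):
        data = json.loads(manifest.read_text())
        deps = data.get("dependencies", {})
        if package in deps and deps[package] != new_version:
            deps[package] = new_version
            manifest.write_text(json.dumps(data, indent=2) + "\n")
            changed.append(str(manifest))
    return changed

# In a monorepo this one call, plus one commit, covers every project:
# bump_dependency(".", "leftpad", "5.0.1")
```

The returned path list is the "know that everything has been updated" part: it's one commit you can review, rather than N pull requests you have to chase.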
I can't fathom how hard it would be to coordinate all of these changes with polyrepos. You'd have to burden every team with a required change and force them to accommodate it. Someone unfamiliar with the problem has to take time out of their day or week to learn the new context, then search for and apply the changes. And there's no auditability or guarantee that everyone did it; some isolated or unknown repos somewhere will never see the upgrade. But in a monorepo, you're done in a day.
Now, here's a key win: you're really at an advantage when updating "big things". Like getting all apps on gRPC. Or changing the metrics system wholesale. These would be year long projects in a polyrepo world. With monorepos, they're almost an afterthought.
Monorepos are magical at scale. Until you experience one, it's really hard to see how easy this makes life for big scale problems.
Do you really feel that confident working in another team’s code base? I work in a multi-repo company, and almost every time I’ve gotten a patch from outside my team it’s been wrong in some way. Why would I want to make it easier for people who don’t understand (and aren’t interested in understanding) my project to land code in it?
Another advantage of a mono-repo is that it encourages everyone to use the same tooling & coding libraries. So (at least at Google) I can open another team's codebase (as long as it's in the main mono-repo) and understand it within ~10 minutes.
I fix bugs that bother me in random projects (both internal and external) maybe once a month (most recently, in the large scale change tool!). For context, I've been at Google for ~3 years. I've only had a changelist rejected once, and that was because the maintainer disagreed with the technical direction of the change and not the change itself.
Because any company reaching monorepo scale will have integration tests that cut across the boundaries of your projects. It's possible for an outside contribution to break your corner of the repo, but the flip side is that you will know much more quickly if your own changes break another part of the repo.
>Because any company reaching monorepo scale will have integration tests that cut across the boundaries of your projects.
Heh. This makes a couple assumptions that I only can wish were true: (a) that people won't go to monorepo until they hit some huge scale, and (b) that people will at that point have good test coverage.
I completely disagree. My company is absolutely “monorepo scale”, but I also know we’re nowhere close to having the test coverage to allow people unfamiliar with a project to freely land changes in it.
I think you’re misunderstanding something because this isn’t (usually) a way to bypass code review by the team that owns the code.
You want to make it easier to contribute so that people can send a patch and it’s more likely to be useful without too much back-and-forth in code review. Having common tools and coding standards makes that more likely.
None of the things you state are related to the technical act of having a single repository, but they are all results of the organizational structure. It's entirely possible to have a monorepo where one person doesn't have the organizational or technical ability to update everything in it, and you can also have split repositories where a single person does.
As I read your post, you're attributing a lot of properties to a monorepo.
That's fine, but I think you should be careful about whether you're pointing to the properties of using a single repository in general; the properties of tooling that certain monorepo-using companies have built with no requirement other than supporting their own source control; how uniform it can feel to jump between projects when every project has been forced onto a lot of the same base tooling beyond just source control; and/or a work culture that happened to grow up around a certain monorepo, but for which a monorepo is neither necessary nor sufficient to reproduce.
I've worked jobs where the entire company is in a unified repository, and companies where a repository represents everything related to a product family, and places where each product was multiple gitlab groups with tons of projects.
The most I can say is that monorepos solve package management by avoiding package management. The rest comes down to tooling, workflow and culture.
I would be interested in hearing why it would hypothetically have been worse if Google had gone the other direction: spending the same amount of money and time from highly talented people on unifying their tooling and improving workflow, but in support of a polyrepo environment instead. How would that have been fundamentally worse than what they got by doing the same work around a monorepo?
Hum... Ok, there has been a "mostly nonbreaking" change on leftpad that corrects some vulnerability. Are you proposing that a single developer/team clones the work of 100s or 1000s of different people, updates it to use the new leftpad, runs the tests and pushes?
The only way this could ever work is if the change is really nonbreaking (do those exist in Javascript?), in which case you could script the update on as many repositories as you want, too. Otherwise, living with the library vulnerability is probably safer than blindly updating it in code you know nothing about.
Anyway, burdening all the teams with a required change is the way to go. It doesn't matter how you organize your code. Anything else is a recipe for disaster.
> with the library vulnerability is probably safer than blindly updating it on code you know nothing about.
This is what tests are for.
> Are you proposing that a single developer/team clones the work of 100s or 1000s of different people, updates it to use the new leftpad, runs the tests and pushes? ... Anyway, burdening all the teams with a required change is the way to go.
No, and speaking from personal experience, it's much more difficult to ask ~500 individuals to understand how and why they need to make a change than to have a few people just make the change and send out CLs. Writing a change, especially one that you have to read a document to understand, has a fixed amount of overhead.
(Also, you don't have to clone all the repositories if you're in a monorepo :) ).
Forcing each team to do (or approve!) the update has nothing to do with a shared repository, it's just what limits and requirements you've added on top of your repo(s). A for loop over N repos and an automated commit in each one is perfectly achievable.
If you want consistency so you can automate stuff, require consistency.
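As a sketch of that for loop, assuming a flat list of checked-out repos and some codemod you've already written (repo names and the commit message here are placeholders):

```python
import subprocess

# Hypothetical repo list; in practice you'd pull this from your VCS host's API.
REPOS = ["billing", "search", "frontend"]

def plan_update(repo: str, message: str) -> list[list[str]]:
    """Return the git commands for one repo's automated update."""
    return [
        ["git", "-C", repo, "pull"],
        # ... run your codemod / dependency bump against `repo` here ...
        ["git", "-C", repo, "commit", "-am", message],
        ["git", "-C", repo, "push"],
    ]

def run_everywhere(repos, message, dry_run=True):
    """Apply the same automated commit to every repo in the list."""
    for repo in repos:
        for cmd in plan_update(repo, message):
            if dry_run:
                print(" ".join(cmd))  # preview the commands instead of running them
            else:
                subprocess.run(cmd, check=True)
```

The catch, as the thread notes, is that this only gives you consistency if the repos were already consistent enough for one codemod to apply everywhere.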
I update our polyrepo code all the time. I just have to go into each repo and make the change. It isn't much more work than you have; the only difference is I need to run more "git pull / git commit / git push" steps, and my CI dashboard draws from more builds.
I sometimes leave some repos at older versions of tools. Sometimes the upgrade is compelling for some parts of our code and of no value to others.
In multi-repo, on all projects I've worked on, there is no such problem.
First, inter-repo dependencies are managed by pinning a specific commit or tag, so changing a library has zero effect on the programs depending on that library.
Second, if you are introducing a breaking change and you know not all clients will want it, you can have multiple branches. No, you do not want this as the default choice or for the long term, but for a short-term transition it is possible.
From that point on, clients of the library can upgrade to new versions with the changes on their own schedule. A client is never forced to upgrade until there's a feature it absolutely needs, and the library is not forced to support two versions of the code.
That last point is not trivial. If every breaking change needs to go through this dual-support period in the same single code base, it can become a support and testing nightmare: you need to duplicate tests, and as the number of such dual-version APIs increases, the compatibility matrix grows exponentially.
This is entirely avoided in the multi-repos scenario.
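As a concrete (made-up) illustration of the pinning in the first point, an npm manifest can name an exact tag or commit, so the client sees nothing until it bumps the ref itself (the URL and tag here are invented placeholders):

```json
{
  "dependencies": {
    "leftpad": "git+https://git.example.com/shared/leftpad.git#v5.0.0"
  }
}
```

The library team can push `v6.0.0` whenever they like; this client keeps building against `v5.0.0` until someone edits that line.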
The flip side of this is that if you need a change to go out everywhere, say because it fixes a security vulnerability, or fixes a bug that affects all users, or you want to remove the old behavior from a service in a reasonable amount of time, then you have to update the commit/version/whatever in every dependent project as well, recursively. And I've seen the pain of doing that cause bad incentives, like not splitting up projects that really need to be, or delaying critical fixes because you don't want to update dependencies twice.
If it is possible to have multiple revisions of libraries you will have multiple revisions of libraries. I've seen this play out "at scale" and what happens in practice is software rots until some kind of crisis or feature demands an upgrade. At which point somebody creates another set of branches to solve precisely their problem and nothing else.
Maybe it's just where I've worked, but atomic commit carries with it some strong cultural norms that make it really tractable to have one version of everything. If code compiles and automated tests pass, the change is safe and may be committed unilaterally. Inevitably things go wrong, but the post mortems for incidents don't lead back to the lack of permission from the affected projects.
I wouldn't maintain software that can kill people in this way, but for everything else it strikes a nice balance.
If the change is in a shared library (and not a service shared over the network), it's fine to change all usages at once. Deploying a service wouldn't affect the others.
If the change affects the public interface of a service, then there's no option but to make your changes backward-compatible.
I hope nobody's life depends on the uptime of a web-based distributed system.
But, well, I also expect nobody's life to depend on it. There would be a short window between people getting into that situation and them no longer having any life to depend on anything.
Well the arm CPUs we use are in general purpose computers as well. Though you are correct, we don't follow the same practices as general purpose computers.
I don't think synchronized deployment is really possible; you'd have to either still do the iterative thing, or have some versioning system in place.
It is possible for trivial cases. What I do in my basement for example - though even there I have come to prefer keeping things intentionally unsynchronized: it ensures that after updates I have some system that still works.
It takes the guesswork out of library migrations. API migrations still need forwards/backwards-compat hygiene, unless you blue/green your entire infrastructure to compatible versions, which is possible but not necessarily practical.
If you design with a "no deprecations" mentality and deploy backend before frontend, in _most cases_ this isn't an issue -- the frontend code that needs the new table or column or endpoint that doesn't exist yet won't run until those things are deployed, and the new backend endpoints will be fully backwards compatible with the old frontend, so no issues.
You don't even need to be that dogmatic to make this work either -- simply stipulating backwards compatibility between the two previous deploys should be sufficient.
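A minimal sketch of that compatibility window, with hypothetical field names: the new frontend sends `user_id`, the previous deploy's frontend still sends `uid`, and the backend tolerates both until the old frontend is fully gone.

```python
def get_user(params: dict) -> dict:
    """Hypothetical handler kept compatible across one deploy cycle."""
    # Accept the new field name, falling back to the old one.
    user_id = params.get("user_id") or params.get("uid")
    if user_id is None:
        raise ValueError("missing user id")
    # Respond with both field names so either frontend version can read it.
    return {"user_id": user_id, "uid": user_id}
```

Once telemetry shows no traffic using `uid`, the fallback and the duplicated response field can both be deleted.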
The better version of this is simply versioning your backend and frontend but I've never been that fancy.
Because they do nothing to make it hard to add a coupling or break modularity.
You should of course use good discipline to ensure that doesn't happen. Compared to multi-repo, it is a lot easier to violate coupling and modularity and not be detected. Anyone who is using a monorepo needs to be aware of this downside and deal with it. There are other downsides to multi-repo, and those dealing with them need to be aware of those and mitigate them. There is no perfect answer, just compromises.
They make it easy, and then human nature and dev laziness does the rest. If you can reach across the repo and import any random piece of code, you end up with devs doing just that. It's a huge huge pain to try to untangle later.
That's why tools like Bazel are strict about visibility and put more friction and explicitness on those sorts of things. But this tends to not be the first thing at the top of people's minds when starting a new project... so in the monorepos I've worked on, it's never been noticed until it's too late to easily fix.
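For illustration, a Bazel target is visible only to its own package by default, so any wider exposure has to be declared explicitly in the BUILD file (the package paths below are invented):

```python
# BUILD file for a hypothetical //libs/billing_internal package.
# Without a `visibility` attribute, Bazel defaults to private, so a random
# service elsewhere in the monorepo cannot silently depend on this target.
py_library(
    name = "billing_internal",
    srcs = ["billing_internal.py"],
    # Only //services/billing and its subpackages may depend on this.
    visibility = ["//services/billing:__subpackages__"],
)
```

That friction is the point: reaching across the repo becomes a reviewed, explicit decision rather than a free import.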