It's unsurprising when you consider that there are often several orders of magnitude of difference between the amount of code a system grows to once you have the capacity, the time, and compounding user requests, and the amount needed for a meaningful starting point that provides useful functionality beyond what you had without it.
As an extreme example here[1] is an article by Brian Kernighan about a basic regexp matcher by Rob Pike. The code, with comments, is 35 lines of C.
Meanwhile, the regexp engine in Ruby 3.2.2, not even including the Regexp class visible to the language, is ~20597 lines excluding the headers.
They are not reasonably comparable, of course. Pike's code provides a very basic syntax which doesn't even support character classes, and does nothing to speed up the matching. The latter supports a very much more complex syntax, and is far faster for repeated or long matches.
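For reference, the matcher Kernighan describes looks roughly like this (reconstructed from memory rather than copied, so see the linked article for the exact code); it handles literal characters, '.', '*', '^' and '$', and nothing else:

    int matchhere(char *regexp, char *text);
    int matchstar(int c, char *regexp, char *text);

    /* match: search for regexp anywhere in text */
    int match(char *regexp, char *text)
    {
        if (regexp[0] == '^')
            return matchhere(regexp+1, text);
        do {    /* must look even if string is empty */
            if (matchhere(regexp, text))
                return 1;
        } while (*text++ != '\0');
        return 0;
    }

    /* matchhere: search for regexp at beginning of text */
    int matchhere(char *regexp, char *text)
    {
        if (regexp[0] == '\0')
            return 1;
        if (regexp[1] == '*')
            return matchstar(regexp[0], regexp+2, text);
        if (regexp[0] == '$' && regexp[1] == '\0')
            return *text == '\0';
        if (*text != '\0' && (regexp[0] == '.' || regexp[0] == *text))
            return matchhere(regexp+1, text+1);
        return 0;
    }

    /* matchstar: search for c*regexp at beginning of text */
    int matchstar(int c, char *regexp, char *text)
    {
        do {    /* a * matches zero or more instances */
            if (matchhere(regexp, text))
                return 1;
        } while (*text != '\0' && (*text++ == c || c == '.'));
        return 0;
    }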
But while Ruby's regexp engine is 588 times larger than Pike's code, you of course get a vastly higher boost in overall system capability from those first 35 lines of code (from no regexp matching to some regexp matching) than from the last 35 lines of code, or indeed the second 35 lines.
So if you have a small system and a small team, you work with that and start small, and it won't be that surprising when you get a lot done in few lines of code, even though you have a long list of things you'll add when you can justify the extra resources (like those missing language classes, and a billion other features).
(Then you get a larger system, and you'll very soon find yourself wondering how you could manage with so little)
I think it's mostly surprising because most developers today aren't used to thinking about capacity constraints of small systems, and so start designing for lots of features from the start (can't have regexps without character classes, and capture groups, and back-references, and ...).
"I think it's mostly surprising because most developers today aren't used to thinking about capacity constraints of small systems, and so starts designing for lots of features from the start (can't have regexps without character classes, and capture groups, and back-references, and ...)."
Sometimes when I have expressed admiration for sed(1) in the past on HN, someone replies something like, "Yeah, but it doesn't have capture groups."
It is amazing how much work I do with sed(1). Basic RE most of the time, not even Extended RE. And I'm only using a fraction of what sed can do. It is not just me. This small program is everywhere, on every computer. It's in the toolchains used to build the operating systems and other software that everyone is using. The computing world depends on sed.
Then there are people online who complain about RE. With memes, no less. It's baffling to me because I find RE so useful. Eventually I realised the reason they dislike RE is that they want to compose something really complex, get in over their heads, and then try to blame RE instead of their own stupidity. Meanwhile they could be doing a multitude of simpler things with RE very effectively. But that's not what they want.
Nope. They want the 2000+ lines of code, not the 35.
Of course this is a generalisation. Hence the word "most". There are some who are interested in the 35.
It's really easy to evaluate software today, assuming one is searching for simplicity, because so much of it is garbage written by people who are hopelessly addicted to needless complexity. They cannot define "simple". The mere use of the word is triggering for them.
Yeah, I think the 35 is probably too simplistic, but as the links elsewhere in this thread show, you can get an implementation that converts to a DFA and is competitive with your browser's and Node's native regexp engines in just a couple of times that many lines. I'm not inherently against having options that do crazy work to provide extra features and squeeze out a bit more performance, but I also wish more people would try starting over, because the results are often surprising.
E.g. I'm pretty much building my own desktop environment "by accident". I don't really want to, but I've rewritten piece after piece of software where it turns out rewriting something that does exactly what I want is often simpler than trying to fix issues in huge pieces of software.
It took me one night to replace Caja with my own desktop/file manager. It does far less than Caja, but it does more of what I want. E.g. it has a semi-spatial mode that lets me control when to snapshot positions rather than do it either always or never.
300 lines of code. Should other people run it? Probably not, but at 300 lines we can afford a lot of variants more closely tailored to different workflows.
My terminal is ca. 1000 lines and close to being more accurate and having more features than st (about 8k lines), and has more of the features I actually use than xterm (88k lines of code). Xterm's scrollbar code is many times the size of my entire terminal.
Xterm is more capable, but not 88 times more capable... And less capable in the areas I care about.
I tend to think we need more software that is less capable.
I'd rather have a wider choice of 1kloc terminals than one 88kloc one, because the 1kloc ones are far easier to customize and tailor.
It does make one wonder why piling on features with diminishing returns is the usual way of software development.
Proof of concept -> early release -> stable release -> add feature -> add feature -> add feature, ..., repeat.
CPU cycles, RAM use, storage space, bugs, updates, managing complexity, manuals, adapting to newer standards, programmers understanding codebases, etc, etc, etc. It's not like incremental additions are 0-cost.
Kind of like inventory: of all the stuff (most) people have in their homes, only a small fraction is used regularly. The rest only sees use very rarely, or exists for "nice to have", decoration, or plain luggage / junk. Turning free living space into a junk bin, making it more difficult to move house, etc. People who think that junk in their attic costs nothing, can't do math.
The software fix would be to hunt aggressively for ways to simplify, reduce binary size & in-use memory footprint, remove lesser-used features, weigh any feature (present or potentially added) vs. its impact on maintainability, code size, etc.
In other words: maximize the bang-per-byte. Note this does NOT need to mean "feature-starved". Just very capable / useful given its footprint.
As opposed to maximizing the feature list, or throwing complex algorithms at squeezing out every last % of performance.
Any projects out there that have bang-per-byte as #1 priority?
I've come to think we really should be more comfortable forking projects and keeping them small vs. adding features. E.g. consider the space of tiny to small-ish X11 menu tools. There's ratmenu, 9menu, dmenu, rofi, roughly ranging from tiny to slightly on the larger side, and I'm sure many more. If I want to add features to 9menu or ratmenu, I wouldn't add features directly to them unless it's something really minor or very generic. I'd fork them, or rewrite from scratch. Their appeal is that they're small and focused.
I'd rather have a choice of a dozen like them with slightly different feature sets that are easy to customize and tweak for my use, than one big one. Where the cutoff point is varies - I tend to find rofi a bit too big and complex, for example.
Toy solutions deal with small data inputs. How many lines of that 20k are just optimizing for large inputs (you can't have long repeating sections in small inputs)?
That's beautiful. And I think a great illustration of how much more impactful the "next n lines" or so are going to be than the "last n lines"
EDIT: Also, the impact is even greater when looking at your performance improvements with the dfa conversion as well:
https://jasonhpriestley.com/regex-dfa
The features are less important than the linear complexity.
It lowers to an NFA. An NFA can recognise any regular language, so adding more features affects the generation of the NFA, while increasing the performance involves faster matching against the NFA.
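To make that split concrete, here is a minimal sketch of my own (not the linked author's code) of matching a whole string against an already-built, epsilon-free NFA of at most 64 states. Adding syntax features changes how you fill in the trans table; making matching faster means speeding up this loop, or converting the reachable state sets to a DFA up front so the inner loop over states disappears:

    #include <stdint.h>

    /* States are bits in a 64-bit set; trans[s][c] is the set of states
       reachable from state s on input byte c. */
    struct nfa {
        uint64_t trans[64][256];
        uint64_t start;    /* set of initial states */
        uint64_t accept;   /* set of accepting states */
    };

    int nfa_match(const struct nfa *m, const unsigned char *text)
    {
        uint64_t cur = m->start;
        for (; *text; text++) {
            uint64_t next = 0;
            for (int s = 0; s < 64; s++)
                if (cur & (1ULL << s))
                    next |= m->trans[s][*text];
            cur = next;
        }
        return (cur & m->accept) != 0;   /* whole-string match */
    }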
In this case he's done a follow-up with benchmarks where he's converting the NFA to a DFA and comparing favourably against both Node's and your browser's regexp engines with only a tiny little bit of extra code:
That doesn't mean there isn't value to the rest of those 20k lines of code I referenced - that was not the point.
A lot of them add a bunch of convenience, like a more expressive syntax that saves you from writing more convoluted regexps, and each step presumably bought more performance than the previous, smaller iteration, all the way up from a much smaller engine way back. I'm sure one could do better with less, but I'm also not dismissing the work: I'm sure each step was reasonable given the constraints.
But that too is also not the point. It was not about dismissing the size of the Ruby regexp engine as not worth it.
The point was that there is a very significant and rapid diminishing return, and that this explains why you can do so seemingly much with so very little, because the leap from no capability to something usable takes very little, but each subsequent increment will tend to buy you less, for more work.
That doesn't mean people should stop putting in that extra work and squeeze out a bit more. It just gives an answer to the question in the link.
Being O(n) is important, but the size of the constant factor is important, too. E.g. the theoretically most efficient known matrix multiplication algorithm only outperforms the less efficient common algorithms at ludicrous matrix sizes, because of the huge constant factor.
Yes, but this is again totally beside the point, and addressed by the second link to the version that does DFA conversion in a few dozen more lines and is competitive with the regexp implementation in Chrome.
A lot of it is, but the point remains that no matter where you set the bar, you can usually get most of the benefit with a tiny portion of the code. Sometimes squeezing out a tiny bit more performance isn't worth much, sometimes it's worth 10x or 100x the amount of code - the point is not that it's inherently wrong to write all that extra code.
But you should at least be aware when the returns are diminishing to a point where the payoff becomes uncertain.
Dealing with edge cases is where your code blows up. Once you find an edge case, do you leave it unaddressed and let the program generate a wrong/unexpected answer? Could it be a potential security flaw?
Unfortunately when we write initial specifications we almost certainly get them wrong in one way or another. Trying to deal with that after the fact commonly leads to application bloat.
My experience is that while edge cases certainly add bloat, there are plenty of cases where the bloat comes from entirely different things, and where simply rewriting with the hindsight of being able to see the overall structure better can do wonders. In other words: a lot of the time it's undoing mistakes made the first time around as a result of not having the full picture.
And often it's a result of trying to cater for everything even when it's not needed. E.g. layers of indirection in anticipation of extensions that never happened, accessors that are never once accessed.
Often it's history. One of my "favourite" recent examples is the X11 XRender extension. On one hand it was modernising X, allowing a much more modern rendering pipeline. On the other hand it insisted on holding on to a world that has moved on:
On one hand, it provides a number of pre-defined visuals and formats, requiring servers that implement XRender to provide ARGB32, RGB24, A8, A4, and A1 visuals. On the other hand, it 1) allows the server to provide a list of every other kind it can support, at every depth it can support, and 2) doesn't label the standard, required formats in any way. So as a result you get back a huge list of visuals at depths you don't care about, and formats you don't care about and never will.
As a result, the client library goes through a pointless matching exercise over a bunch of visuals and formats that, in many cases, have quite likely never once in the history of the XRender extension been used by anyone for anything but testing. Are you going to do complex alphablending and compositing on a machine with an 8 bit display with a palette? No.
The very, very generic system of visuals and formats X11 supports made sense in the 1980's. Even in the 1990's. It was marginally useful to support legacy hardware into the 2000's (but then we're talking maybe supporting 16 or even 15 bit depth graphics cards, not monochrome or 8 bit).
It's not necessary any longer. If you're going to run hardware where this is an issue, you'll be running old software too. But here it still is, contributing a bunch of pointless code on the server, and forcing the client to implement a bunch of pointless code as well because the protocol just assumes people will care about more precise matching (we don't; I just want to use the standard visuals and formats).
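For what it's worth, the path the client actually wants is a single call in libXrender - here's a minimal sketch, assuming you only ever care about the required ARGB32 format:

    #include <X11/Xlib.h>
    #include <X11/extensions/Xrender.h>

    /* Ask for one of the formats every XRender server is required to provide,
       instead of walking the full list of visuals/formats the server reports. */
    XRenderPictFormat *argb32_format(Display *dpy)
    {
        return XRenderFindStandardFormat(dpy, PictStandardARGB32);
    }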
A lot of the time, with the full picture then, you can yank out a whole lot of code and reduce the number of edge cases by offering fewer choices, and you can often - not always - do so without sacrificing functionality that is actually used.
As something of a counterpoint: Microsoft has done very well financially by holding on to backwards compatibility.
In the OSS world we tend to look at rewrites as "this is for me, who gives a shit about the customer", but for most customer-facing (paying) software the expectations are: do not break the application, and do not break the customer's expectations.
My point isn't so much that ditching the backwards compatibility is always right, even when it lets you shed lots of code, but that you can often massively reduce code size that way. Whether that's the right thing to do is often a fine balance, because it depends a great deal on how many people actually care about the old ways, and sometimes people will get it very wrong in either direction.
Sometimes the 10x more code is actually worth it. But you should be aware when it is 10x more code, and make sure it is worth it.
You can hold on to compatibility by providing a simple 1-bit API from the 70s, an 8-bit linear mode with a palette, scrolling and a palette-oriented API, then a 3D API on 32-bit ARGB. Three simple APIs may be simpler to maintain than one unified API.
Ah, but retrospectively what matters more is instead "now that we know how bad that got, how can we use what we learnt to do better when we rewrite". It's not a given that it's a bad thing to allow customers to drive rapid iteration even when it leads to a mess, as long as you're prepared to take the consequences. A lot of the worst systems I've seen were systems where people resisted rewrites rather than plan for them, often because they didn't establish enough tests, and/or were upset over the sunk costs. Treating the first iteration or iterations as throwaway learning experiences can be useful. If you plan for that from the start you can allow yourself to take shortcuts you otherwise wouldn't (but you sure as hell better make sure management understands sticking to that throwaway version isn't an option), knowing the lessons learnt will feed straight into the next iteration (and of course building a throwaway system doesn't mean every part needs to be written to be ditched).
I think the “never look back” mindset of product growth is a major driver of code bloat.
Rarely are engineering teams allowed time to go back and refine and optimize legacy code, until it’s a blazing dumpster fire… and even then.
Products never seem to have enough features, so engineering teams plow forward. When a component absolutely needs a rewrite, the rewrite is often implemented with the same haste as the original.
Given the time, I think every engineer could easily factor out vast swaths of code and refine what’s left to be much leaner.
I feel like the same goes for programming languages and platforms. Most devs would prefer to spend their time adding new capabilities rather than optimizing a few lines of code out of existing ones.
But then again “Perfect is the enemy of good” and “If it ain’t broke don’t fix it” and “We’ve got 731 items on the roadmap, so let’s go!”
I feel that most 100K line programs could be rewritten with just 10K lines and end up being more reliable.
Feature creep is responsible for some of the code bloat but I can guarantee from experience that, in the vast majority of projects, you could keep all the features and still cut the code to at least 1/10th of its size.
I think the reason for this is that developers who focus on development speed do so at the expense of succinctness. The more foresight you have when you're writing code, the fewer lines you will end up with. Unfortunately, developing that foresight requires time spent not coding; it means choosing the best option out of all viable alternatives. When developers are pressed for time, even if LOC is not used as a metric to judge them, they will not have the time to look ahead in the near future to minimize the lines of code.
Workarounds tend to require a lot of lines. When code is rushed, it ends up getting littered with workarounds which require additional checks, additional tests, etc... A bad foundation with sub-optimal abstractions can force developers' hands and lead to even more bad code being produced on top.
Before I start working on a feature, I simulate how it's going to work in my head and try to identify all the hurdles and alternatives; sometimes several levels down in the hypothetical component/module hierarchy. I do brainstorms, draw diagrams and make lists of pros-and-cons. I use as many visual aids as I can get. I play devil's advocate with my own ideas until I'm at my mental limit and I cannot visualize the solution (and requirements) in any more detail and cannot identify any other hurdles. It actually feels like playing chess. You need a strong understanding of your tools and environment to be able to do this kind of adversarial brainstorming and most importantly, you need time. Especially in the early stages of the project. The further along you are in the project, the less foresight you need.
My rule of thumb is that most devs can only really be “responsible” for around 25k lines of code (SLOC) at a time. Almost to a man, everyone who disagrees with me by more than a few percent turns out to be cranking out buggy code. If you're not on top of the bugs you've diluted responsibility. Your effect on the bus number situation is compromised.
The 25k axiom is why I’m always trying to prioritize between concision and readability. It’s easier to hand off readable code, it’s easier to manage concise code. Concise code that is not readable is terse, so there’s a balancing act to be managed there. I won’t say they’re at odds, but they can complicate each other.
First, I think we should use expression count instead of pure LOC because many styles add white space but keep the expression the same. I don’t consider one style “more terse” than another. e.g.
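Something like the following (my own illustration): identical logic and expression count, very different line counts.

    int clamp_a(int x) { return x > 0 ? x : 0; }    /* 1 line */

    int clamp_b(int x)                              /* same expressions, 6 lines */
    {
        return x > 0
                   ? x
                   : 0;
    }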
If you can accept that expressions are a better metric for “terseness” then I will categorically say your statement is pretty easy to disprove. Essentially you’re saying that every 10 expressions can be rewritten as a single expression. I don’t believe that to be true, based on a cursory glance of some code bases I’m familiar with.
Most any metric can be gamed. At large, though, I'd wager that the noise from these, scattered through a codebase, is minimal relative to the point being made.
That said, I do think I agree that tooling is good enough now that you can probably try both counts and see what can be seen - with the idea that these are not precise measures of complexity, but directional indicators.
The spirit of the claim is to reduce the number of expressions by 10x while only sacrificing the “unnecessarily complicated” functionality. It is a question of system design that demands judgement, and not just a raw code length compression.
It’s about preventing large (and poorly thought out) projects from being worked on in the first place, before they need to be scrapped or circumvented.
Sure they could, but we don't work in a vacuum where succinct code is the product; the business need is the product, and with that comes a lot of complexity that only increases exponentially with the size of the organisation.
I can avoid rushing code when working on something internal to my team; the moment I need to work on something that is part of a much larger project, all the issues with synchronisation/coordination crop up, and we don't have the time to spare anymore to go through 30+ teams/5 orgs worth of interdependencies and play chess with all the permutations of interactions between all systems to arrive at an optimal solution which requires the least code possible. I can work to approximate that as much as possible, but it's simply impossible in a large project spanning many teams, systems and organisations inside the company.
At my job we do use RFCs a lot to do pros/cons analysis, play devil's advocate with each other, communicate with others our ideas and check if the proposals make sense in a larger scope, still business needs trump everything else. If my proposal is "it will take 6 months to do this properly" vs "we can make it work in 2 months if we adapt the existing thing to do thing + A" then the shorter timeline will always be chosen, and as a SWE it's my job to deliver the business value, not to write the perfect code I'd like to.
There's never enough time to write code as we wish we could. It sucks; I'd much rather not have to implement hacks that I know will create future pain for myself and for others, but I'm not paid to do that. What is under my control is to try to create as little future pain as possible, documenting all the pitfalls, shortcomings, and issues I can already foresee when a hack-ish feature is implemented.
Yes, most experienced/smart SWEs know that a lot of software could be written with much less code. When you are experienced you also know that it's just a fantasy: real-world constraints shape software, and the shape it takes is that of a monstrosity. In the end it's just a spin-off variation of Conway's Law: we ship code that reflects business constraints and timelines, not engineering ones.
If you removed from a complex program any feature that was used by less than 2% of the users the program would be much smaller and simpler. However you also lose each 2%. In many cases every user uses a different subset of what your program does, and so the end result is no users at all because your program is useless.
Yes, but part of the argument was that you can often cut drastically without cutting features when you take the time to understand the problem properly. Sometimes everyone actually does use genuinely different features, but more often there are different ways of solving the problem that will still be more concise even if you keep everything.
To take a somewhat concrete problem from a past job: We had an agency do a bunch of work on features I didn't have time to work on. Being an agency used to being brought in to add a feature here and a feature there, they worked in a way that allowed for fast individual features, but that slowed us down overall and bloated the code when we had them in to do a bunch of work:
They'd manually write each screen. We had many dozens of models that needed CRUD stuff. They did perfectly fine work, and had we only wanted to expose a handful of classes, I'd let them do that.
When I got the time to review what they were doing and realised how much near-duplication they caused, I instead wrote a piece of code that introspected the database model, layered on annotations from our models including access control and additional type information, and spit out a bunch of JSON the front-end consumed to produce a generic CRUD interface to all of the tables. They first objected that there were too many things that were different between each of the screens. They were right there were many differences, but there were more similarities, and we could easily accommodate allowing them to override that.
They went from building new screens for everything, to picking a generic, automatically generated screen that was sub-optimal, configuring views, and writing new components to view various types in different ways. Each new component often made it trivial to make multiple screens better with minimal effort.
We didn't remove a single feature. They could still override the views whenever necessary with custom code. In fact, we added access to dozens of models that people had to ask someone to run SQL queries to access before, so the overall system is far more feature-ful. But the average amount of code needed per model is a small fraction of what it was before.
This kind of thing is common. People keep writing boilerplate or writing to abstractions that cause them to write far more code than necessary without stopping to think about how to simplify, or perhaps more often without having the power to decide to do something about it.
A lot of the time the simplifications also aren't obvious until you've spent some time being verbose and recognising the patterns that will allow you to be concise without sacrificing functionality.
"Simplicity follows complexity, not the other way around" (from "Perlisisms").
It's very hard to understand what comprises the small and elegant set of abstractions to solve a particular problem without first trying many larger sets, and checking how well they solve the problem in practice. Unix itself was born after an attempt to solve a similar set of problems by writing Multics, a much larger OS.
It had warts, and I'm vaguely tempted to make another, cleaner, attempt at it with the assorted lessons learned - I don't work there any more and the parent company has ditched all the code, and so while I can't release that code, writing a new version from scratch wouldn't compete with anything they do. Not top of my list at the moment, though.
For it to make sense, it'd need to be the full package, with UI components. The nice thing was being able to annotate the model and get a "good enough for an MVP" CRUD ui out of it that'd significantly beat a generic database browser; not just the API.
Unfortunately that's also accordingly more work. I need a project that'll actually need it first...
Yeah, but do you really think any of that applies to things like the Linux kernel? You think with thousands of talented developers they have problems as simple as duplicate code?
More developers tend to make duplicated code more likely, not less, as people tend to work on percentage-wise smaller subsets of the whole and fewer developers will have a semblance of an overview of the whole. Sometimes it even makes duplication sensible for a while, if it allows short-circuiting communication paths.
The vast majority of kernel code is drivers. That's another reason why the Linux kernel code would be far bigger even if optimal. But it's also why it won't be optimal - the effort to figure out all the shared elements of every device out there is 1) not worth it, 2) going to take someone a lot of time to actually figure out commonalities that may not always be obvious.
I've worked on a driver for the Linux kernel. We didn't even start to look at deduping it because the hardware in question would only ever have our device, and nobody else would have our device. But would there be duplication? Sure. The chip we used was common. For others it might be worth figuring out commonalities with other devices and eventually pare down the code. Or not.
Absolutely. Any project with more than a couple dozen devs will have duplicated functionality (if not directly duplicated code). When you combine that with the loose coordination, I wouldn't be surprised if 20% of the functionality in the Linux kernel was duplicated.
The Linux kernel has been very careful to deduplicate. Maybe you can find duplication between the scheduler and wifi drivers, but in general each subsystem does deduplicate.
We rarely think of outright textual duplication (though there's plenty of that in Linux too - first file I checked I found multi-line segments of repeated code, though there's nothing wrong with that when it leads to simpler code).
The more insidious duplication is the one that looks reasonable because it involves e.g. different filesystems, or other capabilities where it technically provides additional features, but practically doesn't. Clearing that out from an established project is near impossible because often there is someone out there who cares even though it'd make little practical difference to them.
E.g. the ext2fs driver is still in Linux. "Technically" it offers extra functionality: You can use ext2fs without booting an old kernel in a vm. In practical terms, for a system starting from scratch, on the other hand, it offers no meaningful increase in capabilities.
A not insignificant portion of Linux code is code like that which is there because someone cared at one point, and there's a legacy, and there's little real benefit to do more than not build a given driver by default.
The core, non-driver parts of Linux itself is "cleaner" in that respect, but that too carries along legacy where it becomes a philosophical question whether removing a given thing strips functionality or not (e.g. the system might be able to do the same thing, but not in the exact same way)
May I venture that you're probably early in your career?
There's almost always things that can be redesigned to be better and smaller if one has a better understanding of total scope from the beginning, but there's equally as much discovery that the reason things seemed unnecessarily complicated was a lack of understanding of the complexity, and that the new rewrite eventually reintroduces much of it as it's used in production.
this isn't always true. more than once I've taken a large codebase, whacked it down to 10% of its original size, without losing any features and gaining quite a bit of performance.
smallest-change maintenance by lots of people just introduces cruft by its nature.
not suggesting that doing that rewrite is usually a good idea...but I disagree that all that stuff always represents anything fundamental
IME, long-term multi-author software projects tend to accumulate cruft in a way that doesn't require drastic re-writes (as opposed to, e.g., an architectural mismatch which if solved would yield the desired 10x improvement but poses a variety of huge risks, both in actually completing it and how bad things are if you fail).
1. With each feature request, pick one thing to improve (a bug, tests testing mocks instead of code, duplicate classes, inconsistent error handling strategies, complicated logic, ....).
2. First make the improvement. Propagate the beneficial impacts throughout the codebase (remove methods that only existed to support the mock, remove utility methods that only existed to support those, remove tests for the utility methods, remove duplicate tests on the previously duplicated classes, remove code duplication elsewhere that bifurcated due to the duplicated classes, change your Result<T> return type to just a T type because your sanity check was wrong and the method can't actually fail, you no longer need to pattern match (or catch exceptions) on that result type because failure isn't an option, ....).
3. Then implement the feature. The nearby code was just improved, so this is a bit easier than it would have been.
4. Repeat ad infinitum.
After doing this consistently for a little while, the 10x reduction in code happens on its own, and it's faster to implement new features _and_ fix the little bugaboos than it was to just implement a feature starting out. Your code is more stable, your builds are faster, your code is faster, your tests are faster, your tests catch real bugs, feature velocity goes up, you don't accidentally expose a race condition in your driver because of dumb concurrent complexity in the application, and on and on.
Then, if an architectural mismatch exists, the code is in an understandable state. It's leaps and bounds easier to re-write a 10k project than a 100k project.
YMMV. Not all code is that bad, but a significant fraction of older projects turn out that way eventually.
I agree 150%. I often find myself, after finishing a "big" application, thinking "well, this must have been thousands of lines". Then I run sloccount or wc and I'm always surprised with how few lines of code I have. The thing is, as you say, I spend more time thinking than writing.
> Before I start working on a feature, I simulate how it's going to work in my head and try to identify all the hurdles and alternatives; sometimes several levels down in the hypothetical component/module hierarchy. I do brainstorms, draw diagrams and make lists of pros-and-cons. I use as many visual aids as I can get.
Remarkably, this is how 37Signal's Shape Up (https://basecamp.com/shapeup) encourages defining feature work prior to building.
In particular, the concepts of Rabbit Hole (as explored with senior developers prior to coding), Breadboarding and Fat Marker Sketches (having a high-level but end-to-end map of the feature) are almost identical to what you're describing.
I found this approach both intuitive in my personal development work, and as a tech lead for lean teams. Funnily, quite a few people really struggle with the concept of "thinking through the feature end-to-end", and not just "let's start with one piece and then figure it out". It's great to do development in small chunks with unknowns, but we still need to know what we are all trying to achieve!
(not affiliated with Shape Up / Basecamp, I just feel Jira leads to hugely suboptimal and waterfall-y processes).
> you could keep all the features and still cut the code to at least 1/10th of its size.
I do not disagree. I feel like, in some cases that might even be the minimum reduction. In some cases it might be more like 1/20th.
But does that cut require twice as many man-hours, or does it require x40 man-hours? More? Whatever the actual answer, I do not think it cheap. I don't even know how to guess how much added effort and time it requires.
In many cases and companies, it might even require hiring highly sought-after engineers... in-house talent simply isn't up to it.
The problem is that reality is complex, lots of special cases. Dates are a well known example, with lead days, leap seconds, time zones, DST, etc... So is localization.
You may have the perfect code for your use case, with the right level of abstraction, until reality kicks in. For example, let's say you are writing a device driver; each device identifies itself with a unique identifier, fine, until one day you get different devices with the same id and you need to issue a special command to differentiate between them. And then the next device receives commands in little-endian order, even though all the other ones were big-endian. And then another device sometimes silently fails to execute a command, so you need to wait a bit, check that the command was actually taken into account, and retry if it wasn't. Etc...
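A hedged sketch of how that kind of accumulation tends to look in code - the device, quirk flags and callbacks here are entirely invented for illustration:

    #include <stdint.h>

    struct quirky_dev {
        int little_endian_cmds;          /* one revision wants byte-swapped commands */
        int needs_readback;              /* another silently drops commands sometimes */
        int (*write_cmd)(uint16_t);      /* hypothetical bus write */
        uint16_t (*read_last_cmd)(void); /* hypothetical status readback */
    };

    static int send_cmd(struct quirky_dev *dev, uint16_t cmd)
    {
        uint16_t wire = dev->little_endian_cmds
                            ? (uint16_t)((cmd << 8) | (cmd >> 8))
                            : cmd;
        for (int attempt = 0; attempt < 3; attempt++) {
            if (dev->write_cmd(wire) != 0)
                continue;
            if (!dev->needs_readback || dev->read_last_cmd() == cmd)
                return 0;     /* command went through */
        }
        return -1;            /* device never acknowledged it */
    }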
You can't plan for all that; if you try, it will only make things worse. All of it will invariably result in ugly code. It is not bad design, it is just that reality itself is ugly.
If you try to start over, if you are really really good, you can reduce it to something nice until the next dose of reality. But more likely, you will make things even worse. Believe me, if you think you are good, try rewriting a legacy app that is used in production in a way that doesn't break production, I predict a humbling experience.
> I think the reason for this is because developers who focus on development speed do so at the expense of succinctness. The more foresight you have when you're writing code, the fewer lines you will end up with.
Not necessarily.
1. The process you describe works pretty well for code you wrote yourself. However, once the original developers are gone, any understanding, foresight and future plans they had (beyond the coarsest, most high level ones) basically got burned in a fire. That means a lot of minimum-effort jiu-jitsu solutions aren't available anymore, and you'll have a lot more kludges and re-implementations of things that may already be there (but forgotten).
2. Few developers have the luxury of infinite time, and succinct code is rarely a business priority. That means you pretty much never can go all-in for succinctness, which would frequently require large scale refactors to achieve.
The end result is that, inevitably, over time, features and bug fixes will get grafted onto foundations that weren't ever meant to support them.
Answer: smaller target to cover! Modern Linux has most code in device drivers to support so many different devices. Then, it supports many targets for other subsystems, like file systems.
The original Unix provided one implementation for each subsystem. They relentlessly simplified the problem they were solving to make it doable.
But modern Linux is only the kernel. Early unixes included the userland as well which creates a different balance in types of code.
In fact traditional unixes still do, like FreeBSD. Linux is the odd one out with this separation. I think it happened because GNU was not very successful with Hurd but they made great userland so "Linux" became kinda a combo. And for Linus the userland was never really in scope anyway.
They also didn't care nearly as much about performance.
Or to be more precise, they had very different trade-offs to make. Back then, you could have a system call and context switch for every read and live with the overhead.
Back then, fitting in RAM was a big problem: The PDP-11 (the first kind of computer to run Unix, as opposed to UNICS) had a 16-bit address space, which gives you 64 K of RAM if you ignore the fact peripherals were memory-mapped and so took addresses away from actual memory. Later models had split I+D, or separate address spaces for Instructions and Data, but that's still only 128 K.
For sure, and up until the mid-to-late 80s RAM speeds were higher than processor speeds. So approaches to architecture were totally different.
Most minicomputer class processors were sewn together from multiple chips and transistors; even their register sets for the CPU were often not dissimilar from main memory. Texas Instrument's minicomputer (and later microcomputer) architecture even just put registers in RAM. In the microcomputer world, the 6502 got around having a very small register set by just having a 256-byte "zero page" with slightly faster cycle access than regular memory.
There was no need for complicated cache hierarchies. Relative cost of a context switch or interrupt or transition from user space to kernel etc. was way lower than now.
And the users of the system were by and large trusted. Security was more of a suggestion.
I'm thinking most stuff was designed around single core performance. Lots of drivers had big'ole gigantic locks which really killed multi processor performance. These days with modern CPUs with a ton of cores the complexity has increased greatly.
We did care about performance because 1 MHz was slow even back then and data buses were slow too. However we had to care more about memory size because we couldn't fit much in a few kB, or 1 or 2 MB for huge machines. Finally, compilation times for substantial programs were... substantial! You could play games as in that sword-fighting xkcd.
Many years ago the team I worked on got some shiny new Sun 4 machines and we soon ran out of disk space - the solution was to plug a cabinet full of a few old huge disks, that I think came from some kind of ancient mini, into a Sun 3 and then NFS the new storage onto the Sun 4s.
It was truly bizarre to watch the cabinet judder and shake as we accessed stuff - I think there were only 2 disks in it and it was mostly full!
> the PDP-11 disk unit throwing itself around violently when somebody started nroff.
That was the overlays… the early-days substitute for proper virtual memory. If I remember correctly, the first PDP-11 to have true virtual memory and an MMU was the PDP-11/70, which had 18-bit wide hardware addresses.
You are correct. I have long forgotten about the mapped memory and was thinking about Unibus device registers having long 18-bit addresses. The 11/70 also had separate instruction and data spaces which did not exist in earlier models.
If you’ve read any of the code you’ll also know that early Unix was full of security vulnerabilities. E.g. statically allocating fixed buffers and not checking input sizes.
I’m all for appreciating simplicity, but let’s not pretend we haven’t progressed since then.
Not only early Unix; look at the original inet_addr implementation [0]. It accepts not only "0x" but also just "x" as the base-16 prefix, it doesn't really care about the numbers overflowing, and it parses 09 as equal to 011 (which is decimal 9). And the less said about the coding style, the better.
There were also a lot of practical limitations on the hardware that need to be recognized. These weren't devices that could store megabytes of data for code or memory. Not only that, but the compilers were also a lot dumber (by necessity). So, optimizations you'd normally leave up to the compiler (like inlining) you instead did by hand.
This part is frequently lost on people. Bell labs developed a multi-user operating system that supported multiple people logged in at the same time on a machine with 64 kilowords (144kB) of storage.
Later development was done on a machine that supported a max of 4MB of memory and had to allow for hundreds of simultaneous logins. Keeping the code compact was a high priority, even over usability in some cases.
I disagree that it’s a human or subjective factor as others imply. Or at least to me it’s a secondary contributor.
Back then, the hardware and peripherals were so much simpler. There was no graphical output for the original PDP where Unix was initially developed. Not even a terminal. There was no networking either.
The features of the system were also rather basic (to us). And security wasn’t even a thing they thought about. Some code practices are what we now consider to be terrible, optimizing for the limitations of that time.
So all in all, things have gotten so much more complex since then, and the size and complexity of the code has grown accordingly.
> Some code practices are what we now consider to be terrible, optimizing for the limitations of that time.
This. If you went back to 1972 when Dennis Ritchie was working on C and said "String literals should have an extra machine word for their length and strings should have yet another machine word for the capacity of their buffer" you'd be considered a moron for wasting so much memory.
Not really. C was an iconoclast even at the time. Pascal was the en vogue language of the moment, and it used a length-prefixed string format.
But sure: it's true that in (a half century of!) hindsight, C strings were probably a mistake. But don't sell null-terminated strings short either. C could play tricks that Pascal couldn't. Iterating over the characters of a string has a natural expression using the same compiler features as arrays. A pointer into the middle of a string was still a valid string. You could lex a string inline during parsing by adding termination to the end markers or whitespace, etc...
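A quick sketch of those tricks, for concreteness (my own toy example):

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        char line[] = "GET /index.html";
        /* iterate over characters with the same machinery as any array */
        size_t spaces = 0;
        for (char *p = line; *p; p++)
            if (*p == ' ')
                spaces++;
        /* a pointer into the middle of the string is itself a valid string */
        char *path = strchr(line, ' ') + 1;
        /* lex in place by writing a terminator over the delimiter */
        line[3] = '\0';
        printf("%zu %s %s\n", spaces, line, path);   /* prints: 1 GET /index.html */
        return 0;
    }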
So you just don't store the array's length near the array's beginning but instead inside the slice-typed variable which lives somewhere else entirely. Boom, you've got the best of both worlds: trivial slicing and reliable bounds checking.
> instead inside the slice-typed variable which lives somewhere else entirely.
OK... where? Now you have a complicated heap-like semantic inside your compiler internals. But not everyone wants their string metadata in the heap. So now you need allocator semantics a-la C++, which ultimately leads to move semantics, etc...
Good luck getting that done in 1972. No, DMR was right, Pascal was wrong, and fancy modern string semantics were still two decades in the future. C was the best it could be given the constraints of the era.
In the automatic storage, duh, where everybody else puts them today.
> But not everyone wants their string metadata in the heap.
Exactly, so you don't put them there, which is what I am proposing: instead of putting the string length before, or the null delimiter after, the string itself, you put it near the pointer that points to the beginning of the (sub)string.
> you put it near the pointer that points to the beginning of the (sub)string.
And now you have a "string pointer" which is distinct from a "data pointer". You can't allocate a heap block and "put a string in it" because the special thing at the start needs to go with the pointer and not the data in the block. And your 1970's compiler on your PDP-11 with 48kb of RAM needs to manage that. Good luck to you.
Again, we're not answering the question "Can a better string implementation than C be designed?". Clearly the answer is yes. Go pick your favorite, there are literally dozens. We're asking "What else should Richie have done with C that would have been better?". And... it's not what you're suggesting for sure.
> And now you have a "string pointer" which is distinct from a "data pointer". You can't allocate a heap block and "put a string in it" because the special thing at the start needs to go with the pointer and not the data in the block.
The second sentence doesn't follow from the first one.
struct string {
    char *ptr;   /* points at len chars */
size_t len;
};
struct string new_copy = { .ptr = malloc(100), .len = 100 };
new_copy = strcpy(new_copy, "some other string"); // string literal is a syntactic sugar for statically allocated 'struct string' with 'ptr' set to an unnamed, statically allocated char[] buffer inside it, and proper 'len' field.
struct string strcpy(struct string dest, struct string src) {
struct string result = { .ptr = dest.ptr, .len = min(dest.len, src.len) };
memcpy(result.ptr, src.ptr, result.len);
return result;
}
That's C, though. You're writing C. DMR gave you that, in 1972. (Actually not, because you had to wait for structs, but you know what I mean).
But what you seem to be asking for is support in the language for strings that work like this. And that causes problems because of that special handle you've invented. Now strings aren't arrays anymore, they can't have pointers taken to them, they can't have substrings in a natural way, they can't live in ordinary POD memory as a unified thing.
All of which is totally solvable in the runtime of a language written in 2003 or whenenver. But not on a PDP-11.
But C strings could at least be arbitrarily long, Pascal strings were extremely handicapped. It’s like they understood why length prefix is better, but then picked the worst possible implementation.
Screens were generally 80 characters wide, so 256 was good enough; if you needed to go bigger, null-terminated could still be done in Pascal, you just had to roll your own.
The editors I wrote back in the eighties were just linked lists of Pascal strings, so a line was limited to 256 characters, but these were very fast on VMS and DOS.
Also we only had a few character sets and you could live your whole life in ASCII
And worse for Pascal: a 256-byte limit on the length, and API shenanigans so that not every string allocated needed to consume all that memory if the string itself was shorter.
Back then "runtime strings" — as in, temporary heap allocations that were allocated to hold onto a bit of text separately from the buffer it originated from — were extremely rare.
Tools that manipulated text in Unix C, weren't copying strings out of buffers onto the heap, holding onto them in data structures, and then later passing them piecewise to write(2). Instead, they were read(2)ing fixed-sized chunks of text from STDIN to a data-section-preallocated input buffer; running a resumable stream-processing state-machine (e.g. a lexer) over that input buffer to feed a transformation step; potentially building a result into a (again data-section-preallocated) output buffer; and then emitting either directly from the transformation step, or from the output buffer, using write(2).
This was the real "innovation" of Unix: you don't need so much memory, or so much copying, if your tools can work in terms of streams of characters. You just need two static arrays and four pointers per process, plus whatever per-line book-keeping state your state-machine uses. It's like each Unix tool is a virtual implementation of a hardware DSP!
(And this is also why so many Unix tools are so weird. tr(1), for example, would never have arisen as the precisely-designed solution to anyone's particular problem; but tr(1) fits in perfectly as an "obvious" primitive in a toolkit of command-line DSPs.)
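A minimal sketch of that shape, in the spirit of tr a-z A-Z: two static buffers, a byte-at-a-time transformation, and not a single per-string allocation (my own toy example, not the actual tr source):

    #include <unistd.h>

    static char inbuf[4096];
    static char outbuf[4096];

    int main(void)
    {
        ssize_t n;
        while ((n = read(0, inbuf, sizeof inbuf)) > 0) {
            for (ssize_t i = 0; i < n; i++) {
                char c = inbuf[i];
                outbuf[i] = (c >= 'a' && c <= 'z') ? (char)(c - 'a' + 'A') : c;
            }
            write(1, outbuf, n);
        }
        return 0;
    }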
Note that NUL-delimiting "strings" makes perfect sense when most "strings" aren't runtime strings, but rather are "pieces of text living inside a large static char-array buffer they were either directly read(2) into, or strcpy(2)ed into." The NUL isn't supposed to tell you the end of the buffer; the NUL is supposed to tell you where that individual string ends within the buffer — and so where you'll then find either the next string, or garbage. Nobody was calling strlen(2) + malloc(2) + strcpy(2). char-pointers mainly existed to be the C equivalent of Go slices — i.e. to keep track of some text living inside a larger buffer. If you ever malloc(2) or free(2) anything, it's the containing fixed-sized buffer, not the string!
I'm not sure if many Unix tool implementations (e.g. GNU coreutils) these days retain this architectural philosophy; but you can still clearly see the remnants of it in the places where Unixisms became ossified parts of the C ABI. For example, while ARGV and ENVP are these days developer-visible as char[][], the C ABI requires these to be backed by a pair of contiguous NUL-delimited and double-NUL-terminated buffers that get fed into the process's address space by exec(2). In original Unix C, it's these raw buffers themselves that were simply passed directly to `main`; and it was the developer's job to parse flags and env-vars out of them, if they wished — not by pre-strcpy(2)ing data out, but rather by simply running a flags-parser directly over the entire buffer, reacting to each flag as it was observed within the buffer. (And in fact, on many Unixes, getenv(2) is still implemented as a streaming search through the underlying char-array buffer, because it's simply more CPU-efficient and cache-coherent than repeatedly indirecting through the developer-visible array-of-pointers.)
Heh, even error handling these days. We expect our applications to give reasonable feedback and debugging information on what went wrong, without finding a core file on a server somewhere.
I'll take a message "hey dummy, your configuration is wrong here" rather than SIGBUS any day of the week.
Joel Spolsky has a terrific article about where all the extra lines come from and why mature codebases tend not to feel as clean.
“ Yes, I know, it’s just a simple function to display a window, but it has grown little hairs and stuff on it and nobody knows why. Well, I’ll tell you why: those are bug fixes. One of them fixes that bug that Nancy had when she tried to install the thing on a computer that didn’t have Internet Explorer. Another one fixes that bug that occurs in low memory conditions. Another one fixes that bug that occurred when the file is on a floppy disk and the user yanks out the disk in the middle. That LoadLibrary call is ugly but it makes the code work on old versions of Windows 95.”
It was essentially a "time-sharing" system that multiplexes interactive sessions of multiple users on teletype terminals. The "shell" handles such a session by waiting for user input on the terminal, interpreting it and starting external programs as a sub-hierarchy of user processes. On a teletype terminal, you couldn't even have a cursor-oriented editor like vi, but had to rely on line-oriented editors like ed and sed.
In a way it became the complete opposite of how it started. At first, one OS for many users, each with many processes. Now, with containers, microservices etc., we have an OS per service/process. Still, the original abstractions work surprisingly well, though it makes me wonder what a complete redesign aimed at modern usage would look like.
But the question is why did we arrive at containers and one OS per "microservice"? Has memory-to-IO bandwidth, scalability requirements, or whatever really changed (like in orders of magnitude) to warrant always-async programming models, even though these measurably destroy process isolation and worsen developer efficiency? After almost 50 years of progress? Or is it the case that containers are more convenient for cloud providers, selling more containers is more profitable, inventing new async server runtimes is more fun and/or the regular Linux userspace (shared lib loading) is royally foobar'd, or at least cloud providers tell us it is?
The traditional Unix IO model broke with the Berkeley sockets API introduced in 1982. The obvious "Unix" way to handle a TCP connection concurrently was to fork() a separate process servicing the connection synchronously, but that doesn't scale well with many connections. Then they introduced non-blocking sockets and select(), then poll(), and now Linux has its own epoll. All these "async programming models" are ultimately based on that system API.
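For concreteness, the fork-per-connection shape looks roughly like this (a toy echo server with error handling trimmed and an arbitrary port; a sketch, not production code):

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <signal.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void)
    {
        int s = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr = { 0 };
        addr.sin_family = AF_INET;
        addr.sin_port = htons(7777);            /* arbitrary example port */
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        bind(s, (struct sockaddr *)&addr, sizeof addr);
        listen(s, 16);
        signal(SIGCHLD, SIG_IGN);               /* let the kernel reap children */
        for (;;) {
            int c = accept(s, NULL, NULL);
            if (c < 0)
                continue;
            if (fork() == 0) {                  /* child: serve this connection synchronously */
                char buf[512];
                ssize_t n;
                while ((n = read(c, buf, sizeof buf)) > 0)
                    write(c, buf, n);           /* trivial echo "service" */
                _exit(0);
            }
            close(c);                           /* parent: go back to accepting */
        }
    }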
>But the question is why did we arrive at containers and one OS per "microservice"?
I think it makes more sense if you consider the interim transition to other isolation mechanisms like commodity servers instead of mainframes, VMs, then containers as a way to get more isolation/security than traditional multi user model with less overhead than an entire machine.
Obviously cloud providers want to push for solutions that offer higher densities but those same cost/efficiency incentives exist outside cloud providers.
I'd say we've more accurately been trying to reinvent proprietary mainframes on commodity hardware.
Calling an independent set of libraries in an isolated space an entire OS is a bit of a stretch. Containers generally don't contain an init system and a bunch of services (sure, they technically can and some do), but there's generally much less running than an entire OS.
I'm not sure I think the exokernel/unikernel approach by itself is the path forward. While the library operating system approach makes a lot of sense for applications where raw throughput/performance are crucial, they don't offer as much in the way of development luxuries, stability, or security as most modern operating systems. Furthermore, outside of very specific applications the bare metal kind of performance that the exokernel approach promises isn't really that useful. That said, I suspect a hybrid approach may be viable where two extremes are offered, an extremely well isolated and secure microkernel which offers all of the luxuries of a modern operating system built on top of an exokernel which can also be accessed directly through the library operating system approach for specific performance critical applications (say, network and disk operations for a server.)
There's been research into using Linux as the basis for having a unikernel for performance-critical workloads (e.g. database) while retaining all the development/performance-optimization/etc. tooling for both developing the unikernel and for all the other workloads. Of course that doesn't give you a small OS codebase but it does let you optimize for specific workloads.
As mentioned in one of the comments in those threads, a lot of the code in an OS is device drivers, and I'd say that device drivers in the PDP-11 were really simple. You didn't have the multiple layers of serial bus such as PCIe, USB, etc. There was no multiprocessor support, very simple filesystems, probably less than 10% of the POSIX API we know and swear at today was implemented.
Simple I/O was one reason why DEC's minicomputers held their own against the IBM 360, it was easy to make your own Unibus peripherals, whereas peripherals for the 360 required a highly complex "channel" processor; bulk I/O was crazy fast on the 360 but interrupt handling was slow: the PDP-11 could handle individual keystrokes from a serial terminal whereas the 360 would wait until you'd filled out a whole screen of data and hit the SEND key.
I never used Unix on a PDP-11, but I did use RSTS/E, and OSes of that generation did not dynamically find hardware; instead you would have to rebuild an OS image with the configuration for your setup built in. Thus the PDP-11 expected a control terminal and bootstrap device to be installed on certain ports, and you would load some software to build an OS image that you would actually boot up. (Also true of the 360.)
Before Linux it was common in an OS class to write a boot loader; that was about all you could do in a semester. After Linux you can write a few kernel modules and learn a lot more. If you tried to write a boot loader today it might be a two-semester assignment for the whole class, because today's boot loaders are crazy complex compared to what they were back in the day, though you do get some functionality for it (think of what it takes to boot a live Linux distro off a USB stick with a FAT filesystem on it).
In large part, what a modern Unix-like operating system is... is a kind of emulator that makes the insanely complicated and powerful machines of today look as simple and understandable as a PDP-11. I think of the Linux kernel (in conjunction with some complicated hardware support) as a giant behemoth of code there to maintain the illusion of a set of relatively understandable, clean abstractions: memory, tty, display, network, process, user, file system. None of those things is nearly as simple as the mental model we maintain of them, but the OS makes it look that way.
They're actually insanely complicated underneath because of real world reasons. The code in the kernel is in large part there to hide that.
That code wasn't needed in early operating systems because the underlying hardware system actually was that simple.
... But with hypervisors on the scene and virtualization the common way to run applications in the "cloud", it's now entirely conceivable that the "unikernel" approach could create bespoke software systems out of only the pieces needed for a given application or service on a given platform, and ditch the rest. The win is potentially a more holistically understandable system, in addition to the potential for targeted optimization.
From what I've read, I get the impression that for a while in the 1960s and 1970s the terms "process" and "virtual machine" were interchangeable. I don't feel that was a mistake; today we make an arbitrary, excessive distinction between them. They are the same concept, just implemented in slightly different ways.
From both the program's and the programmer's perspective, a process under a Unix-like is a virtual machine. A very nice virtual machine, too. It has approximately infinite memory, can operate in parallel with an arbitrary number of other such machines, and the kernel API can be thought of as an enriched and very sophisticated instruction set.
I think mindset plays a role as well. Back in the day, there was a restraint on adding features, at least when they would make the logic more complicated for only a tiny or imagined benefit. Part of that was memory/storage limitations. But I think the pursuit of an abstract idea of elegance was equally important. After all, these systems were primarily written for other computer people.
This idea is completely gone from the world of software engineering. Systems become overloaded with the implementation of every crappy idea under the sun -- if a PM in a fever dream can think of it, and an overworked dev can hack it together, it goes in.
In the case of Unix, it was deliberately a KISS/stripped down successor to Multics. The simplicity was originally intended for faster development and running on minicomputers rather than mainframes. In the end the simplicity also meant hackability and portability which is probably why it was successful and has had so much staying power.
Not in open source. Projects with the attitude of allowing any feature that some rando comes up with die quickly when the manpower to keep it up simply isn't there. The projects with maintainers that gatekeep are the ones that survive, and they survive for a long time.
> This idea is completely gone from the world of software engineering. Systems become overloaded with the implementation of every crappy idea under the sun -- if a PM in a fever dream can think of it, and an overworked dev can hack it together, it goes in.
That's a very developer-centric point of view. Elegance and simplicity are great, but if a program is missing crucial features, it becomes less useful or even useless for users. Like it or not, the vast majority of software is written for the benefit of its users, not that of its developers.
> the vast majority of software is written for the benefit of its users, not that of its developers.
No, the vast majority of software is written for the benefit of the company that's selling it. There's a very, very clear difference there. A lot of this business rests on advertising to people to convince them that the product is good, rather than actually creating a good product.
Have you actually worked a regular job and spoken to people who use software? The large majority of software has bugs that make it unfit for purpose. The more widely used it is, the more of a monopoly the company producing it has on the market, and in turn the more likely the product is to be a hot piece of shit.
A PM's fever dream is almost never a crucial feature. The PM may think it is, may say it is... but it's not.
I agree that software should be written for the benefit of users. But too much of software becomes baroque ornamentation that fills in check boxes on feature lists but is of little actual usefulness.
Especially the features we don't consider. Try replacing your "ls" command with the one found in the original Unix: sure, it's 90% smaller or more, but I guarantee it doesn't do colors.
The reasons for GNU bloat are well documented, but instead you are arguing over uhh... colour? ls outputting colour or not? Isn't that the very definition of bikeshedding?
"Strictly necessary" is too strict a criterion. For me, ls without colors to distinguish between files and directories sucks ass, as would an ls which doesn't sort by name (also not "strictly necessary").
You sure about that? I was reading only yesterday (can't find the link - sorry) about how little each extra feature is used - that the vast majority of value & use is in the core.
I see it the other way around. Additional features are people's special cases and hobby-horses, unlikely to be used by anyone else. BUT each new feature/interface/etc adds the potential for (and reality of) more bugs.
Perhaps it should be "features are where the security holes are"? :)
It's really simple - the early Unix kernel had so few lines of code because the machine had so few bytes of memory (180KB I think), and a bigger kernel wouldn't have fit. It's actually worse than that, because the PDP-11 address space was 16 bits, so the kernel had to be <64KB.
It really only had two types of devices - the disk and the serial ports. No network protocols, and a single very simple file system. Basically no virtual memory - it had segmentation, and most memory management was done by swapping processes to disk. More complexity would have to wait for bigger machines with more memory.
Unlike previous OSes, UNIX provided the pipe facility to allow you to combine the functionality of many small programs.
Ex. ls | grep ... | sort ... | more
Before that every command-line program had to support the full suite of additional functionality like search, filtering, paging, etc.
This UNIX/Pipe approach allowed for greater reuse and much less code.
This is explained and simulated in this video:
https://www.youtube.com/watch?v=3Ea3pkTCYx4
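For the curious, here is a rough sketch of the plumbing a shell performs for something like `ls | grep c`: one pipe(), two fork()s, dup2() to splice the pipe ends onto stdin/stdout, then exec. The two commands are arbitrary examples; a real shell adds arbitrary-length pipelines, job control, and error handling on top of this.

    /* Minimal sketch of how a shell wires up "ls | grep c":
     * one pipe, two forks, dup2() to splice stdin/stdout, then exec. */
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/wait.h>

    int main(void)
    {
        int fd[2];
        if (pipe(fd) < 0) { perror("pipe"); return 1; }

        if (fork() == 0) {                 /* child 1: ls */
            dup2(fd[1], STDOUT_FILENO);    /* stdout -> write end of pipe */
            close(fd[0]);
            close(fd[1]);
            execlp("ls", "ls", (char *)NULL);
            _exit(127);
        }

        if (fork() == 0) {                 /* child 2: grep c */
            dup2(fd[0], STDIN_FILENO);     /* stdin <- read end of pipe */
            close(fd[0]);
            close(fd[1]);
            execlp("grep", "grep", "c", (char *)NULL);
            _exit(127);
        }

        close(fd[0]);                      /* parent keeps neither end open */
        close(fd[1]);
        while (wait(NULL) > 0)             /* reap both children */
            ;
        return 0;
    }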
Error handling? What error handling? Corner cases, what corner cases?
The software industry still suffers greatly from the decisions made in UNIX. The SUID bit alone has caused so much trouble. It's crazy to think that we still have /usr/bin, /home and /bin only because the original UNIX machine had 3 disks in the system. Bazaar wins, I guess.
Not necessarily applicable to this case, but in my experience it is more about performance. If you want a solution that just works, the solution is usually quite simple. But if you want something that is fast, you often need to do platform-specific things and use fast paths for 'special cases' that occur more often than the 'general case'. Together these make your code more complicated but get better performance.
For example, memcpy can be implemented with a very simple loop, but platforms usually provide a much more complicated memcpy that uses vectorization and cache prefetching to get better performance. Compiler autovectorization can give you better performance, but it still cannot beat the hand-tuned memcpy in general.
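For reference, the "very simple loop" version really is only a few lines; what ships in an actual libc bears little resemblance to this, since it layers alignment handling, wide vector loads/stores, and CPU-feature dispatch on top.

    #include <stddef.h>

    /* The "works but not fast" memcpy: one byte at a time.
     * Real libc versions add alignment handling, vector loads/stores,
     * and often runtime dispatch on CPU features. */
    void *simple_memcpy(void *dst, const void *src, size_t n)
    {
        unsigned char *d = dst;
        const unsigned char *s = src;
        while (n--)
            *d++ = *s++;
        return dst;
    }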
Another example would be interpreters. You can write a very simple interpreter that works, but if you want better performance, you will need to throw in various techniques that may or may not be machine independent. And in this case, it is more about algorithms and global invariants that the compiler cannot reason about and optimize.
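To make the interpreter example concrete, here is a toy stack-machine loop with plain switch dispatch; the opcodes are invented for the illustration. The performance techniques mentioned above (threaded code, computed gotos, a JIT) exist precisely to replace or augment this inner loop.

    #include <stdio.h>

    /* A toy stack-machine interpreter: plain switch dispatch.
     * The opcodes are invented for this example. Faster interpreters
     * replace this loop with threaded code, computed gotos, or a JIT. */
    enum { OP_PUSH, OP_ADD, OP_MUL, OP_PRINT, OP_HALT };

    static void run(const int *code)
    {
        int stack[64], sp = 0;
        for (int pc = 0;;) {
            switch (code[pc++]) {
            case OP_PUSH:  stack[sp++] = code[pc++];       break;
            case OP_ADD:   sp--; stack[sp - 1] += stack[sp]; break;
            case OP_MUL:   sp--; stack[sp - 1] *= stack[sp]; break;
            case OP_PRINT: printf("%d\n", stack[sp - 1]);    break;
            case OP_HALT:  return;
            }
        }
    }

    int main(void)
    {
        /* computes and prints (2 + 3) * 4 */
        int program[] = { OP_PUSH, 2, OP_PUSH, 3, OP_ADD,
                          OP_PUSH, 4, OP_MUL, OP_PRINT, OP_HALT };
        run(program);
        return 0;
    }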
Note: I am not saying that all the code is necessary; I just argue that sometimes you need to sacrifice succinct code to get better performance, and this tradeoff is worthwhile for such low-level infrastructure that everyone depends on.
There was no complex stuff like SMP, no netfilter or similar. Software in and before the 90s was written with extreme naivety; buffers would be blindly allocated or assumed based on user-supplied (even network-supplied) input. This makes code much simpler, even with bloated designs like UNIX. If you think about it, UNIX actually can't even be defined: you just have a way to execute programs, some process management (which is over-specified for no reason, with superfluous concepts like process groups), the UNIX file system with a permission model that only needs a hundred lines of code, and output (and input) that just goes to a terminal which interprets it however it wants. It has no say on anything, which is why you have 13 different programming languages that all do the same thing, dumping files all over the place while trying not to stomp on each other; a shell language so poorly defined that people aren't even sure whether comparing something to an empty string is portable; and no standard way to pass around data (just strings, whose encoding changes every 5 years), not even for the encoding of file names.
On the other hand, UNIX is bloated with garbage like `wall`, the ability to output text into another user's terminal, and metadata scattered all over the place with no discipline, like ps being able to see other users' command lines, including any password they typed into one (security is a bonus but beside the point: you shouldn't expose data through an API that isn't explicitly needed for the purpose the API purports to serve). Stuff like hostnames, DNS, and email is built into the kernel in various ways. That could answer why it needs a whopping 13KLOC.
Early Unix switched contexts by writing the current process out to disk, loading the next runnable process, then jumping to it. Besides that overhead being far too slow by modern standards, it also did not admit threads or multiple concurrent cores. There was no need to consider atomic operations, or even locks, to a large degree.
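Very roughly, and without claiming this matches the actual early Unix source, the scheme amounts to something like the sketch below. Every name in it is invented, and the swap routines just print what a real kernel would do to the process image.

    #include <stdio.h>

    /* Schematic sketch only -- not the real early Unix code; all names
     * are invented. The point is just the shape of the scheme: with core
     * too small to keep many processes resident, "switching" could mean
     * writing one whole process image to disk and reading the next one
     * back in before jumping to it. */
    struct proc { int pid; };

    static struct proc procs[3] = { {1}, {2}, {3} };
    static struct proc *current = &procs[0];

    static struct proc *pick_next_runnable(void)
    {
        return &procs[current->pid % 3];   /* trivial round-robin stand-in */
    }

    static void swap_out_to_disk(struct proc *p)  { printf("write core image of pid %d to swap\n", p->pid); }
    static void swap_in_from_disk(struct proc *p) { printf("read core image of pid %d from swap\n", p->pid); }
    static void jump_into(struct proc *p)         { printf("restore registers, resume pid %d\n", p->pid); }

    static void reschedule(void)
    {
        struct proc *next = pick_next_runnable();
        if (next == current)
            return;
        swap_out_to_disk(current);   /* no room to keep both in core */
        swap_in_from_disk(next);
        current = next;
        jump_into(next);
    }

    int main(void)
    {
        for (int i = 0; i < 3; i++)
            reschedule();
        return 0;
    }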
Most people wouldn't appreciate minimalism if they had to use such software on a daily basis.
Any way to reasonably estimate how long it would take to compile back then? The PDP-11 ran at 1.25 MHz. Would it be roughly 4000x slower than compiling a 13K-line C program on a modern CPU? Or are computer architectures so different that CPU clock speed isn't the primary factor?
Compilers vary greatly in their complexity. In particular, optimization is a truly open-ended task, which can consume however much CPU time and memory you're willing to throw at it. GCC and LLVM today are much more sophisticated than anything around in the 1970s or 1980s. They will chew up the entire program, transform it into an enormous (many-megabyte or -gigabyte) graph abstractly representing the program, and then spend hundreds of billions of cycles exploring possible rewrites of that graph that make the program faster or more efficient.
In comparison, early compilers, including the first C compiler, usually worked on a single statement at a time, immediately translating it into equivalent machine code. They never examined the whole program together, as there wasn't enough RAM in those machines to do that. This produces inefficient code, but it's very fast to compile. In fact, accommodating the memory limits of machines like the PDP-11 influenced much of C's design. It's why everything has to be declared before use, for example: that way, less state needs to be kept to resolve forward references.
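A small illustration of the declare-before-use point: in a strictly one-pass compiler, the prototype is what lets a call to a function defined later in the file be translated immediately, without buffering the rest of the program.

    #include <stdio.h>

    /* Without this prototype, a strictly one-pass C compiler reaching
     * the call in main() would know nothing about square(); the
     * declaration lets it emit the call immediately and move on. */
    int square(int x);

    int main(void)
    {
        printf("%d\n", square(7));
        return 0;
    }

    int square(int x)
    {
        return x * x;
    }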
People did not compile the kernel back then. The kernel was supplied in precompiled form as a collection of object (.o) files, and there was a kernel «generation» shell script that would ask questions about which parts of the kernel were required (e.g. which device drivers to link in) and what to set certain kernel parameters to (e.g. the number and size of I/O buffers – the page cache had not been invented yet, and neither had «sysctl»), and – after jumping through the hoops of the very detailed and thorough interview with the kernel generation script – it would link the /unix kernel a.out image. It took several hours to get the final product.
The source code was distributed on a tape but it was not customary to compile the kernel from scratch every time.
It's interesting where people's heads are. A number of people talk about device drivers, but back then few OSes ran on more than one kind of machine, and hardware didn't change much. So there wasn't much separation between hardware and software: you wanted a block of data written to disk, well, the filesystem implemented the code to twiddle the disk hardware directly.
Unix was unusual in that it was written in a pretty high level language, an idea inherited from Multics. Most OSes were written in assembly.
Normally, the filesystem wouldn't directly twiddle the disk hardware; even on the PDP-11, you had a bewildering array of disk types: fixed-head RC disks, single-platter RL01/RL02, multi-platter RK01-RK07, RP disks that were a somewhat more advanced RK (but with a completely different interface), Massbus disks like the RM04, MSCP disks which were a very weird beast (the programming interface was not entirely unlike io_uring), etc. (to say nothing of the 4 different tape drive interfaces, two different floppy drives, and a mind-numbing array of serial interfaces). v6 Unix already had support for most of these, and even the famously tiny RT-11 (which would happily run with 8k of core for the kernel) supported multiple different types of disks at the same time.
Yes, but the question was about the original PDP-7 implementation of Unix, not the port/rewrite for a large-volume (by the standards of the day) machine like the -11.
Same reason you can write a proof of concept in a weekend, but the production version can take years.
The web's best example is twitter. Anyone can write a twitter clone in a weekend. A reasonably competent programmer can create a multiuser twitter clone in under an hour.
But you can't write twitter in an hour.
It's amazing how much you can write when your requirements are much smaller.
Also, if you write your Unix for exactly one hardware configuration, then you don't need two serial port drivers. And therefore you don't even need a serial-port-driver abstraction API.
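As a caricature of that point: a console "driver" written for exactly one known UART can be a handful of lines poking fixed registers, with no probing and no abstraction API at all. The addresses and status bit below are invented for the sketch, and it's freestanding-style code, not something to run on a hosted OS.

    #include <stdint.h>

    /* Caricature of a driver written for exactly one board: the UART's
     * registers live at a fixed, known address, so there is no probing,
     * no driver API, and no second implementation to abstract over.
     * Addresses and bits are invented for this sketch. */
    #define UART_STATUS  (*(volatile uint8_t *)0x10000000u)
    #define UART_TXDATA  (*(volatile uint8_t *)0x10000004u)
    #define TX_READY     0x01u

    static void putch(char c)
    {
        while (!(UART_STATUS & TX_READY))   /* spin until transmitter is free */
            ;
        UART_TXDATA = (uint8_t)c;
    }

    void console_puts(const char *s)
    {
        while (*s)
            putch(*s++);
    }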
Writing the POC takes 90% of the time, writing the code to deal with the edge cases takes the other 90% of the time!
Or another one I like to say: it takes a few lines of code to deal with the right answer, a few lines of code to deal with the wrong answer, and an absolutely massive amount of code to deal with Russell's Paradox.
The whole point of writing Unix was a response to the bloated, feature-cluttered OSes of the time, e.g. Multics. It was an effort to find the minimum feature set that covers all OS necessities.
Not sure specifically about early Unix but it seems system reliability and error recovery have improved quite a bit. Hardware devices and subsystems can crash and recover without the user noticing or only seeing a slight delay.
[1] https://www.cs.princeton.edu/courses/archive/spr09/cos333/be...