Why do my comments seem combative? All I said was basically "good job, but it will unfortunately not be used much." Why is that combative?
With regards to chroot, I stand corrected. I knew it was a tree of symlinks, but I thought it was also more than that because symlinks alone don't seem like a sandbox. Honestly, Cosmopolitan's system appears to be more of a sandbox than that.
I agree that a strong sandbox is going to require OS-specific support as of right now. I do have ideas for implementing sandboxes without it, but it requires putting the sandbox into the interpreter. And I could be completely wrong that the interpreter would do a good enough job.
And this is what I mean by interpreter: Bazel has a language. I think it's called Starlark. To make that language useful, it needs some interpreter. That interpreter might just be reading the language and building a dependency tree, but my point is that it could do more, including checks.
Perhaps my assertion that Bazel would not want to do it that way is not fair, but I said that because Bazel's method of sandboxing is different, and I suspect that they would not want to refactor their Starlark interpreter. That's all. They certainly could, and I would encourage them to. So I could be wrong, and I would eat my words in that case.
> but my point is that it could do more, including checks.
Could you explain more how you see this working?
For example: the build system is running a build step. It has determined the inputs and the outputs for that build step. It is going to execute a subprocess for that build step (say, GCC). It wants to ensure that GCC doesn't accidentally depend on files other than those that the build system knows about. How can that functionality be implemented with checks in the build system interpreter?
I suppose it could run the process with something strace-like and monitor which files it accesses but isn't that just a way of implementing a sandbox? I'm not sure what you mean exactly.
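For concreteness, the strace-like approach could look roughly like this: a minimal Python sketch, assuming Linux with `strace` installed. It runs a step under `strace -f -e trace=openat` and reports successfully opened paths that were never declared as inputs. The function names, the ignore list, and the regex are illustrative assumptions, and the parser only understands the simple one-line form of an `openat` record.

```python
import re
import subprocess
import tempfile
from pathlib import Path

# Matches a successful openat() record in strace output, e.g.
#   1234  openat(AT_FDCWD, "/usr/include/stdio.h", O_RDONLY) = 3
# Failed calls end in "= -1 ENOENT (...)" and do not match.
OPENAT_RE = re.compile(r'openat\([^,]+,\s*"([^"]+)",[^)]*\)\s*=\s*(\d+)')


def opened_paths(strace_log: str) -> set[str]:
    """Collect every path that was opened successfully in an strace log."""
    paths = set()
    for line in strace_log.splitlines():
        match = OPENAT_RE.search(line)
        if match:
            paths.add(match.group(1))
    return paths


def undeclared_accesses(strace_log: str, declared: set[str],
                        ignore_prefixes: tuple[str, ...] = ("/proc/", "/dev/")) -> set[str]:
    """Paths the step opened that the build system never declared as inputs."""
    return {
        path for path in opened_paths(strace_log)
        if path not in declared and not path.startswith(ignore_prefixes)
    }


def run_traced(argv: list[str]) -> str:
    """Run argv under strace (assumed to be installed) and return the raw log."""
    with tempfile.TemporaryDirectory() as tmp:
        log_file = Path(tmp) / "trace.log"
        subprocess.run(
            ["strace", "-f", "-e", "trace=openat", "-o", str(log_file), *argv],
            check=True,
        )
        return log_file.read_text()
```

Run against a real compiler, the report would typically also list the compiler's own shared libraries and the system headers, so making it useful means enumerating everything the tool is allowed to touch, which is most of the way to a sandbox anyway.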
The best way to do this is described in the thesis that Eelco Dolstra wrote about Nix. I suggest you read it.
tl;dr: Clear the environment, know where all of the system headers are, control the build environment of the dependencies. Basically, knowing dependencies means controlling them.
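A minimal Python sketch of just the "clear the environment" step. It assumes the build system has already pinned its tools to a known directory; `TOOLCHAIN_BIN` and the helper names are hypothetical, and this is not how Nix itself implements it.

```python
import subprocess
from typing import Optional

# Hypothetical: the single directory the build system has pinned for its tools.
TOOLCHAIN_BIN = "/opt/toolchain/bin"


def hermetic_env(extra: Optional[dict[str, str]] = None) -> dict[str, str]:
    """Build the child's environment from scratch instead of inheriting ours.

    Nothing from os.environ leaks through: no user PATH, no HOME, and no
    CPATH or CFLAGS that could silently add include directories.
    """
    env = {"PATH": TOOLCHAIN_BIN}
    if extra:
        env.update(extra)
    return env


def run_step(argv: list[str],
             extra_env: Optional[dict[str, str]] = None) -> subprocess.CompletedProcess:
    # env= replaces the child's environment entirely; it does not extend it.
    return subprocess.run(argv, env=hermetic_env(extra_env),
                          capture_output=True, text=True, check=True)
```

Any variable a step legitimately needs has to be passed explicitly through `extra`, which doubles as a record of what the step depends on.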
But to expand on that, an interpreter could do some basic checking like:
* Does the command reference a path that the build system doesn't know about?
* Does the build system know where the executable is for the command, and is it well-known?
Things like that.
It won't be perfect, but it would be better. And it can get better with time.
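A rough Python sketch of those two checks, applied to a command line before it runs. The `StepInfo` type, the flag prefixes, and the "looks like a path" heuristic are all hypothetical; none of these names come from Bazel or Starlark.

```python
import shlex
from dataclasses import dataclass
from typing import Optional

# Hypothetical flags whose attached value is a path (as in -I/usr/include).
PATH_FLAG_PREFIXES = ("-I", "-L", "-isystem", "--sysroot=", "-o")


@dataclass
class StepInfo:
    """What the build system knows about one step (hypothetical)."""
    known_paths: set[str]   # declared inputs and outputs
    known_tools: set[str]   # absolute paths of well-known executables


def path_in(arg: str) -> Optional[str]:
    """Heuristically extract a path from one argument, or return None."""
    for prefix in PATH_FLAG_PREFIXES:
        if arg.startswith(prefix) and len(arg) > len(prefix):
            return arg[len(prefix):]
    if arg.startswith("-"):
        return None
    # Bare arguments that look like files: "src/main.c", "out.o".
    return arg if ("/" in arg or "." in arg) else None


def check_command(cmd: str, step: StepInfo) -> list[str]:
    """Return a list of problems; an empty list means the checks passed."""
    argv = shlex.split(cmd)
    if not argv:
        return ["empty command"]
    tool, *args = argv
    problems = []
    # Second bullet: is the executable's location known and well-known?
    if not tool.startswith("/"):
        problems.append(f"{tool!r} would be resolved through PATH")
    elif tool not in step.known_tools:
        problems.append(f"{tool!r} is not a well-known tool")
    # First bullet: does the command reference paths the build system doesn't know?
    for arg in args:
        path = path_in(arg)
        if path is not None and path not in step.known_paths:
            problems.append(f"{path!r} is not a declared input or output")
    return problems
```

It only inspects the command line, so anything the tool opens on its own (GCC's default system headers, for instance) slips past, and a shell script can construct paths at runtime. It catches honest mistakes rather than a determined bad actor.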
> With regards to chroot, I stand corrected. I knew it was a tree of symlinks, but I thought it was also more than that because symlinks alone don't seem like a sandbox. Honestly, Cosmopolitan's system appears to be more of a sandbox than that.
To be totally clear: the tree of symlinks thing is a fallback, used only when lacking platform support or when sandboxing is explicitly turned off [0]. On Linux, the normal sandboxing strategy is to use namespaces, like most container runtimes. On Mac it apparently uses sandbox-exec (some opaque Apple tool), as was mentioned above. Chroot, being non-POSIX, requiring root access on many systems, and not providing the necessary facilities, is not really a great fit -- which I assume is why it's not used.
There was experimental Windows sandbox support at one point [1] based on how MS does it for BuildXL (their own build tool for giant monorepos) [2]. Unfortunately it doesn't seem to be maintained, and under the hood it's kinda ugly -- it actively rewrites code in-memory to intercept calls to the Win32 APIs [3], which was apparently the cleanest/best way MS could come up with. However, from Bazel's POV it works in a roughly similar way -- you spawn subprocesses under a supervisor, which is in charge of spinning up whatever the target process is with restrictions on time/memory usage/file access.
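To illustrate the supervisor idea on a POSIX system, here is a minimal Python sketch that spawns a process with a wall-clock timeout plus CPU-time and address-space limits set through `setrlimit`. The limit values are arbitrary assumptions, it assumes Linux (address-space limits behave differently elsewhere), and it does nothing about file access, which is the part that needs namespaces, sandbox-exec, or API interception as described above.

```python
import resource
import subprocess
from typing import Optional


def _cap(limit: int, value: int) -> None:
    """Set soft and hard limits, never asking for more than the current hard limit."""
    _, hard = resource.getrlimit(limit)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(limit, (value, value))


def supervise(argv: list[str], wall_seconds: float = 60.0, cpu_seconds: int = 30,
              max_bytes: int = 2 * 1024**3) -> Optional[subprocess.CompletedProcess]:
    """Run argv under resource limits; return None if it exceeds the wall clock."""
    def limit_child() -> None:
        # Runs in the child between fork and exec, so only the child is limited.
        _cap(resource.RLIMIT_CPU, cpu_seconds)   # CPU time, in seconds
        _cap(resource.RLIMIT_AS, max_bytes)      # virtual address space, in bytes

    try:
        return subprocess.run(argv, preexec_fn=limit_child,
                              capture_output=True, text=True, timeout=wall_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the child at this point.
        return None
```

`preexec_fn` is not safe in a multithreaded parent, so a real supervisor would more likely be a separate small helper program that applies the limits and then execs the target.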
On the "sandbox in the interpreter" thing: what kind of checks are you envisioning? It seems like putting checks at that level would end up leaving a lot out -- the goal of any build system is to eventually spawn an arbitrary process (Python, gcc, javac, some shell script, etc.) and so even with extensive checks in Starlark you'd end up with accidental sandbox breaks all over the place. For pure Starlark rules you could e.g. check that there are no inputs from /usr, but even then if gcc does it implicitly, you're SOL. Or am I thinking of the wrong kind of checks?
EDIT: somehow missed your sibling comment. Nix is definitely cool, and is pretty similar to how Bazel does things with regards to explicit build graphs. The check for "well-known commands" would also be cool IMO. That said, Nix also has a chroot-y sandbox-y thing it uses to spawn processes -- so they're not all that different [4].
Yes, I agree that Bazel and Nix are not much different. Nix seems to be even more sandbox-like than Bazel, and that's good in my opinion.
Beyond what they do, I'd like checks that are even more invasive, more cautious about letting the build script do anything.
For example, if you're on Linux, a bad actor build script could technically mount the root directory `/` underneath the sandbox area in `/<sandbox>/rootdir/` using Linux's bind mounts feature and then `rm -rf /<sandbox>/rootdir/`. Whatever it has permission to delete will be deleted (unless I'm unaware of some safety feature in bind mounts that prevents this besides needing root).
I would like checks that restrict a build to just performing those actions necessary to the build. You could, for example, have a permission policy, say for a particular package that you don't trust, that only allows that package to spawn GCC and the linker. If that package goes rogue in its build script, it would be stopped dead the first time it tried to either use `rm` or use a bind mount.
That's the sort of checks I'm referring to: checks for fine-grained permissions on what a build can do.
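A minimal Python sketch of such a per-package policy, as a gate the build system could consult before every spawn. `Policy`, `authorize_spawn`, the package name, and the tool paths are hypothetical. It only covers what the build script asks the build system to run; a tool it is allowed to run could still start other programs, so something OS-level is still needed underneath.

```python
import os
from dataclasses import dataclass


class PolicyViolation(Exception):
    """Raised when a build step asks for something its package may not do."""


@dataclass(frozen=True)
class Policy:
    package: str
    # Absolute paths only: matching on a bare name like "gcc" would let a
    # package ship its own program called gcc.
    allowed_tools: frozenset[str]


def authorize_spawn(policy: Policy, argv: list[str]) -> None:
    """Raise PolicyViolation unless argv[0] is on the package's allow-list."""
    if not argv:
        raise PolicyViolation(f"{policy.package}: empty command")
    # Collapse "." and ".." segments so "/usr/bin/../bin/rm" can't sneak past.
    tool = os.path.normpath(argv[0])
    if not os.path.isabs(tool) or tool not in policy.allowed_tools:
        raise PolicyViolation(f"{policy.package}: may not run {argv[0]!r}")
```

With a policy of `{"/usr/bin/gcc", "/usr/bin/ld"}`, a rogue `rm` or `mount` is refused at the first attempt, which is the "stopped dead" behaviour described above for the spawns the build system mediates.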
My idea is to take that even further and make it possible to have those checks in software that you compile from source so that you can stop the software from going rogue too. How I am going to do that, I'll leave unsaid for now, but I'm working on it.
> a bad actor build script could technically mount the root directory `/` underneath the sandbox area in `/<sandbox>/rootdir/` using Linux's bind mounts feature
How could you do this without already being outside of the sandbox?