
Can somebody explain the whole thing to me? I know how to program on Windows and Linux, but that's basically at the level of calling the compiler and linker, plus a few parameters to occasionally build a DLL. So what does this do, and how does it work? Does it apply to DLLs as well? Also, does 'run anywhere' mean on the same processor type?


It’s a libc implementation accompanied by a linker script that tricks the linker into generating a polyglot executable file, simultaneously interpretable as an MZ executable, x86 boot sector and a Unix shell script, the latter of which re-writes the executable into ELF or Mach-O format and then executes it again. I haven’t analysed it all that thoroughly, but that’s the gist of it. Here’s more information: https://justine.storage.googleapis.com/ape.html

Polyglot files are not a particularly new invention, but devising a reproducible process to generate those can be quite tricky, so most don’t bother unless there’s a special need for it. (For example, the GRUB4DOS fork of GRUB contains a ‘bootlace’ executable that is simultaneously executable as a DOS .COM and an ELF file.)
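To make the polyglot idea a bit more concrete, here is a minimal sketch of how a loader or a `file`-style tool tells formats apart by their leading magic bytes. The function name `sniff_format` is made up for illustration and is not part of Cosmopolitan; the magic values themselves are the standard ones for MZ, ELF and 64-bit little-endian Mach-O. Note that 'M' and 'Z' are ordinary printable characters, which is what leaves room for a shell to read the same bytes as text.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: classify a file by its first bytes. */
const char *sniff_format(const unsigned char *p, size_t n) {
  /* ELF: 0x7f followed by "ELF" (string split so \x7f is not read as \x7fE). */
  if (n >= 4 && memcmp(p, "\x7f" "ELF", 4) == 0) return "ELF";
  /* Mach-O 64-bit, little-endian: 0xfeedfacf stored as CF FA ED FE. */
  if (n >= 4 && memcmp(p, "\xcf\xfa\xed\xfe", 4) == 0) return "Mach-O";
  /* DOS/Windows executables start with the two ASCII letters "MZ". */
  if (n >= 2 && p[0] == 'M' && p[1] == 'Z') return "MZ";
  /* A conventional script announces itself with a shebang. */
  if (n >= 2 && p[0] == '#' && p[1] == '!') return "script";
  return "unknown";
}
```

A polyglot has to begin with bytes that every intended consumer accepts, which is why the choice of the first few bytes is the hard part.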

The libc itself contains a number of specially-crafted functions and header files that expose the functions’ clobbered register set as part of the functions’ public ABI, which allows the compiler to use that knowledge to better allocate registers and optimise more aggressively. The downside is that if the clobbered register set changes, it requires everything using the function to be recompiled.


How is the libc polyglot? I mean, don't system calls vary widely between Windows and Linux?


    int ftruncate(int fd, int64_t length) {
      if (!IsWindows()) {
        return ftruncate$sysv(fd, length);
      } else {
        return ftruncate$nt(fd, length);
      }
    }

You get the idea. The OS is detected at startup and then checked each time a function is invoked.
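For anyone who wants to play with the pattern, here is a self-contained sketch of "detect once, branch on every call". Everything in it is a stand-in: `g_hostos`, the `HOST_*` values, `g_last_backend`, `my_ftruncate` and the two stub back ends are invented for illustration, and the stubs only record which one was called rather than touching a file.

```c
#include <stdint.h>

/* Hypothetical host-OS bits; the values are chosen for illustration. */
enum { HOST_LINUX = 1, HOST_WINDOWS = 4 };

/* Would be written once by startup code; here we just set it by hand. */
int g_hostos = HOST_LINUX;

/* Records which back end ran, so the dispatch is observable. */
const char *g_last_backend = "none";

static int IsWindows(void) { return (g_hostos & HOST_WINDOWS) != 0; }

/* Stand-ins for the real per-OS implementations. */
static int ftruncate_sysv(int fd, int64_t length) {
  (void)fd; (void)length;
  g_last_backend = "sysv";
  return 0;
}

static int ftruncate_nt(int fd, int64_t length) {
  (void)fd; (void)length;
  g_last_backend = "nt";
  return 0;
}

/* Same shape as the wrapper quoted above: one cheap branch per call. */
int my_ftruncate(int fd, int64_t length) {
  if (!IsWindows()) {
    return ftruncate_sysv(fd, length);
  } else {
    return ftruncate_nt(fd, length);
  }
}
```

Flipping `g_hostos` between the two values and calling `my_ftruncate` shows the same binary taking a different path without being recompiled.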


Would this not make the implementation inefficient?


IsWindows is likely a macro and therefore you have if(0) or if(1) which the compiler can easily optimize away.


I thought this implementation was meant for compile-once-run-everywhere usage, so I can't see how a compiler would do away with this if-statement. Could you please say more?


The compiler keeps the branch. The branch doesn't impact performance because it's fully predictable and therefore costs less than 1 nanosecond. Please note however that fadvise() is a system call, and the OS context switch triggered by the SYSCALL instruction generally costs about 1 microsecond, which is roughly 1000x more than any overhead incurred by the portability benefits Cosmopolitan offers.

You can however tune the build so the `IsWindows()` branch is dead code eliminated anyway:

    make MODE=tiny CPPFLAGS=-DSUPPORT_VECTOR=0b11111011

Basically, unsetting the Windows bit causes the Windows polyfills to not be linked into your binary. It's only useful for making binaries tinier. So rest assured, if you find your 16 KB APE .com binary to be too bloated, you can always use SUPPORT_VECTOR to squeeze it down to the order of a 4 KB ELF executable. See: https://github.com/jart/cosmopolitan/blob/1df136323be2ae806f...
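Here is a rough sketch of why clearing a bit in a compile-time mask lets the compiler drop the branch. The macro and variable names (`OS_WINDOWS`, `SupportsWindows`, `g_hostos`, `pick_backend`) are assumptions made up for this example and are not copied from the Cosmopolitan headers; the only thing taken from the command above is the mask value, written as `0xFB` (the same number as `0b11111011`) for portability.

```c
/* Hypothetical OS bits for illustration; bit value 4 is the one cleared in 0b11111011. */
#define OS_LINUX   1
#define OS_WINDOWS 4

#ifndef SUPPORT_VECTOR
#define SUPPORT_VECTOR 0xFB /* same value as 0b11111011: Windows bit cleared */
#endif

/* Pretend runtime detection found Windows, to show the mask still wins. */
int g_hostos = OS_WINDOWS;

/* Constant expression: known to be 0 or 1 at compile time. */
#define SupportsWindows() ((SUPPORT_VECTOR & OS_WINDOWS) != 0)

/* When SupportsWindows() is 0, the && short-circuits to a constant 0,
   so the runtime check is never evaluated. */
#define IsWindows() (SupportsWindows() && (g_hostos & OS_WINDOWS) != 0)

const char *pick_backend(void) {
  if (IsWindows()) {
    return "nt";   /* unreachable with the mask above, so it can be dropped */
  }
  return "sysv";
}
```

With the Windows bit set in the mask, the condition falls back to the runtime check and the branch stays, which matches the default behaviour described earlier in the thread.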



