yeah i remember learning this as a kid and being surprised. i originally thought laserdiscs were modern high tech, but they turned out to actually be from the late 70s/early 80s with primitive analog video encoding, whereas red book audio cds of the mid to late 80s were actually digital.
BUT... Pioneer put AC-3 (Dolby Digital) surround on LaserDiscs before DVDs came out. So LaserDiscs were the first video medium to offer digital sound at home.
And at that point, most players sold were combo players that could also play CDs.
And there was one more disc format: CD Video. It was a CD-sized digital single that also had a LaserDisc section for the (analog) music video. I have a couple; one is Bon Jovi.
no, apparently there was both. i was familiar with video cd which was mpeg-1 on a cd-rom (with some weird partitioning scheme). cd video is apparently a very obscure hybrid format with an analog video section and a digital audio section. https://en.wikipedia.org/wiki/CD_Video
mmm. interesting and fun concept, but it seems to me like the text is actually the right layer for storing and expressing changes since that is what gets read, changed and reasoned about. why does it make more sense to use asts here?
are these asts fully normalized or do (x) and ((x)) produce different trees, yet still express the same thing?
why change what is being stored and tracked when the language aware metadata for each change can be generated after the fact (or alongside the changes)? (adding transform layers between what appears and what gets stored/tracked seems like it could get confusing?)
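On the parentheses question, CPython's ast module is one data point: redundant parens are purely syntactic and don't survive parsing, so (x), ((x)), and x all normalize to the same tree. A quick check:

```python
import ast

# redundant parentheses disappear during parsing...
assert ast.dump(ast.parse("(x)")) == ast.dump(ast.parse("((x))"))
assert ast.dump(ast.parse("((x))")) == ast.dump(ast.parse("x"))

# ...and so do formatting-only differences
# (ast.dump omits line/column info by default)
assert ast.dump(ast.parse("f( 1,2 )")) == ast.dump(ast.parse("f(1, 2)"))
```

Whether a tree-storing VCS keeps that normalization or preserves concrete syntax (a lossless CST) is a real design choice, though.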
For one, it eliminates a class of merge conflict that arises strictly from text formatting.
I always liked the idea of storing code in abstraction, especially if editors supported edit-time formatting. I enjoy working on other people's code, but I don't think anybody likes the tedium of complying with style guides, especially ones that are enforced at the SCM level, which adds friction to creating even local, temporary revisions. This kind of thing would obviate that. That's why I also appreciate strict and deterministic systems like rustfmt. Unison goes a little further, which is neat, but I think they're struggling to get adoption because of that, even though I'm pretty sure they've got some better tooling for working outside the whole ecosystem. These decoupled tools are probably a good way to go.
I was messing around with a file-less paradigm that would present a source tree in arbitrary ways, like just showing individual functions, so you have the things you're working on co-located rather than switching between files. Kind of like the old VB IDE.
Yeah I suppose that's true, too. You've got to do the conversion at some point. I don't know that you get any benefit from storing the text, doing the transformation to support whatever ops (deconflicting, etc.), and then transforming back to text again vs just storing it in the intermediate format. Ideally, this would all be transparent to the user anyway.
For one merge, yes. The fun starts when you have a sequence of merges.
CRDTs put ids on tokens, so things are a bit more deterministic.
Imagine a variable rename or a whitespace change; it completely breaks text diffing.
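A toy sketch (not any particular CRDT library) of what ids on tokens buy you: a rename targets an identified token, so it can't be misattributed to a neighboring edit the way an LCS-based diff can misattribute changes.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Token:
    id: tuple        # (replica, counter) -- unique across replicas
    text: str

doc = [Token(("a", 1), "x"), Token(("a", 2), "="), Token(("a", 3), "1")]

# A rename rewrites the text of an identified token; position shifts and
# surrounding edits don't confuse the operation, because the operation
# names the token directly rather than a line/offset.
def rename(doc, tok_id, new_text):
    return [Token(t.id, new_text) if t.id == tok_id else t for t in doc]

doc2 = rename(doc, ("a", 1), "count")
assert [t.text for t in doc2] == ["count", "=", "1"]
```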
I remember someone mentioning a system that operated with ASTs like this in the 70s or 80s. One of the affordances is that the source base did not require a linter. Everyone reading the code can have it formatted the way they like, and it would all still work with other people’s code.
Related, I’d love an editor that’d let me view/edit identifier names in snake_case and save them as camelCase on disk. If anyone knows of such a thing - please let me know!
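No idea if such an editor exists, but the conversion layer itself is small; a rough sketch of the load/save mapping (simple names only, acronyms like parseURL would need real rules):

```python
import re

# Hypothetical presentation layer: identifiers stored on disk in
# camelCase, shown to this user in snake_case, converted on load/save.
def camel_to_snake(name: str) -> str:
    # insert "_" before an uppercase letter that follows a lowercase/digit
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()

def snake_to_camel(name: str) -> str:
    head, *rest = name.split("_")
    return head + "".join(w.capitalize() for w in rest)

assert camel_to_snake("maxRetryCount") == "max_retry_count"
assert snake_to_camel("max_retry_count") == "maxRetryCount"
```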
Sure. Presumably you could have localized source presentation, too.
But, yeah, I think a personalized development environment, with all of your preferences preserved in a way that doesn't interfere with whatever the upstream standard is, would be a nice upgrade.
100% agree. I think AST-driven tooling is very valuable (most big companies have internal tools akin to each operation Beagle provides, and Linux has Coccinelle/spatch, for example), but it's still easier implemented as a layer on top of source code than as the fundamental source of truth.
There are some clever things that can be done with merge/split using CRDTs as the stored transformation, but they're hard to reason about compared to just semantic merge tools, and don't outweigh the cognitive overhead IMO.
Having worked for many years with programming systems natively expressed as trees - often just operation trees and object graphs, discarding the notion of syntax completely - I can say this layer is incredibly difficult for humans to reason about, especially when it comes to diffs, and usually you end up having to build a system that can produce and act on text-based diffs anyway.
I think there's some notion of these kinds of revision management tools being useful for an LLM, but again, at that point you might as well run them alongside (just perform the source -> AST transformation at each commit) rather than use them as the core storage.
you can parse the text at any time pretty much for free and use anything you learn to be smarter about manipulating the text. you can literally replace the default diff program with one that parses the source files to do a better job today.
This is the fundamental idea behind git - to fully compute/derive diffs from snapshots (commits) and to only store snapshots. While brilliant in some ways - particularly the simplifications it allows in terms of implementation, I’ve always felt that dropping all information about how a new commit was derived from its parent(s) was wasteful. There have been a number of occasions where I wished that git recorded a rename/mv somehow - it’s particularly annoying when you squash some commits and suddenly it no longer recognizes that a file was renamed where previously it was able to determine this. Now your history is broken - “git blame” fails to provide useful information, etc. There are other ways of storing history and revisions which don’t have this issue - git isn’t the end of the line in terms of version control evolution.
CRDT's trick is metadata. Good old diff guesses the changes by solving the longest-common-subsequence problem. There is always some degree of confusion as changes accumulate. CRDTs can know the exact changes, or at least guess less.
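A small illustration with Python's difflib (a SequenceMatcher in the same longest-common-subsequence family): the two end states genuinely underdetermine the edit, so diff has to pick one reconstruction.

```python
import difflib

# "abab" -> "ababab" could be "insert 'ab'" at offset 0, 2, or 4;
# the snapshots alone cannot say which edit the user actually made.
old, new = "abab", "ababab"
sm = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
ops = [op for op, *_ in sm.get_opcodes() if op != "equal"]
assert ops == ["insert"]   # diff reports one insertion, at a position it chose

# a CRDT sidesteps the guess: each character carries an id, so the
# operation itself (insert-after-id) is what gets recorded and merged.
```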
One nice thing about serializing/transmitting AST changes is that it makes it much easier to compose and transform change sets.
The text based diff method works fine if everyone is working off a head, but when you're trying to compose a release from a lot of branches it's usually a huge mess. Text based diffs also make maintaining forks harder.
Git is going to become a big bottleneck as agents get better.
what do you actually gain over enforced formatting?
first you should not be composing releases at the end from conflicting branches, you should be integrating branches and testing each one in sequence and then cutting releases. if there are changes to the base for a given branch, that means that branch has to be updated and re-tested. simple as that. storing changes as normalized trees rather than normalized text doesn't really buy you anything except for maybe slightly smarter automatic merge conflict resolution but even then it needs to be analyzed and tested.
Diffs are fragile, and while I agree with that process in a world where humans do all the work and you aren't cutting a dozen different releases, I think that's a world we're rapidly moving away from.
in that case you probably flag a bunch of prs for release and it linearizes their order and rebases and tests each one a step ahead of your review (responding to any changes you make as you go).
Having a VCS that stores changes as refactorings combined with an editor that reports the refactorings directly to the VCS, without plain text files as intermediate format, would avoid losing information on the way.
The downside is tight coupling between VCS and editor. It will be difficult to convince developers to use anything else than their favourite editor when they want to use your VCS.
I wonder if you can solve it the language-server way, so that each editor that supports refactoring through language-server would support the VCS.
it can actually look across conversations; i would make sure to tell it not to. (one fun thing to do is to ask it to look across the past year and generate a claude wrapped where it roasts you.)
i also probably wouldn't use it for anything i don't know how to verify myself.
if you have pre-execution enforcement, what's the point of the verification protocol? the ability to apply stricter covenants to past action logs after the fact? i'm not sure i follow the use-case for that.
Enforcement and verification serve different audiences.
Enforcement will protect you as it stops your agent from doing something it shouldn't. Verification protects everyone else, as it lets a third party independently confirm that the enforcement actually happened, without trusting you.
You say "my agent followed the rules," while the regulator says "prove it." The hash-chained logs and signed covenants are the proof. Without verification, it's just your word.
makes sense. the core modules that i looked at look pretty good. (action-log, verifier, composability, dsl and parser).
all the kitchen sink stuff makes it pretty intense though. have you considered separating out just the core execution, logging and verification components? stuff like c2pa seems super cool, but maybe a second layer for application type things like that so that the core consensus stuff can be inspected easily? one goal for a system like this is easy auditability of the system itself.
That is exactly the direction I'm heading based on feedback from this thread. The core primitives (action-log, verifier, covenant DSL, parser) as a small, auditable package, with everything else (c2pa, otel, langchain, compliance adapters) as a separate layer that builds on top.
You are right that auditability of the system itself is the goal. It's very hard to trust a trust layer you can't easily inspect. Appreciate you digging deep into the code.
it's probably a good thing to have domestic advanced manufacturing if only to have real-world testbeds for development of advanced automation technology.
it's cool and all that boston dynamics can do what they do, but i wonder if one reason why the chinese robotics industry is so advanced is because they've been able to test in production on real production lines, experiment with dark factories and learn a ton in the process.
it's kind of funny when you think about it. both the west and east are facing down the same set of potential problems that come with real automation of industries that have served as true economic dynamos for decades.
> it's probably a good thing to have domestic advanced manufacturing if only to have real-world testbeds for development of advanced automation technology.
Yes, it's a good thing to have domestic advanced manufacturing, but this probably doesn't qualify.
According to the article, it's a site where they already assemble servers for Apple's own use, and will now start assembling Mac Minis as well. Electronics assembly is, for the most part, a pretty low-value part of the supply chain.
It's not nothing, but it pales in comparison to the scientific and technological sophistication and financial value of wafer fabs and IC test and packaging facilities. (I worked at Intel's flagship fabs in Oregon, and have worked as a consultant with other semi fabs around the world.)
maybe would be interesting to include a lecture on how to interact with the open source community and successfully contribute to an open source project while respecting maintainer time and energy (and other unwritten rules of (n)etiquette).
edit: already in the "beyond the code" section... cool!
orcad is the commercial classic for doing schematics with a spice backend. (spice is an oss circuit-simulation engine out of berkeley. for dc it just solves classic nodal analysis, and for ac you can feed in signals from a fantasy signal generator and capture them at various nodes in the circuit.) there's also some pretty cool looking commercial web thing now that will also maintain netlists with real-time prices and let you swap parts out, set minimum quantities, etc.
kicad is the oss orcad, but i never got good at it. (to be fair, orcad was weird to learn as well)
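for the curious, the dc operating-point math really is just KCL at each non-ground node; a hypothetical voltage divider worked as nodal analysis:

```python
# toy dc nodal analysis, the kind of thing a spice engine does at .op
# circuit (assumed for illustration): 10 V source -> R1 -> node1 -> R2 -> gnd
Vs, R1, R2 = 10.0, 1000.0, 2000.0

# KCL at node1: (v1 - Vs)/R1 + v1/R2 = 0
# rearranged:   v1 * (1/R1 + 1/R2) = Vs/R1   (one node => a 1x1 "G matrix")
G = 1 / R1 + 1 / R2
I = Vs / R1
v1 = I / G

# matches the textbook divider formula
assert abs(v1 - Vs * R2 / (R1 + R2)) < 1e-9
print(f"node1 = {v1:.4f} V")   # 6.6667 V
```

bigger circuits just make G an n-by-n conductance matrix and turn the division into a linear solve.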
I think altium has taken over as the top tier commercial offering in this space.
I always disliked Orcad. Especially because cadence had excellent software that predated OrCAD, and for reasons that I cannot fathom chose to promote OrCAD after they acquired it instead of the better software.
Here's a specific example in the interface. If you wanted to draw a wire, the keyboard shortcut of the old software was 'w' but orcad required you to type 'ctrl + w'. Why are you forcing me to use control when w doesn't do anything on its own? It was filled with similar tiny annoyances that just slowed things down. (Admittedly, it's been years since that was my primary work, and free stuff is good enough for what I do now.) I sincerely hope that orcad has continued to improve over the years.
Altium has taken over a lot of small to medium sized shops. Mostly because the price is right for its capability. It also has a history of being the least bad compromise between the odd mixtures of excellence and user-hostility Cadence and Mentor tend to come up with, going back to the Protel days, and they've done a good job in the last decade+ of marketing it to those shops. Cadence and Siemens nee Mentor (and maybe Zuken? I've never seen Zuken in the wild, but it always makes these lists) have been neglecting the entry level and smaller organizations and aggressively trying to move their customers to their higher tier offerings during that time. But while it's Altium's flagship product, it is not top tier. It is really entry-level for a professional PCB-level design package, like PADS and OrCAD as opposed to Xpedition and Allegro.
> To me that implies the input isn't deterministic, not the compiler itself
or the system upon which the compiler is built (as well as the compiler itself) has made some practical trade-offs.
the source file contents are usually deterministic. the order in which they're read and combined and build-time metadata injections often are not (and can be quite difficult to make so).
I mean, if you turn off incremental compilation and build in a container (or some other "clean room" environment), it should turn out the same each time. Local builds are very non-deterministic, but CI/CD shouldn't be.
Either way it's a nitpick though, a compiler hypothetically can be deterministic, an LLM just isn't? I don't think that's even a criticism of LLMs, it's just that comparing the output of a compiler to the output of an LLM is a bad analogy.
> I mean, if you turn off incremental compilation and build in a container (or some other "clean room" environment), it should turn out the same each time. Local builds are very non-deterministic, but CI/CD shouldn't be.
lol, should. i believe you have to control the clock as well and even then non-determinism can still be introduced by scheduler noise. maybe it's better now, but it used to be very painful.
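a toy illustration of the clock problem (the build function is made up; SOURCE_DATE_EPOCH is the real convention the reproducible-builds effort standardized for pinning timestamps):

```python
import hashlib, os, time

def build(source: str) -> bytes:
    # naive "build" step that stamps the artifact with a timestamp,
    # honoring SOURCE_DATE_EPOCH when set, wall clock otherwise
    stamp = os.environ.get("SOURCE_DATE_EPOCH") or str(time.time())
    return (source + "\nbuilt-at: " + stamp).encode()

src = "print('hello')"

# without a pinned clock, two runs of the same build can hash differently
a, b = build(src), build(src)

# pinning the clock restores bit-for-bit reproducibility
os.environ["SOURCE_DATE_EPOCH"] = "1700000000"
c, d = build(src), build(src)
assert hashlib.sha256(c).digest() == hashlib.sha256(d).digest()
```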
> Either way it's a nitpick though, a compiler hypothetically can be deterministic, an LLM just isn't? I don't think that's even a criticism of LLMs, it's just that comparing the output of a compiler to the output of an LLM is a bad analogy.
llm inference is literally sampling a distribution. the core distinction is real though: llms are stochastic general computation, whereas traditional programming is deterministic in spirit. llm inference can hypothetically be deterministic as well if you use a fixed seed, although, like non-trivial software builds on modern operating systems, squeezing out all the entropy is a non-trivial affair. (some research labs are focused on just that: deterministic llm inference.)
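a stdlib-only sketch of the fixed-seed point (toy vocabulary and probabilities, obviously not a real decoder): sampling is reproducible given the seed, which is what deterministic-inference work tries to guarantee at gpu scale, where floating-point nondeterminism creeps back in.

```python
import random

vocab = ["the", "a", "cat", "dog"]
probs = [0.4, 0.3, 0.2, 0.1]

def sample_tokens(seed, n=5):
    # each "token" is drawn from a fixed distribution; the only entropy
    # source is the rng, so pinning the seed pins the output
    rng = random.Random(seed)
    return [rng.choices(vocab, weights=probs)[0] for _ in range(n)]

assert sample_tokens(42) == sample_tokens(42)   # same seed -> same text
# different seeds usually diverge, which is what "stochastic" means here
```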