Man that is such a bummer. The Naval Support Activity (NSA) "base" is not a hardened military facility. I've never been to the one in Bahrain, but it's usually where you go to play ultimate frisbee, maybe some paintball if you are lucky, and other types of R&R. They usually have a Naval Exchange (NEX), which is like a really discounted 7-11 / gift shop / walmart (depending on where you are).
Schools getting blown up is also a bummer. Everything about this situation and maybe the world is a bummer.
As soon as we stop treating these as bummers, there is literally nothing stopping a cycle of destruction. There may not be anyway, I don't know, but giving up on empathy entirely seems even more dangerous than being bad at it.
I have plenty of sympathy for the victims but none for the aggressors in this illegal war.
You seem to be suggesting that not feeling sorry for the soldiers who got to evacuate without all their belongings somehow means I'm losing my humanity. That's a dangerous thing - the lives of the innocent civilians who didn't choose to be bombed are more important. Aggressors could simply... leave and stop being in danger.
Similarly, I have little pity for Russian soldiers losing their lives in another illegal war of aggression, knowing how many war crimes they committed in their wake.
Great writeup. The only thing I didn't see in here was an analysis of the impact of players like Talaas[1] and their stupid fast hardware LLMs.
I feel like it could be majorly disruptive, but idk if it's going to prolong the apocalypse or bring it about sooner -- or if it's a big nothing burger.
I'm bullish on something like talaas getting smaller and easy to put in a desktop. Imagine an RPG where the NPCs... are way more complex and the entire game is very non-deterministic.
I think I would like that as well. The problem is that if we bake an LLM into HW and make it cheaper and very efficient to run, then all games will have the same AI slop content, which could get boring pretty fast. The alternative is that these cards should load a different / fine-tuned LLM per game, but then we already have GPUs for that and today's LLMs are nowhere near good enough at the size which a GPU can run.
> Pre-training allows organizations to build domain-aware models by learning from large internal datasets.
> Post-training methods allow teams to refine model behavior for specific tasks and environments.
How do you suppose this works? They say "pretraining" but I'm certain that the amount of clean data available in proper dataset format is not nearly enough to make a "foundation model". Do you suppose what they are calling "pretraining" is actually SFT and then "post-training" is ... more SFT?
There's no way they mean "start from scratch". Maybe they do something like generate a heckin bunch of synthetic data seeded from company data using one of their SOTA models -- which is basically equivalent to low resolution distillation, I would imagine. Hmm.
Pre-training means exposing an already-trained model to more raw text, like PDF extracts etc. (aka continued pre-training). You wouldn't be starting from scratch, but it's still pre-training because the objective is just next-token prediction on the text you expose it to.
Post-training means everything else: SFT, DPO, RL, etc. Anything that involves things like prompt/response pairs, reward models, or benefits from human feedback of any kind.
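Roughly, the difference shows up in the shape of the training data. A minimal sketch (Hugging Face-style APIs; the model, texts, and prompt are just placeholders I made up):

    from transformers import AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")  # stand-in for whatever base model

    # Continued pre-training: raw text in, labels are the same tokens
    # (plain next-token prediction over your PDF extracts, wikis, etc.).
    raw = tok("...text extracted from internal PDFs...", return_tensors="pt")
    pretrain_labels = raw["input_ids"].clone()

    # SFT (post-training): prompt/response pairs, with the loss usually masked
    # on the prompt tokens so the model only learns to produce the response.
    prompt = tok("User: summarize policy X\nAssistant:", return_tensors="pt")
    full = tok("User: summarize policy X\nAssistant: Policy X says ...", return_tensors="pt")
    sft_labels = full["input_ids"].clone()
    sft_labels[:, : prompt["input_ids"].shape[1]] = -100  # ignored by the loss
    # (in practice you'd tokenize once and track the prompt length instead)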
Er, then what is the "already trained" model? I thought pre-training was the gradient descent through the internet part of building foundational models.
Yeah, this checks out. I wonder what they are doing to prevent semantic collapse. Also, I wonder if the base model would already be instruct and RLHF tuned or only pre-trained. Trying to do additional training without semantic collapse in a way that is meaningful would be interesting to understand. Presumably they are using adapters but I've never had much luck in stacking adapters.
i.e.:
1. Do I start with an RLHF tuned model, "pretrain" on top of that (with adapter or by freezing weights?), then SFT on top of that (stack another adapter, or add layer(s) and freeze weights?) (and where did I get the dataset? synthetic extraction from corpus?), then RL (adapter, add layer(s) and freeze?)
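If it's adapters, I'd guess something in the LoRA direction, roughly like this (peft-style sketch; the base model and hyperparameters here are illustrative guesses, not anything from the article):

    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    base = AutoModelForCausalLM.from_pretrained("gpt2")  # stand-in base/instruct model

    # Freeze the base weights and train only small low-rank adapter matrices.
    lora = LoraConfig(r=8, lora_alpha=16, target_modules=["c_attn"], task_type="CAUSAL_LM")
    model = get_peft_model(base, lora)
    model.print_trainable_parameters()  # typically well under 1% of the base model

    # One plausible flow: run the "continued pre-training" pass with this adapter
    # on raw text, merge it back in (model.merge_and_unload()), then attach a
    # fresh adapter for SFT/RL. That avoids stacking adapters, at the cost of
    # losing the modularity of keeping one adapter per stage.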
I can imagine that, as usual, you start with a few examples and then instruct an LLM to synthesize more examples out of that, and train using that. Sounds horrible, but actually works fairly well in practice.
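Something like this, as a very rough sketch (the seed pairs, prompt, and model name are all made up for illustration):

    import json
    from openai import OpenAI

    client = OpenAI()
    seeds = [
        {"question": "How do I reset my VPN token?", "answer": "Open the portal and ..."},
        {"question": "Where are the Q3 revenue numbers?", "answer": "Finance wiki, under ..."},
    ]

    prompt = (
        "Here are example Q/A pairs drawn from our internal docs:\n"
        + json.dumps(seeds, indent=2)
        + "\nWrite 20 more pairs in the same JSON format, grounded in the corpus below.\n"
        + "...corpus excerpt here..."
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # any strong general-purpose model
        messages=[{"role": "user", "content": prompt}],
    )
    # You'd want strict JSON output plus heavy filtering/dedup before training on it.
    synthetic_pairs = json.loads(resp.choices[0].message.content)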
> Mr Cannon-Brookes told investors he “couldn’t be more bullish” about the opportunities ahead, despite relentlessly selling his own shares in the company daily. The Nightly reports he kept selling 7665 shares on a daily basis even in the month prior to the results at prices ranging from $US161.11 (AU$227) a share on January 8 to $US105.14 on February 4.
> While ordinary Aussies are asked to make big changes, the 46-year-old decided to treat himself to a ritzy new private jet late last year, admitting to a “deep internal conflict” over the carbon-heavy method of travel.
> The Atlassian co-founder and CEO bought a Bombardier 7500 and will use it to travel across his vast business operations, which include a minority stake in the Utah Jazz NBA team and a sponsorship deal with Formula 1.
There's a great 1986 book "Designing and Programming Personal Expert Systems" by Feucht and Townsend that implements expert systems in Forth (and in the process, much of the capability of Prolog and Lisp).
Ha, you beat me to it! That book was my first thought when I saw this post. I have a copy sitting here on my bookshelf.
Just to expand on how bonkers this book is... they assume that everyone has easy access to a Forth implementation. So they teach you how to build a Lisp on top of it. Then they use the Lisp you just built to build a Prolog. Then, finally, they do what the topic of the book actually is: build a simple expert system on top of that Prolog.
To be fair, in the 1980s thanks to the Forth Interest Group (FIG), free implementations of Forth existed for most platforms at a time when most programming languages were commercial products selling for $100 or more (in 1980s dollars). It's still pretty weird, but more understandable with that in mind.
Constantly amused by the split in comments of any moderately innovative language post between ‘I don't care about all this explanation, just show me the syntax!’ and ‘I don't understand any of this syntax, what a useless language!’
If the language is ‘JavaScript but with square brackets instead of braces’ maybe the syntax is relevant. But in general concrete syntax is the least interesting (not least important, but easiest to change) thing in a programming language, and its similarity to other languages a particular reader knows less interesting still. JavaScript is not the ultimate in programming language syntax (I hope!) so it's still worth experimenting, even if the results aren't immediately comprehensible without learning.
In Prolog the syntax is incredibly important. It is designed to be metainterpreted with the same ease with which a for-loop might be written in another language.
This can be arbitrarily extended in very interesting, beautiful, and powerful ways. This is extraordinarily hard to achieve and did not happen by accident.
As a challenge, see how easy it is to write a metainterpreter in another language of your choice. Alternately, see if you can think of any way the metainterpretation system in Prolog could be improved.
Finally, think of what would happen to this if we changed the syntax and introduced something like object.field notation.
So while logical programming can be achieved with other syntaxes, the metainterpretive aspect will be lost. I have yet to see a language that does this better.
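To make the comparison concrete, here is roughly what a toy version costs in Python: hand-rolled terms, unification, and a resolution loop (my own representation, with no clause renaming or occurs check, which a real implementation would need). The Prolog "vanilla" metainterpreter that does the equivalent job is about three clauses.

    # Terms: variables are capitalised strings, compound terms are tuples.
    def is_var(t):
        return isinstance(t, str) and t[:1].isupper()

    def walk(t, subst):
        while is_var(t) and t in subst:
            t = subst[t]
        return t

    def unify(x, y, subst):
        if subst is None:
            return None
        x, y = walk(x, subst), walk(y, subst)
        if x == y:
            return subst
        if is_var(x):
            return {**subst, x: y}
        if is_var(y):
            return {**subst, y: x}
        if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
            for a, b in zip(x, y):
                subst = unify(a, b, subst)
            return subst
        return None

    def solve(goals, rules, subst):
        # Depth-first SLD resolution: yield substitutions satisfying all goals.
        if not goals:
            yield subst
            return
        first, rest = goals[0], goals[1:]
        for head, body in rules:
            s = unify(first, head, subst)
            if s is not None:
                yield from solve(list(body) + rest, rules, s)

    # parent(tom, bob). parent(bob, ann). grandparent(X, Z) :- parent(X, Y), parent(Y, Z).
    rules = [
        (("parent", "tom", "bob"), []),
        (("parent", "bob", "ann"), []),
        (("grandparent", "X", "Z"), [("parent", "X", "Y"), ("parent", "Y", "Z")]),
    ]
    for s in solve([("grandparent", "tom", "Who")], rules, {}):
        print(walk("Who", s))  # -> ann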
Nice link, thank you! I'm not sure it's super related to my comment but it is closely related to some other things I'm thinking about. I'll give it a read :)
So I have been doing formal specification with TLA+ using AI assistance and it has been very helpful AFTER I REALIZED that quite often it was proving things that were either trivial or irrelevant to the problem at hand (rather than the problem itself), which was difficult to detect at a high level.
I realize formal verification with Lean is a slightly different game, but if anyone here has any insight: I tend to be extremely nervous about a confidently presented AI "proof" because I am sure that the proof is proving whatever it is proving, but it's still very hard for me to be confident that it is proving what I need it to prove.
Before the dog piling starts, I'm talking specifically about distributed systems scenarios where it is just not possible for a human to think through all the combinatorics of the liveness and safety properties without proof assistance.
I'm open to being wrong on this, but I think the skill of writing a proof and understanding the proof is different than being sure it actually proves for all the guarantees you have in mind.
I feel like closing this gap is make-or-break for AI-augmented proof assistance.
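A toy version of that gap, in Lean 4 (everything here is invented for illustration): the proof below goes through fine, but the theorem only pins down length, so it says nothing about the property you actually cared about.

    -- An obviously wrong "sort" ...
    def mySort (l : List Nat) : List Nat := l

    -- ... and a theorem the checker happily accepts, because the spec is too weak.
    theorem mySort_preserves_length (l : List Nat) : (mySort l).length = l.length := rfl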
In my experience, finding the "correct" specification for a problem is usually very difficult for realistic systems. Generally it's unlikely that you'll be able to specify ALL the relevant properties formally. I think there's probably some facet of Kolmogorov complexity here: some properties probably cannot be "compressed" into a specification that is significantly shorter and clearer than the solution itself.
But it's still usually possible to distill a few crucial properties that can be specified in an "obviously correct" manner. It takes A LOT of work (sometimes I'd be stuck for a couple of weeks trying to formalize a property). But in my experience the trade-off can be worth it. One obvious benefit is avoiding bugs, which can be pricey depending on the system. But another benefit is that, even without formal verification, having a few clear properties makes it much easier to write a correct system, and crucially also makes it easier to maintain the system as time goes by.
I'm curious since I'm not a mathematician: what do you mean by "stuck for a couple of weeks"? I am trying to practice more advanced math and have stumbled onto Lean and such, but I can't imagine you just sit around for weeks pondering a problem, right? What do you do all that time?
I'm not a mathematician either ;) Yeah, I won't sit around and ponder a property definition for weeks. But I will maybe spend a day on it, not get anywhere, and then spend an hour or two a day thinking about ways to formulate it. Sometimes I try something, then an hour later figure out it won't work, but sometimes I really do just stare at the ceiling with no idea how to proceed. Helps if you have someone to talk to about it!
You experience counterexamples for why a specific definition is not going to work.
Many times, at various levels of "not going to work": usually hovering slightly above the syntactic level, but sometimes above the plain definitional-semantics level, i.e. being mostly concerned with some indirect interaction aspects.
Yeah, even for simple things, it's surprisingly hard to write a correct spec. Or more to the point, it's surprisingly easy to write an incorrect spec and think it's correct, even under scrutiny, and so it turns out that you've proved the wrong thing.
This isn't to say it's useless; sometimes it helps you think about the problem more concretely and document it using known standards. But I'm not super bullish on "proofs" being the thing that keeps AI in line. First, like I said, they're easy to specify incorrectly, and second, they become incredibly hard to prove beyond a certain level of complexity. But I'll be interested to watch the space evolve.
(Note I'm bullish on AI+Lean for math. It's just the "provably safe AI" or "provably correct PRs" that I'm more skeptical of).
>But I'm not super bullish on "proofs" being the thing that keeps AI in line.
But do we have anything that works better than some form of formal specification?
We have to tell the AI what to do and we have to check whether it has done that. The only way to achieve that is to have a person who knows the full context of the business problem, and who feels a social/legal/moral obligation not to cheat, write a formal spec.
Code review, tests, a planning step to make sure it's approaching things the right way, enough experience to understand the right size problems to give it, metrics that can detect potential problems, etc. Same as with a junior engineer.
If you want something fully automated, then I think more investment in automating and improving these capabilities is the way to go. If you want something fully automated and 100% provably bug free, I just don't think that's ever going to be a reality.
Formal specs are cryptic beyond even a small level of complexity, so it's hard to tell if you're even proving the right thing. And proving that an implementation meets those specs blows up even faster, to the point that a lot of stuff ends up being infeasible to prove formally. It's also extremely fragile: a one-line code change or a small refactor or optimization can completely invalidate hundreds of proofs. AI doesn't change any of that.
So that's why I'm not really bullish on that approach. Maybe there will be some very specific cases where it becomes useful, but for general business logic, I don't see it having useful impact.
As a heavy user of formal methods, I think refinement types, rather than theorem proving with Lean or Isabelle, are both easier and more amenable to automation, without running into these pitfalls.
They're less powerful, but easier to break down and align with code. Dafny and F* are two good showcases. Less power also makes them faster to verify and iterate on.
Completely agree. Refinement types are a much more practical tool for software developers focused on writing real-world correct code.
Using Lean or Coq requires you to basically convert your code to Lean/Coq before you can start proving anything, and to import some complicated Hoare logic library. Proving things correct in Dafny (for example) feels much more like programming.
You have identified the crux of the problem: just as in mathematics, writing down the "right" theorem is often half or more of the difficulty.
In the case of digital systems it can be much worse, because we often have to include many assumptions to accommodate the complexity of our models. To use an example from your context, usually one is required to assume some kind of fairness to get anything to go through with systems operating concurrently, but many kinds of fairness are not realistic (e.g. strong fairness).
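For reference, one standard formulation (paraphrasing the usual TLA+ definitions): weak fairness only assumes an action that stays enabled is eventually taken, while strong fairness assumes an action enabled infinitely often is taken infinitely often, which is a much stronger assumption about the scheduler.

    WF_v(A) \;\triangleq\; \Diamond\Box(\mathrm{ENABLED}\,\langle A \rangle_v) \Rightarrow \Box\Diamond \langle A \rangle_v
    SF_v(A) \;\triangleq\; \Box\Diamond(\mathrm{ENABLED}\,\langle A \rangle_v) \Rightarrow \Box\Diamond \langle A \rangle_v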
I was having the same intuition, but you verbalised it better: the notion of having a definitive yes/no answer is very attractive, but describing what you need in such terms using natural language, which is inherently ambiguous... that feels like a fool's errand. That's why I keep thinking that LLM usage for serious things will break down once we get to the truly complicated things: its non-deterministic nature will be an unbreakable barrier. I hope I'm wrong, though.
You want frontier models to actively prevent people from using them to do vulnerability research because you're worried bad people will do vulnerability research?
Not at all. I was suggesting that if an account is performing source-code-level request scanning of "numerous" codebases, it could be an account of interest: a sign of misuse.
This is different from someone's "npm audit" flagging issues with packages in a build and updating to new revisions. Also different from iterating deeply on source code for a project (e.g. the nginx web server).
What's incredibly ironic is that research labs are releasing the most advanced hacking toolkit ever known, and cybersecurity defence stocks are somehow going down as a result. There's no logic in the stock markets.
tl;dr - All this AI stuff is just Universal Paperclips[1]
I see a lot of comments about folks being worried about going soft, getting brain rot, or losing the fun part of coding.
As far as I'm concerned this is a bigger (albeit kinda flakey) self-driving tractor. Yeah I'd be bored if I just stuck to my one little cabbage patch I'd been tilling by hand. But my new cabbage patch is now a megafarm. Subjectively, same level of effort.