How do you have the “modify LLM state from within” working? I can have it modify my config but I don’t know how to get it to eval and improve arbitrary elisp.
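The shape I have in mind is something like the sketch below: expose a tool whose handler just reads and evals whatever elisp string the model sends back and returns the printed result (or the error) so it can iterate. The function name here is made up, and wiring it into a client's tool-calling support (gptel, llm, etc.) is exactly the part I haven't worked out.

```elisp
;; Hypothetical tool handler: take a string of elisp from the model, eval it,
;; and hand back the printed value (or the error) so the model can revise.
(defun my-eval-elisp-tool (code)
  "Evaluate CODE, a string of one or more elisp forms, and return the result."
  (condition-case err
      ;; Wrap in progn so multi-form snippets work; the second arg to `eval'
      ;; enables lexical binding.
      (format "%S" (eval (read (concat "(progn\n" code "\n)")) t))
    (error (format "error: %S" err))))
```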
Absolutely baffled too. I was expecting them to prefer the vim philosophy of small tools that do one thing well, but no. So you like modal editing? Well, you've got it right there in emacs. Why that, of all the potential gripes you might have with emacs?
I think there’s a weaker claim that holds true: we were able to ignore lots of content based on the superficial (and pay proper attention to work that passed this test) and now we are overwhelmed because everything meets the superficial criteria and we can’t pay proper attention to all of it.
That's what I had in mind! The whole post is a claim that evaluating knowledge work got more expensive because cheaper measures stopped correlating well with quality.
If someone was already evaluating the work output using a metric closer to the underlying quality, then it might not have been a big shift for them (other than having much more work to evaluate).
You could only do that, however, if you were fine with judging the quality of work unfairly, since you'd readily discard quality work based on superficial proxies. Which, admittedly, is done in a lot of cases.
Chess players learned to exploit chess computers’ weaknesses in the beginning too, but they can’t any longer. This version of the robot might not learn continuously, but the next will be better.
I believe there are still some echoes of the concept. Even top engines will play certain grandmaster draw lines unless told more or less explicitly not to. So if you were playing a match against Stockfish you'd want to play the Berlin draw as White every time, for example.
But chess is a turn-based game where there's no deception (in the sense that both players can see all legal moves for both themselves and their opposition at all times), whereas in table tennis, it's in real time, it's fast as hell, the table is small, and the ball can have 2 or 3 different spin types from the same arm/hand/wrist movement, and can land in a number of different spots.
We have confidence in the extra code a compiler generates because it's deterministic. We don't have that confidence with LLMs, whether they're the ones writing the code or the ones reading it.
Interesting that what you're talking about as ASI is "as capable of handling explicit requirements as a human, but faster". Which _is_ better than a human, so fair play, but it's striking that this requirement is less about creativity than we would have thought.
The work where I've done well in my life (smashing deadlines, rescuing projects) has so often come because I've been willing to push back on requirements, even explicitly stated ones. When clients have tried to replace me with a cheaper alternative (and failed), the main difference I notice is that the cheaper person is used to being told exactly what to do.
Maybe this is more anthropomorphising, but I think this pushing back is exactly what the LLMs are doing; we're just expecting a bit too much of them in terms of follow-up, like: "ok, I double-checked and I really am being paid to do things the hard way".
To be fair, there is likely not much training data on the difficult conversations you need to handle in a senior position, pushback being one of them. The trouble for the agents is that it's all post hoc: they explain themselves after the fact, rationalising, rather than asking "help me understand" beforehand.
> I asked an AI agent to solve a programming problem
You're not asking it to solve anything. You provide a prompt and it does autocomplete. The only reason it doesn't run forever is that one of the generated tokens is interpreted as 'done'.
By the same reasoning, human beings are only a bunch of atoms, and the only reason they don't pass through other humans is the electromagnetic repulsion between those atoms.
When your abstraction level is too low, it doesn't explain anything, because the system that is built on it is way too complex.
How the neural networks produce such surprisingly human characteristics is an open question with a ton of research going into it. Explaining this is a bit more than what one smart person can achieve.
I just don't think that's correct. When I ask Claude to solve something for me, it takes a number of actions on my computer which are neither writing text nor interpreting the done token. It executes the build, debugs tests, et cetera. Sometimes it spawns mini-mes when it thinks that would be helpful! I think saying this is all "autocomplete" is a category error, like saying that you shouldn't talk about clicking buttons or running programs because it's all just electrically charged silicon under the hood.
technically, it does all that by outputting text, like `run_shell_command("cargo build")` as part of its response. But you could easily say similar things about humans.
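Concretely, the loop a harness runs is something like the toy below (in elisp, since that's where this thread started; `my-llm-complete` is a made-up stand-in for the real completion API, stubbed so the sketch actually runs): the model only ever emits text, the harness scans the reply for a tool call, executes it, appends the output to the transcript, and loops until a reply arrives with no tool call in it.

```elisp
;; Made-up stand-in for the completion API; a real harness calls a model here.
(defun my-llm-complete (transcript)
  (if (string-match-p "cargo build" transcript)
      "Build looks clean, we're done."
    "Let me check the build first: run_shell_command(\"cargo build\")"))

;; The "autocomplete" loop: request text, run any tool call found in the
;; reply and feed its output back, and stop when a reply contains no tool call.
(defun my-agent-loop (prompt)
  (let ((transcript prompt)
        (done nil))
    (while (not done)
      (let ((reply (my-llm-complete transcript)))
        (if (string-match "run_shell_command(\"\\([^\"]*\\)\")" reply)
            (setq transcript
                  (concat transcript "\n" reply "\n"
                          (shell-command-to-string (match-string 1 reply))))
          (setq transcript (concat transcript "\n" reply)
                done t))))
    transcript))
```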
To me, "autocomplete" seems like it describes the purpose of a system more than how it functions, and these agents clearly aren't designed to autocomplete text to make typing on a phone keyboard a bit faster.
I feel like people compare it to "autocomplete" because autocomplete seems like a trivial, small, mundane thing, and they're trying to make the LLMs feel less impressive. It's a rhetorical trick that is very overused at this point.
When someone asks you a question, in what ways are you not an "autocomplete"?
You aren't aware of how you come up with the words you are saying; you just start talking and the next word somehow falls out of your mouth. Maybe you think before you start talking, but where do the thoughts come from? They just appear in your head. We are just as much predictive machines as LLMs are; the human brain is just fuzzier.
Human minds have the ability to reason and to weigh different sources by their authority. That is why some children are able to obey their parents while ignoring the scammers in TV commercials shouting at them to buy stuff.
We are also able to apply lived experience to our reasoning. That is why we can accurately answer a question about whether to drive or walk to the car wash, or why we can immediately see how many "r"s are in "strawberry".
LLMs, being "glorified autocomplete", don't have a real way to separate truth from lies, or to critically evaluate sources of information. Humans absorb information in various ways: through the "classic five senses" that inform our daily lives and motions, through reading, hearing, and seeing, or by inferring, reasoning, and being "guided by the Spirit" in a more metaphysical way, where LLMs would fail.
Thoughts are derivative of sensory processing. We have subjective experience and subjective feeling; our symbols are grounded in physical reality. LLM "thoughts" are simulacra; manipulating symbols according to rules does not imply understanding. One must be quite derealised to think we are predictive machinery, or that the human brain is just a fuzzier version of one; it is much more than that.
You had literally -zero- input in what your brain gave you as an answer. It just gave you something, you can make up whatever story you want to tell yourself, "it's my favourite movie", "I saw it last week", whatever you want. It doesn't change the fact that the words on your screen triggered some neural pathway in your brain that is totally out of your control and landed on "Titanic".
It's how literally everyone thinks. Your thoughts come unbidden via a process you do not understand and cannot observe and your consciousness follows them along. Your brain is not as special as you imagine.
It's like we have little thinking sub-agents auto-completing cognition tokens in the background that then surface findings to the main agent which then auto-completes some more cognition tokens in the foreground.
> Maybe we should just commit the signature change with a TODO
I'm fascinated that so many folks report this; I've literally never seen it in daily CC use. I can only guess that my habit of starting a new session and getting it to write a plan document before acting ("make a file listing all call sites"; "look at refactoring.md and implement") makes it clear when it's time for exploration and when it's time for action (i.e. when exploring instead of acting would count as failing).
I have only seen "go do X" result in CC adding "TODO: X" to the working file on one occasion. When it happened, I noticed that the file already contained a very similar TODO for a similar action. My guess is that having the whole file in context influenced the agent to produce output similar to what was already there.