
How do you have the “modify LLM state from within” working? I can have it modify my config but I don’t know how to get it to eval and improve arbitrary elisp.

Absolutely baffled too. I was expecting them to say they preferred the vim philosophy of small tools that do one thing well, but no. So you like modal editing? Well, you've got that right there in emacs. Why that, of all the potential gripes you might have with emacs?

I’d like a concrete example of how you’re actually controlling emacs with LLMs. Is ECA the part that does that?

I think there’s a weaker claim that holds true: we were able to ignore lots of content based on superficial signals (and pay proper attention to work that passed this test), and now we are overwhelmed because everything meets the superficial criteria and we can’t pay proper attention to all of it.

That's what I had in mind! The whole post is a claim that evaluating knowledge work got more expensive because cheaper measures stopped correlating well with quality.

If someone was already evaluating the work output using a metric closer to the underlying quality then it might not have been a big shift for them (other than having much more work to evaluate).


You may have benefited from using the term we already had for the cheaper measures of negative code quality: code smells.

Yes, I agree that this is true!

You could only do that, however, if you were fine with unfairly judging the quality of the work, since you'd now readily discard quality work based on superficial proxies. Which, admittedly, is done in a lot of cases.


Chess players learned to exploit chess computers’ weaknesses in the beginning too, but they can’t any longer. This version of the robot might not learn continuously, but the next will be better.

I believe there are still some echoes of the concept. Even top engines will play certain grandmaster draw lines unless told more or less explicitly not to. So if you were playing a match against Stockfish you'd want to play the Berlin draw as White every time, for example.

But chess is a turn-based game where there's no deception (in the sense that both players can see all legal moves for both themselves and their opposition at all times), whereas table tennis is played in real time, it's fast as hell, the table is small, and the ball can have two or three different spin types from the same arm/hand/wrist movement, and can land in a number of different spots.

We have confidence in the extra code a compiler generates because it’s deterministic. We don’t have that confidence in LLMs, neither the ones that wrote the code nor the ones that read it.

Interesting that what you're talking about as ASI is "as capable of handling explicit requirements as a human, but faster". Which _is_ better than a human, so fair play, but it's striking that this requirement is less about creativity than we would have thought.

The work where I've done well in my life (smashing deadlines, rescuing projects) has so often come because I've been willing to push back on requirements, even explicitly stated ones. When clients have tried to replace me with a cheaper alternative (and failed), the main difference I notice is that the cheaper person is used to being told exactly what to do.

Maybe this is more anthropomorphising, but I think this pushing back is exactly what the LLMs are doing; we're just expecting a bit too much of them in terms of follow-up, like: "OK, I double checked and I really am being paid to do things the hard way."


I think there's a difference between

"Hey boss, this isn't practical with the requirements you've given. We need to revise them to continue, here are my suggestions"

and

"Task completed! Btw, I ignored all of the constraints because I didn't like them."

Humans do the former quite often. When we do the latter, our employment tends not to last very long. I've only seen AIs choose the latter option.


To be fair, there is likely not much training data on the difficult conversations you need to handle in a senior position, pushback being one of them. The trouble for the agents is that it's all post hoc: they explain themselves afterwards, rationalising, rather than asking "help me understand" beforehand.

Fascinating. This is invisible to me, what anthropomorphising did you notice that stood out?

From the first sentence:

> I asked an AI agent to solve a programming problem

You're not asking it to solve anything. You provide a prompt and it does autocomplete. The only reason it doesn't run forever is that one of the generated tokens is interpreted as 'done'.
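Roughly, in Python-flavoured pseudocode (the names here are made up for illustration, not any real library's API):

    # Hand-wavy sketch of the generation loop (made-up names, not a real API):
    # the model keeps predicting tokens until one of them happens to be the
    # token the harness interprets as "done".
    STOP_TOKEN = "<|done|>"

    def generate(model, prompt, max_tokens=4096):
        tokens = model.tokenize(prompt)
        for _ in range(max_tokens):                  # safety cap, not a decision
            next_token = model.sample_next(tokens)   # "autocomplete" one step
            if next_token == STOP_TOKEN:             # interpreted as 'done'
                break
            tokens.append(next_token)
        return model.detokenize(tokens)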


What a poor explanation.

By the same reasoning, human beings are only a bunch of atoms, and the only reason they don't collide with other humans is the atomic force.

When your abstraction level is too low, it doesn't explain anything, because the system that is built on it is way too complex.


"Autocomplete" is noy an abstraction level. It is the actual programmed behaviour.

You can't understand human behaviour by reading a physics textbook.

Of course not. That's one of the major differences between intelligence and word-guessing autocomplete.

Do you think you could explain all AI behaviour by reading a physics textbook?

Nope. But someone smart enough could.

How the neural networks produce such surprisingly human characteristics is an open question with a ton of research going into it. Explaining this is a bit more than what one smart person can achieve.

At a certain level of abstraction, yes.

I just don't think that's correct. When I ask Claude to solve something for me, it takes a number of actions on my computer which are neither writing text nor interpreting the done token. It executes the build, debugs tests, et cetera. Sometimes it spawns mini-mes when it thinks that would be helpful! I think saying this is all "autocomplete" is a category error, like saying that you shouldn't talk about clicking buttons or running programs because it's all just electrically charged silicon under the hood.

technically, it does all that by outputting text, like `run_shell_command("cargo build")` as part of its response. But you could easily say similar things about humans.
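A hand-wavy sketch of what I mean (made-up names, not Claude Code's actual internals): the model only ever emits text, and it's the surrounding harness that spots tool calls in that text, runs them, and feeds the output back in.

    import re
    import subprocess

    # Illustrative agent harness: parse tool calls out of the model's text
    # output, execute them, and append the results for the next completion.
    TOOL_CALL = re.compile(r'run_shell_command\("(.+)"\)')

    def agent_loop(model, task, max_steps=20):
        transcript = task
        for _ in range(max_steps):
            reply = model.complete(transcript)      # plain text generation
            match = TOOL_CALL.search(reply)
            if not match:
                return reply                        # no tool call: final answer
            result = subprocess.run(
                match.group(1), shell=True, capture_output=True, text=True
            )
            # Feed the command's output back in so the next step can react.
            transcript += reply + "\n" + result.stdout + result.stderr
        return transcript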

To me, "autocomplete" seems like it describes the purpose of a system more than how it functions, and these agents clearly aren't designed to autocomplete text to make typing on a phone keyboard a bit faster.

I feel like people compare it to "autocomplete" because autocomplete seems like a trivial, small, mundane thing, and they're trying to make the LLMs feel less impressive. It's a rhetorical trick that is very overused at this point.


yup, or "I played a first person shooter and shot lots of bad guys"

wrong! pushed buttons on your playstation in response to graphical simulations, duh


When someone asks you a question, in what ways are you not an "autocomplete"?

You aren't aware of how you come up with the words you are saying; you just start talking and the next word somehow falls out of your mouth. Maybe you think before you start talking, but where do the thoughts come from? They just appear in your head. We are just as much a predictive machine as LLMs; the human brain is just fuzzier.


Human minds have the ability to reason and to evaluate sources of differing authority. That is why some children are able to obey their parents while ignoring scammers in TV commercials shouting at them to buy stuff.

We are also able to apply lived experience to our reasoning. That is why we can accurately answer a question about whether to drive or walk to the car wash, or how we can immediately see how many "r"s are in "strawberry".

LLMs, being "glorified autocomplete", don't have a real way to separate truth from lies or to critically evaluate sources of information. Humans can absorb information in various ways: through our "classic five senses", which inform our daily lives and motions; via reading, hearing, and seeing; or by inferring, reasoning, and being "guided by the Spirit" in a more metaphysical way where LLMs would fail.


Thoughts are derivative of sensory processing. We have subjective experience and subjective feeling; our symbols are grounded in physical reality. LLM "thoughts" are a simulacrum: manipulating symbols according to rules does not imply understanding. One must be quite derealised to think we are predictive machinery, or that the human brain is just a fuzzier version of one. It is much more than that.

Well, maybe this is how you think, but not everyone is a self-admitted NPC. Speak for yourself only, please.

Think of a movie.

You had literally -zero- input in what your brain gave you as an answer. It just gave you something; you can make up whatever story you want to tell yourself ("it's my favourite movie", "I saw it last week", whatever you want). It doesn't change the fact that the words on your screen triggered some neural pathway in your brain that is totally out of your control and landed on "Titanic".


> You had literally -zero- input in what your brain does

:)


It's how literally everyone thinks. Your thoughts come unbidden via a process you do not understand and cannot observe, and your consciousness follows them along. Your brain is not as special as you imagine.

Actually, the fact that this happens through our subconscious is incredibly special. Our brains are a marvel.

It's like we have little thinking sub-agents auto-completing cognition tokens in the background that then surface findings to the main agent which then auto-completes some more cognition tokens in the foreground.

Hah, that's cute actually.

And if you suppress the stop word, things get funky really fast. Like a Joyce novel.

Ceci n'est pas une pipe

> Maybe we should just commit the signature change with a TODO

I'm fascinated that so many folks report this; I've literally never seen it in daily CC use. I can only guess that my habit of starting a new session and getting it to plan-document before action ("make a file listing all call sites"; "look at refactoring.md and implement") makes it clear when it's time for exploration vs. when it's time for action (i.e. when exploring and not acting would be failing).


I wonder if it has to do with how often TODOs appear in the existing code.

What's your hypothesis about the relationship between TODOs and action?

I have only seen "go do X" result in CC adding "TODO: X" to the working file on one occasion. When it happened, I noticed that the file already contained a very similar TODO for a similar action. My guess is that because the agent had the whole file in context, it was influenced to produce output similar to what was already there.
