I feel they sit at opposite ends of the OP here. One side wants to write out specs to control the agent's implementation and achieve one-shot execution. The other side says: let's not waste human time writing anything.
I’m personally torn. A lot of the spec talk, now combined with TDD etc., feels like the pipe dreams of the mid-2000s. There was this idea of the Architect role who writes UML and specs, and a normal engineer just fills in the gaps. Then there was TDD. Nothing against it personally, but trying to write code test-first when you don’t really have a clue how a specific platform/system/library works carried tons of overhead. There was also the side effect of code written in the way most convenient to test, not to execute. And now all these ideas get thrown together for AI…
But throwing tokens out of the window and hoping the token lottery generates the best PR is also not the right direction in my book. Somebody needs to investigate both extremes, I say.
Actually, nobody said the spec needs to be written by humans.
My personal opinion: with today's LLMs, the spec should be steered by a human because its quality is proportional to result quality. Human interaction is much cheaper at that stage — it's all natural language that makes sense. Later, reasoning about the code itself will be harder.
In general, any non-trivial, valuable output must be based on some verification loop. A spec is just one way to express verification (natural language — a bit fuzzy, but still counts). Others are typecheckers, tests, and linters (especially when linter rules relate to correctness, not just cosmetics).
Personally, on non-trivial tasks, I see very good results with iterative, interactive, verifiable loops:
- Start with a task
- Write spec in e.g. SPEC.md → "ask question" until answer is "ok"/proceed
- Write implementation PLAN.md — topologically sorted list of steps, possibly with substeps → ask question
- For each step: implement, write tests, verify (step isn't done until tests pass, typecheck passes, etc.); update SPEC/PLAN as needed → ask question
- When done, convert SPEC.md and PLAN.md into PR description (summary) and discard
("Ask question" means an interactive prompt that appears for the user. Each step is gated by this prompt — it holds off further progress, giving you a chance to review and modify the result in small bits you can actually reason about.)
The workflow: you accept all changes before confirming the next step. This way you get code deltas that make sense. You can review and understand them, and if something's wrong you can modify by hand (especially renames, which editors like VS Code handle nicely) or prompt for a change. The LLM is instructed to proceed only when the re-asked answer is "ok".
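The gated loop described above can be sketched in a few lines. This is only a minimal illustration of the idea; `gated`, `workflow`, `run_agent`, and `plan_steps` are all hypothetical names, not any real agent API.

```python
# Minimal sketch of the gated SPEC -> PLAN -> implement loop described above.
# All callables passed in (run_agent, plan_steps, ask) are hypothetical
# placeholders, not a real agent API.

def gated(step_fn, ask):
    """Re-run a step, feeding human feedback back in, until the answer is 'ok'.

    step_fn(feedback) produces or updates an artifact (SPEC.md, PLAN.md, code);
    ask() is the interactive "ask question" gate that blocks for the human.
    """
    feedback = None
    while True:
        result = step_fn(feedback)
        answer = ask()
        if answer.strip().lower() == "ok":
            return result
        feedback = answer  # anything else is treated as a change request

def workflow(run_agent, plan_steps, ask):
    """Drive the whole loop: spec, then plan, then each step in order."""
    spec = gated(lambda fb: run_agent("write SPEC.md for the task", fb), ask)
    plan = gated(lambda fb: run_agent(f"write PLAN.md from:\n{spec}", fb), ask)
    for step in plan_steps(plan):
        # a step is not done until tests and typecheck pass on the agent side
        gated(lambda fb, s=step: run_agent(f"implement step: {s}", fb), ask)
    return spec, plan
```

Run interactively, `ask` would be something like `lambda: input("ok to proceed? ")`, so every stage blocks until you review the delta and type "ok" or a change request.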
This works with systems like VS Code Copilot, not so much with the CC CLI.
I'm looking forward to an automated setup where the "human" is replaced by an "LLM judge" — I think you could already design a fairly efficient system like this, but for my work LLMs aren't quite there yet.
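For what it's worth, such a judge could slot into the same "ask question" gate: instead of a human typing "ok", a model call returns the verdict. A hedged sketch, where `judge_llm` is a hypothetical prompt-to-text callable (no real API assumed):

```python
# Sketch of an "LLM judge" gate: same contract as the interactive prompt,
# but the verdict comes from a model call. judge_llm is a hypothetical
# callable taking a prompt string and returning text; get_artifact returns
# the current artifact (SPEC.md, PLAN.md, a diff, ...).

def make_judge_gate(criteria, judge_llm, get_artifact):
    """Build an ask()-style gate: returns 'ok' to proceed, else feedback."""
    def ask():
        return judge_llm(
            "Reply exactly 'ok' if the artifact meets every criterion; "
            "otherwise list concretely what to fix.\n"
            f"Criteria:\n{criteria}\n\nArtifact:\n{get_artifact()}"
        )
    return ask
```

The appeal is that the rest of the loop stays unchanged; whether the judge's feedback is actually good enough to replace a human is the open question.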
That said, there's an aspect that shouldn't be forgotten: this interactive approach keeps you in the driving seat and you know what's happening with the codebase, especially if you're running many of these loops per day. Fully automated solutions leave you outside the picture. You'll quickly get disconnected from what's going on — it'll feel more like a project run by another team where you kind of know what it does on the surface but have no idea how. IMO this is dangerous for long-term, sustainable development.
I for one might use these chats as input when switching over, to keep the learning process fast. It took a while for ChatGPT to get me. I know that other people delete memories because they want a clean-slate experience with every chat. I use ChatGPT mostly privately (I use Claude Code for work, for instance) and I prefer that memories travel across chats.
I’m not a native speaker, so my level of AI recognition is already low. I find it very interesting what patterns people bring up to declare something is AI. The three-punchline one, for instance, is a pattern I use while speaking. Can’t say I would write like this, though.
It's not so much the grouping of three or the way it's supposed to be punchy specifically that's the problem; that's just one example of what gives the article the "LLM generated" feeling, since whatever cheap model people are using for this kind of spam has some common tics.
I use groupings of 3 and try to make things punchy myself sometimes, especially when I'm writing something intended to sway others. I think the problem with this article is the way it feels like the perfect average of corporate writing. It's sort of like the "written by committee" feel that incredibly generic pop music often has.
When I write things, I often go back and edit and reword parts. Like the brushstrokes in an oil painting, the flow of thought varies between paragraphs and even sentences. LLMs only generate things from left to right (or vice versa in RTL languages, I presume). I think that gives LLM generated text a "smooth" texture that really stands out to anyone who reads a lot.
I completely agree with you. There's something conspicuous about this particular use of the "group of three" device. It's trying hard, but it comes across goofy. I think it's not human, it's 52 trillion parameters in a trenchcoat.
I'm not a native speaker and my level of AI recognition is higher than 99.999% of native speakers - and I'd be happy to be tested on it for proof.
The biggest factor is simply how long you've been using LLMs to generate text, how often, and how much. It's like how an experienced UI designer can instantly tell that something is off by a single pixel upon first seeing a UI, whereas if you gave me $200 to find it within 10 minutes I might well fail.
Aside from particulars like the set of 3, LLMs add a lot of emotive language which doesn't mean anything or is a repetition of already established points. Since they can't add any actual substance beyond what was in the prompt, the only thing they do is pad the prompt with filler language.
On top of that, you have news outlets and educated people not being clear about what tariffs are. See, from the article:
He has long argued tariffs boost American manufacturing - but many in the business community, as well as Trump's political adversaries, say the costs are passed on to consumers
It’s reported as if someone still needs to figure out who pays the tariffs in the end. I’m aware that tariffs are a lever to potentially change buying behavior and give incentives to move production locally. But in this instance, and given how it was implemented, it’s clear who is paying for it.
Totally agree. Sounds like some envision a kind of Downton Abbey without humans as the service personnel. A footman/maid in every room or corner to handle your requests at any given moment.
Also, in the good old days, if you dialed the wrong number you had some time to just hang up with no harm done. Today the connection is made the moment you press the button, or in this case, the moment Siri decides to call.
Happened to me too while in the car. For every message written by Siri it feels like you need to confirm two or three times (I think it is only once, but still), yet it happily calls people from your phone book.
I see this as well with huge modern buildings that have wooden parts. They look great the first year; the wood shines reddish. After a winter, the wood starts to grey out. I understand that this is sometimes a look they strive for, but all the preview renders show it in pristine condition. Nobody does a yearly treatment.
And don’t get me started on all the glass surfaces for elevators, roofs, bus stops, and divider panels next to tram stops (I mainly mean Berlin here) that nobody cares to clean, or that are so difficult to clean that after two or three years they look very run down.
The Statue of Liberty would be red without her patina and would look weird ;). I’m not talking about the beauty of weathering. I don't think a dirty glass roof that no longer lets any light through is a planned weathering tactic. The point was that the plans architects make always show the building in pristine condition, and they never reflect how the building will look in a few years. One example I see every day is a train-station entrance. It has a very dramatic metal arch that stretches up. It looked great in the past; now you see dirty water running down the surface. The brushed metal is stained with grime that has piled up, and every time it rains the grime runs a bit deeper. They tried to clean it a few months back. They had to come with a special crane and water jets to remove the grime, but nobody takes the time to polish the surface back up. Is this bad? No, of course not. But don’t plan and sell something that will only last for half a year. That’s why I also think this post is brilliant.
I love NY, not only for the Art Deco but also for the human weathering ;) I meant that Lady Liberty would look weird because she is known to be green. I know that the early advertisements showed her red as well, also when the torch was shown in NY to fundraise for the pedestal.
Bringing kintsugi into this conversation is like saying “being underwater can be quite advantageous!” and linking a video on fish, when the main topic is about people drowning in the ocean.
Art is everywhere, and starts with a simple philosophy of making things slightly less awful every day. Initially focused on your own mind, body, and soul... then recognizing you were always part of something a lot bigger and older than most imagine.
(This last video is parody-ish, but it's genuinely great music, built on the original song "I am just a freak"; both are really great in my opinion, unironically haha!)
Not necessarily. On a design that requires being new to look good, all weathering will be perceived as rot, never as patina.
The point is that some approaches to architectural beauty make it more or less impossible that any amount of weathering could ever be perceived as patina, while others look good both new and old.
Ok, I sometimes ask ChatGPT for advice on health/fitness and also finance. Not like where to put my money, but for general information on how stuff works and what would apply here and there. The issue is that OpenAI already knows a lot about me, and ChatGPT itself, when asked what it thinks I am, draws a pretty clear picture. But I stay away from oversharing specific things, mainly my income and other super-detailed data. When I ask, I try to formulate the question using simple numbers and examples. Works for me. When working with coding agents I’m very skeptical about whitelisting stuff. It takes quite a while before I allow a generic command to be executed outside of a sandbox. But to install a random skill to help with finance automation… I can’t believe it. Under what stone do you have to live to trust your money to be handled by an agent, and then also in connection with a random skill?
You have "memory" activated in your settings. It is recording information about you and using it in future conversations. Have a look at settings > personalization
What does this matter? Even if I disable it, I send enough data. The point I tried to make was that it baffles me that others just trust these tools. I’m aware that I send data to OpenAI. I know that ChatGPT has a memory feature. But I’m not so naive as to think that just because I disabled this magic checkbox, the other side won't continue to collect and store data.
You are a godsend! I have the same issue. My house (2013!!!) is fully phone-wired but has zero Ethernet. I have three floors, each running on a different phase (the electrician wired it like that). I have a powerline adapter in my fuse box to connect directly to the three phases, but I can’t stream content or large files. Even worse, the powerline adapters introduce noise into my power sockets; a guitar amp gets ground crackling, etc. Will look into this solution!
I see that they also offer a coax solution, which would fit just as well: I have a TV connection in every room, which also connects back in the second box in the utility room. And I don’t need it ;)
I’m German and think that compounding words into one should not really count as the longest / a long word. I mean, yes it is, but also it isn’t. Like: “Grundstücksverkehrsgenehmigungszuständigkeitsübertragungsverordnung”. In the end it’s just slapping words together and counting the result as one.