$$$, one of the classic bad faith motives. Most of tech nowadays is subsidized by advertising and profiling to some degree, often quite a large degree.
Sooner or later, yes. What stops it, other than layers of imperfect process? And it's the perfect vector to exploit anyone who doesn't review and understand the generated code before running it locally.
But unit and integration tests generally only catch the things you can think of. That leaves a lot of unexplored space in which things can go wrong.
Separately but related: if you offload writing both the tests and the code, how does anybody know what they have other than green tests and coverage numbers?
I have been seeing this problem building over the last year. LLM generated logic being tested by massive LLM generated tests.
Everyone just goes overboard with the tests since you can easily just tell the LLM to expand on the suite. So you end up with a massive test suite that looks very thorough and is less likely to be scrutinized.
I'm burning through pretty fast with context sizes of only 32-64kb. I regularly clear when I change topics.
A simple "how do I do x" question used 2% of my budget.
I paid extra and chewed through $5 in a few minutes of analyzing segments of log files.
At this rate it's not worth the trouble of carefully managing usage to avoid ambiguous limits that disrupt my work.
If that's the way it is in order for them to make money, that's fine - but I need a usable tool that I don't have to micromanage. This product is not worth it ($, time) to me at this rate.
I hope it changes because when it works it's a great addition to my tools.
I just fixed this bug in a summarizer. Reasoning tokens were consuming the whole budget I gave it (1k), so all I got back was a blank response. (Qwen3.5-35B-A3B)
Most inference engines return the reasoning tokens though; wouldn't you see that reasoning_content (or whatever your engine calls it) was filled while content wasn't?
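In case it helps anyone hitting the same failure mode, here's a rough sketch of that check (Python, OpenAI-compatible client; the base URL, API key, and the reasoning_content field name are assumptions, since the field name varies by engine -- vLLM happens to call it that):

    # Sketch only: assumes an OpenAI-compatible server (e.g. vLLM) at a
    # hypothetical local URL; the reasoning field name is engine-specific.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

    resp = client.chat.completions.create(
        model="Qwen3.5-35B-A3B",   # model name as given upthread
        messages=[{"role": "user", "content": "Summarize this log: ..."}],
        max_tokens=1024,           # the ~1k budget mentioned above
    )

    choice = resp.choices[0]
    reasoning = getattr(choice.message, "reasoning_content", None)  # engine-specific field
    if not choice.message.content and reasoning:
        # The whole budget went to reasoning and the answer was cut off:
        # raise the budget (or cap/disable reasoning if the engine supports it) and retry.
        print(f"Blank answer, finish_reason={choice.finish_reason}, "
              f"reasoning length: {len(reasoning)} chars")

Checking for finish_reason == "length" alongside the empty content is usually the clearest signal that the budget, not the model, was the problem.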
A number of years ago I hacked together something conceptually similar [1]. It was for design and demonstration of CLI tooling (not TUI). It used its own DSL which included command definitions and demo output for invocations.
It created a replica CLI that behaved the way the real thing would, for quick prototyping and design feedback. The next step [2] would have been generating backends for different languages/libraries to create the actual CLI.
I lost the original sample "buddy" files, but we did use it for prototyping a new Chef Workstation CLI. Copies still haunt the Internet [3].
The removal of the quotes around "better" discards an entire layer of meaning.
It also loses the voice that was present in the 'before' version, typos/misuses and all. More tangibly, an entire layer of meaning was dropped when it removed the quotes around 'better'.
I see your point, and I agree the result can feel impersonal and stiff. But I'd say the overall improvement matters more than one possible deterioration. The quotes would be easy to put back if I thought they were important (in this case, they weren't).
Please reply in Swedish only. Remember not to use any translation tool, to avoid removing subtle layers of meaning. It's easy! /Native speaker ;)
Why do they need a way to acknowledge that? When it's pointed out that they're wrong, they can just take the new data and make the correction. They don't need human mannerisms.