
There was a study recently that made it clear the use of LLMs for coding assistance made people feel more productive but actually made them less productive.

EDIT: Added links.

https://www.cio.com/article/3540579/devs-gaining-little-if-a...

https://web.archive.org/web/20241205204237/https://llmreport...

(Archive link because the llmreporter site seems to have an expired TLS certificate at the moment.)

No improvement to PR throughput or merge time, 41% more bugs, worse work-life balance...



I recently pasted three different three-page SQL statements, along with their obscure Redshift errors (which had no line or context references), into Claude, and it was 3 for 3 on telling me where in my query I was messing up. Saved me probably 5 minutes each time but really saved me from moving to a different task and coming back. So around $100 in value right there. I was impressed by it. I wish the query UI I was using just auto-ran it when I got an error. I should code that up as an extension.


$100 to save 15 minutes implies that you net at least $800,000 a year. Well done if so!
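
A rough sketch of the arithmetic behind that figure. The 2,000-hour work year (50 weeks of 40 hours) is an assumption, not something stated in the thread, and the function name is made up for illustration:

```python
def implied_annual_value(value_saved_usd, minutes_saved, work_hours_per_year=2000):
    """Scale a one-off saving up to a yearly figure.

    work_hours_per_year=2000 is an assumed full-time year (50 weeks x 40 hours).
    """
    # Convert the saving into an hourly rate: $100 per 15 minutes is $400 per hour.
    hourly_rate = value_saved_usd * 60 / minutes_saved
    # Multiply by the assumed number of working hours in a year.
    return hourly_rate * work_hours_per_year


print(implied_annual_value(100, 15))
```

With those assumptions, $100 per 15 minutes works out to $400 an hour, or $800,000 over the year.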


When forecasting developer and employee cost for a company, I double their pay, but I'm not going to say what I make or whether I did that here. I also like to think that developers should be working on things that have many multiples of leverage over their pay to be effective. But thanks.


> but really saved me from moving to a different task and coming back

You missed this part. Being able to quickly fix things without deep thought while in flow saves you from the slowdowns of context switching.


That $100 of value likely cost them more like $0.10 to $1 in API costs.


It didn't cost me anything; my employer paid for it. The math for my employer is odd because our use of LLMs is also R&D (you can look at my profile to see why). But it was definitely worth $1 in API costs. I can see justifying spending $200/month for devs actively using a tool like this.


I am in a similar boat. It's way more correct than not for the tasks I give it. For simple queries about, say, CLI tools I don't use that often, or regex formulations, I find it handy because when it gives the answer it's easy to test whether it's right or not. If it gets it wrong, I work with Claude to get to the right answer.


First of all, that's moving the goalposts to the next state over, relative to what I replied to.

Secondly, the "No improvement to PR throughput or merge time, 41% more bugs, worse work-life balance" result you quote came, per the article, from a "study from Uplevel", which seems to[0] have been testing for change "among developers utilizing Copilot". That may or may not be surprising, but again it's hardly relevant to a discussion about SOTA LLMs - it's like evaluating the performance of an excavator by giving 1:10 scale toy excavator models to children and observing whether they dig holes in the sandbox faster than their shovel-equipped friends.

The best LLMs are too slow and/or expensive to use in Copilot fashion just yet. I'm not sure if it's even a good idea - Copilot-like use breaks flow. Instead, the biggest wins from LLMs come from discussing problems, generating blocks of code, refactoring, unstructured to structured data conversion, identifying issues from build or debugger output, etc. All of those uses require qualitatively more "intelligence" than Copilot-style completion, and LLMs like GPT-4o and Claude 3.5 Sonnet deliver (hell, anything past GPT 3.5 delivered).

Thirdly, I have some doubts about the very metrics used. I'll refrain from assuming the study is plain wrong here until I read it (see [0]), but anecdotally, I can tell you that at my last workplace, you likely wouldn't be able to tell whether or not using LLMs the right way (much less Copilot) helped by looking solely at those metrics - almost all PRs were approved by reviewers with minor or tangential commentary (thanks to a culture of testing locally first, and not writing shit code in the first place), but then would spend days waiting to be merged due to a shit CI system (overloaded to the point of breakage - apparently all the "developer time is more expensive than hardware" talk ends when it comes to adding compute to CI bots).

--

[0] - Per the article you linked; I have yet to find and read the actual study itself.


Do you have a link? I'm not finding it by searching.


I really need the source of this.



