Very well put. The trick is to do either of the following:
1. Find simpler tasks for which the trust in LLMs is high.
2. Give the LLM tasks that have a very low cost to verify (even when the task itself is not simple) - particularly one-off scripts (see the sketch below).
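To make (2) concrete, here's a minimal sketch of the kind of one-off script I mean (a made-up example, not something from this thread): it reports duplicate files under a directory, and its output is a short list you can spot-check by hand, so a wrong answer costs almost nothing to catch.

```python
#!/usr/bin/env python3
"""One-off script: report duplicate files under a directory by content hash.

Hypothetical example of a low-cost-to-verify task: the output is a short
list of paths you can eyeball, so a mistake is obvious and cheap to catch.
"""
import hashlib
import sys
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path) -> str:
    # Hash the file in 1 MiB chunks so large files don't blow up memory.
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def main(root: str) -> None:
    # Group every regular file under `root` by its content hash.
    groups: dict[str, list[Path]] = defaultdict(list)
    for p in Path(root).rglob("*"):
        if p.is_file():
            groups[sha256_of(p)].append(p)
    # Print only the groups with more than one file, i.e. the duplicates.
    for digest, paths in groups.items():
        if len(paths) > 1:
            print(digest[:12], *paths, sep="\n  ")


if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else ".")
```

If the LLM gets a script like this wrong, the failure shows up on the first run; that's exactly what makes the verification cost low.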
I once had a colleague who was in the "not trust" bucket for the work we were doing. So we found something he was good at that was a pain for the rest of us to do, and reassigned him to those tasks to take that burden off of us.
In the last few months I've had the LLM solve (simple) problems via code - problems that had been in my head for years. At any point I could have done them myself, but they were a chore. If the LLM fails at one of these tasks, it's not a big deal - not much time is lost. But it tends to succeed fairly often, because they are simple tasks.
I almost never let the LLM write production code, because of the extra burden that you and others allude to. But I do let it write code I rely on in my personal life, because frankly I tend to write pretty poor code for my personal use - I can't justify the time it would take to write things well - life is too busy. I welcome the code quality I get from Sonnet or Gemini 2.5 Pro.
That's my point in this thread. Writing code is a pretty diverse discipline, and many are dismissing LLM coding simply because it doesn't handle one particular use case (high-quality production code) well.
I didn't take LLM coding seriously until I saw well-respected, well-known software engineers speak positively about it. Then I tried it and... oh wow. People dismissing LLMs are dismissing not only a lot of average developers' reality, but also a lot of experts' daily reality.
Just look at the other submission:
https://sean.heelan.io/2025/05/22/how-i-used-o3-to-find-cve-...
He used an LLM to find a security vulnerability in the Linux kernel. To quote him:
> Before I get into the technical details, the main takeaway from this post is this: with o3 LLMs have made a leap forward in their ability to reason about code, and if you work in vulnerability research you should start paying close attention. If you’re an expert-level vulnerability researcher or exploit developer the machines aren’t about to replace you. In fact, it is quite the opposite: they are now at a stage where they can make you significantly more efficient and effective. If you have a problem that can be represented in fewer than 10k lines of code there is a reasonable chance o3 can either solve it, or help you solve it.