
> How can anyone intellectually honest not see that?

The idea that they can only solve problems they've seen before in their training data is one of those things that seems obviously true, but it doesn't hold up once you consistently use them to solve new problems over time.

If you won't accept my anecdotal stories about this, consider the fact that both Gemini and OpenAI got gold medal level performance in two extremely well regarded academic competitions this year: the International Math Olympiad (IMO) and the International Collegiate Programming Contest (ICPC).

This is notable because both of those contests have brand new challenges created for them that have never been published before. They cannot be in the training data already!



> consider the fact that both Gemini and OpenAI got gold medal level performance

Yet ChatGPT 5 imagines API functions that are not there and cannot figure out basic solutions even when pointed to the original source code of libraries on GitHub.


Which is why you run it in a coding agent loop using something like Codex CLI - then it doesn't matter if it imagines a non-existent function because it will correct itself when it tries to run the code.
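A rough sketch of the shape of that loop (hypothetical helper names, not Codex CLI's actual internals - it just illustrates why a hallucinated function gets caught and corrected):

    import subprocess

    def agent_loop(task, call_model, max_turns=5):
        # call_model is a stand-in for whatever LLM API you're using
        feedback = ""
        for _ in range(max_turns):
            code = call_model(f"Task: {task}\nError output from last attempt:\n{feedback}")
            with open("attempt.py", "w") as f:
                f.write(code)
            result = subprocess.run(["python", "attempt.py"], capture_output=True, text=True)
            if result.returncode == 0:
                return code  # ran cleanly, stop here
            # a hallucinated function surfaces here as an ImportError/AttributeError
            # and goes back into the next prompt
            feedback = result.stderr
        return None  # gave up after max_turns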

Can you expand on "cannot figure out basic solutions even when pointed to the original source code of libraries on GitHub"? I have it do that all the time and it works really well for me (at least with modern "reasoning" models like GPT-5 and Claude 4.)


As a human, I sometimes write code that does not compile first try. This does not mean that I am stupid, only that I can make mistakes. And the development process has guardrails against me making mistakes, namely, running the compiler.


Agreed

Infallibility is an unrealistic bar to measure LLMs against


Yes. I don't see why these have to be mutually exclusive.


I feel they are mutually inclusive! I don’t think you can meaningfully create new things if you must always be 100% factually correct, because you might not know what correct is for the new thing.


> If you won't accept my anecdotal stories about this, consider the fact that both Gemini and OpenAI got gold medal level performance in two extremely well regarded academic competitions this year: the International Math Olympiad (IMO) and the International Collegiate Programming Contest (ICPC).

it's not a fair comparison

the competitions for humans are a display of ingenuity and intelligence because of the limited resources available to them

meanwhile for the "AI", all it demonstrates is that if you have a dozen billion-dollar data centres and a couple of hundred gigawatt hours you can dedicate to brute-forcing a solution, then you can maybe match the level of one 18 year old, when you have a problem with a specific well-known solution

(to be fair, a smart 18 year old)

and short of Moore's law lasting another 30 years, you won't be getting this from the dogshit LLMs on shatgpt.com


Google already released the Gemini 2.5 Deep Think model they used in ICPC as part of their $250/month "Ultra" plan.

The trend with all of these models is for the price for the same capabilities to drop rapidly - GPT-3 three years ago was over 1,000x the price of much better models today.

I'm not yet ready to bet against that trend holding for a while longer.


> GPT-3 three years ago was over 1,000x the price of much better models today.

right, so only another 27 years of Moore's law continuing left

> I'm not yet ready to bet against that trend holding for a while longer.

I wouldn't expect an industry evangelist to say otherwise


I'm a pretty bad "industry evangelist" considering I won't shut up about how prompt injection hasn't had any meaningful improvements in the last three years and I doubt that a robust solution is coming any time soon.

I expect this industry might prefer an "evangelist" who hasn't written 126 posts about that: https://simonwillison.net/tags/prompt-injection/

(And another 221 posts about ethical concerns with how this stuff works: https://simonwillison.net/tags/ai-ethics/)


you would be a lot more credible if you were honest about being an evangelist


Credibility is genuinely one of the things I care most about. What can I do to be more honest here?

(Also what do you mean here by an "evangelist"? Do you mean someone who is an unpaid fan of some of the products, or are you implying a financial relationship?)


I know this is something you care about, and I'm not your parent, but something I've often observed in conversations about technology on here, especially around AI, is that if you say good things about something, you are an "evangelist." It's really that straightforward, and it doesn't change even if you also say negative things sometimes.


In that case yeah, I'm an LLM "evangelist" (not so much for other forms of generative AI - I play with image/video generation occasionally but I don't spend time telling people that they're genuinely worthwhile tools to learn). I'm also a Python evangelist, a SQLite evangelist, a vanilla JavaScript evangelist, etc etc etc.


yes, enough "concern" to provide plausible deniability


"they output strings that didn't exit before" is some hardcore, uncut cope



