
> How can anyone intellectually honest not see that?

The idea that they can only solve problems they've seen before in their training data is one of those things that seems obviously true, but it doesn't hold up once you consistently use them to solve new problems over time.

If you won't accept my anecdotal stories about this, consider the fact that both Gemini and OpenAI got gold medal level performance in two extremely well regarded academic competitions this year: the International Math Olympiad (IMO) and the International Collegiate Programming Contest (ICPC).

This is notable because both of those contests have brand new challenges created for them that have never been published before. They cannot be in the training data already!



> consider the fact that both Gemini and OpenAI got gold medal level performance

Yet ChatGPT 5 imagines API functions that are not there and cannot figure out basic solutions even when pointed to the original source code of libraries on GitHub.


Which is why you run it in a coding agent loop using something like Codex CLI - then it doesn't matter if it imagines a non-existent function because it will correct itself when it tries to run the code.
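A rough sketch of the shape of that loop (hypothetical helper names, not Codex CLI's actual internals - it just illustrates why a hallucinated function gets caught and corrected):

    import subprocess

    def agent_loop(task, call_model, max_turns=5):
        # call_model is a stand-in for whatever LLM API you're using
        feedback = ""
        for _ in range(max_turns):
            code = call_model(f"Task: {task}\nError output from last attempt:\n{feedback}")
            with open("attempt.py", "w") as f:
                f.write(code)
            result = subprocess.run(["python", "attempt.py"], capture_output=True, text=True)
            if result.returncode == 0:
                return code  # ran cleanly, stop here
            # a hallucinated function surfaces here as an ImportError/AttributeError
            # and goes back into the next prompt
            feedback = result.stderr
        return None  # gave up after max_turns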

Can you expand on "cannot figure out basic solutions even when pointed to the original source code of libraries on GitHub"? I have it do that all the time and it works really well for me (at least with modern "reasoning" models like GPT-5 and Claude 4.)


As a human, I sometimes write code that does not compile first try. This does not mean that I am stupid, only that I can make mistakes. And the development process has guardrails against me making mistakes, namely, running the compiler.


Agreed

Infallibility is an unrealistic bar to measure LLMs against


Yes. I don't see why these have to be mutually exclusive.


I feel they are mutually inclusive! I don’t think you can meaningfully create new things if you must always be 100% factually correct, because you might not know what correct is for the new thing.


> If you won't accept my anecdotal stories about this, consider the fact that both Gemini and OpenAI got gold medal level performance in two extremely well regarded academic competitions this year: the International Math Olympiad (IMO) and the International Collegiate Programming Contest (ICPC).

it's not a fair comparison

the competitions for humans are a display of ingenuity and intelligence because of the limited resources available to them

meanwhile for the "AI", all it demonstrates is that if you have a dozen billion-dollar data centres and a couple of hundred gigawatt hours you can dedicate to brute-forcing a solution, then you can maybe match the level of one 18 year old, when you have a problem with a specific well-known solution

(to be fair, a smart 18 year old)

and short of Moore's law lasting another 30 years, you won't be getting this from the dogshit LLMs on shatgpt.com


Google already released the Gemini 2.5 Deep Think model they used in ICPC as part of their $250/month "Ultra" plan.

The trend with all of these models is for the price for the same capabilities to drop rapidly - GPT-3 three years ago was over 1,000x the price of much better models today.

I'm not yet ready to bet against that trend holding for a while longer.


> GPT-3 three years ago was over 1,000x the price of much better models today.

right, so only another 27 years of Moore's law continuing left

> I'm not yet ready to bet against that trend holding for a while longer.

I wouldn't expect an industry evangelist to say otherwise


I'm a pretty bad "industry evangelist" considering I won't shut up about how prompt injection hasn't had any meaningful improvements in the last three years and I doubt that a robust solution is coming any time soon.

I expect this industry might prefer an "evangelist" who hasn't written 126 posts about that: https://simonwillison.net/tags/prompt-injection/

(And another 221 posts about ethical concerns with how this stuff works: https://simonwillison.net/tags/ai-ethics/)


you would be a lot more credible if you were honest about being an evangelist


Credibility is genuinely one of the things I care most about. What can I do to be more honest here?

(Also what do you mean here by an "evangelist"? Do you mean someone who is an unpaid fan of some of the products, or are you implying a financial relationship?)


I know this is something you care about, and I'm not your parent, but something I've often observed in conversations about technology on here, especially around AI, is that if you say good things about something, you are an "evangelist." It's really that straightforward, and it doesn't change even if you also say negative things sometimes.


In that case yeah, I'm an LLM "evangelist" (not so much for other forms of generative AI - I play with image/video generation occasionally but I don't spend time telling people that they're genuinely worthwhile tools to learn). I'm also a Python evangelist, a SQLite evangelist, a vanilla JavaScript evangelist, etc etc etc.


yes, enough "concern" to provide plausible deniability


"they output strings that didn't exit before" is some hardcore, uncut cope



