> but it's increasingly looking like LeCun is right. This is an absolutely craz...

D-Machine · 2026-01-03T02:12:27 1767406347

If you pay attention to actual research and guarded benchmarks, and understand how benchmarks are being gamed, I would say there is plenty of evidence that we are approaching a clear plateau, i.e. that Karpathy's march-of-nines thesis is basically correct long-term. Short-term, it remains to be seen how much more we can get out of the current tech.

gbnwl · 2026-01-03T02:32:47 1767407567

Can you point me to some of the actual research you're talking about? I'd love to read.

D-Machine · 2026-01-03T04:08:19 1767413299

Your best bet would be to look closely at performance on the fully private ARC-AGI test set (e.g. https://arcprize.org/blog/arc-prize-2025-results-analysis) and think carefully about the discrepancies there, or just broadly read any academic research on classic benchmarks and note the plateaus on classic datasets.

It is very clear, when you look at academic papers actually targeting problems specific to reasoning / intelligence (e.g. rotation invariance in images, adversarial robustness), that all the big companies are doing is fitting more data and spending more resources on human raters and the like to boost performance on (open) metrics, and that any real gains in genuine intelligence are coming only from milking what we know very well to be a limited approach. I.e. there are trivially basic problems that cannot be solved by curve-fitting models, which makes it clear that most current advances are indeed coming from curve (manifold) fitting. It just isn't clear how far we can exploit these current approaches, and in which domains this kind of exploitation is more than good enough.

EDIT: Are people unaware Google Scholar is a thing? It is trivial to find modern AI papers that can be read without access to a research institution. And e.g. HuggingFace collects trending papers (https://huggingface.co/papers/trending), etc.

jk2444 · 2026-01-03T04:21:08 1767414068

At present it's only SWEs who are benefiting from a productivity standpoint. I know a lot of people in finance (from accounting to portfolio management), and they scoff at the outputs of LLMs in their day-to-day jobs.

But the bizarre thing is that even though SWE productivity is increasing, I don't believe there will be much in the way of layoffs, because there isn't complete trust in LLMs, and I don't see that changing either. In that case, LLM producers will need to figure out a way to increase the value of LLMs and get users to pay more.

Ianjit · 2026-01-03T05:02:39 1767416559

Are SWEs really experiencing a productivity uplift? When studies attempt to measure the productivity impact of AI in software, the results I have seen are underwhelming compared to the frontier labs' marketing.

D-Machine · 2026-01-03T05:10:34 1767417034

This too should be questioned. At least a couple of studies at this point suggest that many developers feel like they are going faster with AI when, by some metrics, they are actually going slower (e.g. https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o...), and then there are public admissions from major CEOs that, e.g., Copilot doesn't "really work" (https://ppc.land/microsoft-ceo-admits-copilot-integrations-d...).

And, again, this is ignoring all the technical debt from generated code that is poorly understood, weakly reviewed, and of questionable quality overall.

I still think this all has serious potential for net benefit, and already delivers it in certain cases. But we need to be clearer about spelling out where that is (webshit, boilerplate, language-to-language translation, etc.) and where it maybe isn't (research code, legacy code, large codebases, niche/expert domains).

Ianjit · 2026-01-04T00:29:36 1767486576

This Stanford study on developer productivity found zero correlation between developers' assessments of their own productivity and independent measures of their productivity. Any anecdotal evidence from developers on how AI has made them more or less productive is worthless.

https://youtu.be/tbDDYKRFjhk?si=gF4EN4ilogoam3hG

D-Machine · 2026-01-04T03:42:00 1767498120

Agreed.

tiahura · 2026-01-05T17:21:20 1767633680

Lawyer here. AI has taken over my workflow.

D-Machine · 2026-01-03T04:29:25 1767414565

Yup, and most of that progress is confined to SWEs doing webshit / writing boilerplate code. For anything specialized, LLMs are rarely useful, and this is all ignoring the future technical debt of debugging LLM-written code.

I am hopeful about LLMs for SWE, but the progress is currently context-dependent.

jk2444 · 2026-01-03T04:37:27 1767415047

Agreed.

Even if LLMs could write great code with no human oversight, the world would not change overnight. Human creativity is necessary to figure out what to build that will yield incremental benefits over what already exists.

The people who possess that capability stand to win long-term, and they tend to come from the humanities and liberal arts.

catigula · 2026-01-03T08:18:46 1767428326

You're going to be eating so much crow shortly.