Some of those pictures are delightfully cursed

Yeah, either that or the commoditization of LLMs.

I.e. in the early days (2023-2025, lol), companies were mostly differentiated by the quality of their LLM: GPT-3.5 was leagues ahead of anyone else, so for serious work it was the only game in town. Until recently it was basically only the US-based companies (OpenAI/Anthropic) with models worth using.

Now Anthropic and OpenAI are neck-and-neck, Chinese models are catching up fast, and the models are becoming commodities: as long as they clear some threshold of quality/capability, it doesn't really matter to most customers which one they use.

So now we might be reaching the branding stage of LLMs. I.e. do you go for Anthropic, who are branding themselves as the responsible ones, or OpenAI, who are branding themselves as... the innovators? Not sure what they are going for these days.

If all models are roughly equally capable, all that's left is branding.

At least those are the thoughts it provoked in me. Whether or not that's what Paul Graham meant, I have no idea.


I think you’re right about the impetus for the piece. I’ll say that branding is for things consumed in public; I expect vendor lock-in before branding. But as you know, this site is awash in speculation about how the lack of differentiation will play out.

When was the last time you tried?

I think getting agents to do larger tasks was always very hit or miss, up to about the end of last year.

In the past couple of months I have found them to have gotten a lot better (and I'm not the only one).

My experience with what coding assistants are good for shifted from:

smart autocomplete -> targeted changes/additions -> full engineering


I’m not OP but every time I post a comment with this sentiment I get told “the latest models are what you need”. If every 3 months you are saying “it’s ready as long as you use the latest model”, then it wasn’t ready 3 months ago and it’s not likely to be ready now.

To answer your question, I’ve tried both Claude Code and Antigravity in the last 2 weeks and I’m still finding that they struggle. AG with Gemini regularly gets stuck on simple issues and loops until I run out of requests, and Claude still regularly goes off on wild tangents without actually solving the problem.


I don’t think that’s true. Claude Opus 4.5/4.6 in Cursor have marked the big shift for me. Before that, agentic development mostly made me want to just do it myself, because it was getting stuck or going on tangents.

I think it can (and is) shifting very rapidly. Everyone is different, and I’m sure models are better at different types of work (or styles of working), but it doesn’t take much to make it too frustrating to use. Which also means it doesn’t take much to make it super useful.


> I don’t think that’s true. Claude Opus 4.5/4.6 in Cursor.

Opus 4.6 has been out for less than a month. If it was a big shift, surely we'd see a massive difference over 4.5, which was November. I think this proves the point: you're not seeing seismic shifts every 3 months, and you're not even clear about which model was the fix.

> I think it can (and is) shifting very rapidly.

Shifting, maybe. But shuffling deck chairs every 3 months.


I interpreted their comment to mean 4.5 was the shift, which was November last year. "Before that" meaning pre-4.5.

It depends on what you're handling. Frontend (not CSS), Swagger, and mundane CRUD are where it shines. Anything more complex that needs harder computation usually makes the agents struggle.

They're especially good for navigating code you're unfamiliar with. If you already know the code well, you'll find it's usually faster to debug and write it yourself.

Opus 4.6 with the Claude Code VS Code extension


Agreed, it’s strange; I’ll just assume that the people who say this are building React apps. I still get so much ”certainly, I should not do this in a completely insane way, let me fix that” … -400+2. It’s not every time, and it is better than it was, but that’s it.

I'm an ML engineer, so it's mostly been setting up data processing/training code in PyTorch, if that helps.

Have you tried it with something like OpenSpec? Strangely, taking the time to lay out the steps in a large task helps immensely. It's the difference between the behavior you describe and just letting it run productively for segments of ten or fifteen minutes.

> Have you tried it with something like OpenSpec?

No. The parent comment said I needed a new model, which I've tried. Being told "just try something else as well" kind of proves the point.


I thought this too, and then I discovered plan mode. If you just prompt agent mode it will be terrible, but coming up with a plan first has made a big difference, and I rarely write code at all now.

My workflow has become very plan-intensive... including planning of verification+test steps at the end.

At this point though, after Claude C Compiler, you've got to give us more details to better understand the dichotomy. What do you consider simple issues?

> At this point though, after Claude C Compiler,

Perfect example. You mean the C compiler that literally failed to compile a hello world [0] (which was given in its readme)?

> What do you consider simple issues?

Hallucinating APIs for well-documented libraries/interfaces, ignoring explicit instructions for how to do things, and making very simple logic errors in 30-100 line scripts.

As an example, I asked Claude Code to help me with a Roblox game last weekend, and specifically asked it to "create a shop GUI for <X> which scales with the UI, and opens when you press E next to the character". It proceeded to create a GUI with absolute sizing, got stuck on an API hallucination for handling input, and, when I got it unstuck, the result didn't actually work.

[0] https://github.com/anthropics/claudes-c-compiler/issues/1


Excellent examples, thank you!

Shame Claude Code doesn't have sharable chat logs, it would be interesting to see where your Roblox exploration went off the rails.


I think you can use https://traces.com for that

The Claude C compiler is 100k LOC that doesn’t do anything useful, and it cost $20k plus the cost of an expert engineer creating a custom harness and babysitting it.

But the most important thing is that they were reverse-engineering gcc by using it as an oracle. And it had gcc and thousands of other C compilers in its training set.

So if you are a large corporation looking to copy GPL code so that you can use it without worrying about the license, and the project you want to copy is a text transformer with a rigorously defined set of inputs and outputs, have at it.
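
(For anyone unfamiliar: "using gcc as an oracle" is just differential testing. Compile the same program with both compilers, run both binaries, and flag any divergence. A rough sketch of that loop in Python; the candidate compiler name and the test corpus below are made up for illustration.)

    # Differential testing: gcc is the oracle, and the candidate compiler is
    # "correct" on an input exactly when its binary's behavior matches gcc's.
    import subprocess

    def compile_and_run(compiler, src, exe):
        build = subprocess.run([compiler, src, "-o", exe],
                               capture_output=True, text=True)
        if build.returncode != 0:
            return None  # a compile failure is itself an observable behavior
        run = subprocess.run([exe], capture_output=True, text=True, timeout=10)
        return (run.returncode, run.stdout)

    # Hypothetical corpus and candidate compiler name:
    for src in ["tests/hello.c", "tests/loops.c", "tests/structs.c"]:
        oracle = compile_and_run("gcc", src, "/tmp/ref")
        candidate = compile_and_run("./candidate-cc", src, "/tmp/out")
        if candidate != oracle:
            print(f"divergence on {src}: {candidate!r} vs {oracle!r}")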


> When was the last time you tried?

Pretty recently (a couple weeks ago). I give agentic workflows a go every couple of weeks or so.

I should say, I don't find them abysmal, but I tend to work in codebases where I understand the code and the patterns really well. The use cases I've tried so far do sort of work, just not (yet, at least) faster than I'm able to actually write the code myself.


> My experience with what coding assistants are good for shifted from:

> smart autocomplete -> targeted changes/additions -> full engineering

Define "full engineering". Because if you say "full engineering" I would expect the agent to get some expected product output details as input and produce all by itself the right implementation for the context (i.e. company) it lives in.


I agree that "full engineering" was a bit broad. I should probably have said something like "agent-only coding"?

I.e. the point where the agent writes all the code and you just verify.


The "you just verify" part can take indeed a lot of steering and hand-holding to get the right implementation for the current company/department/project context. Otherwise you might be just generating tech debt at scale.

You'll probably be one of the best paid people in Thuringia ;-)

Don't forget to BYOB though - bring your own bananas ;)

LiveEO | Senior ML Engineer | Berlin, Germany | Hybrid | Full-time

LiveEO leverages high-resolution satellite imagery and AI to provide actionable insights across industries—like protecting power grids, monitoring critical infrastructure, and ensuring deforestation compliance.

We are looking for a Senior ML Engineer to build and scale multitemporal, multimodal computer vision models for Earth observation. You’ll combine optical and Synthetic Aperture Radar (SAR) data into robust representations. This role is a true balance of applied research and engineering: you’ll own the full ML R&D lifecycle from data standardization and SOTA model development to rigorous evaluation and production-grade delivery.

Tech Stack: Python, PyTorch/Lightning, Databricks, MLflow, Ray, Prefect, AWS, Geospatial stack (GDAL, Rasterio, GeoPandas, STAC), PostgreSQL.

What we're looking for:

* Strong Python engineering and deep PyTorch/Lightning experience.

* Proven experience implementing and training deep learning models at scale.

* Hands-on experience with satellite imagery (optical & SAR strongly preferred).

* Strong CV fundamentals (representation learning, supervision, evaluation) and ML experimentation (Databricks/MLflow).

* Pragmatic mindset: you can take SOTA papers to validated baselines and production under real-world constraints.

Must be living in or willing to relocate to Berlin. This role requires German/European citizenship due to legal/regulatory requirements.

(Bonus points for experience with large-scale geospatial foundation models, VLMs, or distributed compute with Ray).

Apply here: https://liveeo-gmbh.jobs.personio.de/job/2540514


The link seems to have the wrong text: * Proven leadership in security for modern cloud-native systems and products.

Did you copy paste a security position?


thanks for flagging! It's fixed

Oh hey, the man himself!

I was looking at the zlib-ng crc32 implementation, which is where I saw that it was recently updated to include your algorithm.

Good work, it's a surprisingly elegant solution when compared to the braiding approaches!


thanks :) the braiding approach is super clever too, this was one of those weird moments where you find something and then have to triple check your results because how could i accidentally find something better than the algorithm that hasn't been touched in decades...

the part i really like is that it gives us a small improvement on the pclmul side too, since the non-accelerated algorithm doesn't really stand a chance against the accelerated opcode on newer hardware and probably isn't going to see much use in practice. however... i think hardware solutions could possibly benefit (e.g. ethernet cards)


Apparently that's kinda where the name comes from. It's named after a Serbian musician known as Bora Čorba, who played in a band called Riblja Čorba (fish stew).

News to me, but a guy named Sam Russell came up with a new software-only CRC32 algorithm that is competitive with hardware-accelerated implementations. It's a surprisingly elegant solution.
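
For context, the classic software baseline these approaches get compared against is the byte-at-a-time table lookup (Sarwate's algorithm). A minimal Python sketch using the standard reflected zlib polynomial (this is the old baseline, not Russell's algorithm):

    # Classic byte-at-a-time, table-driven CRC32 (the pre-SIMD software baseline),
    # using the standard reflected zlib/PNG polynomial 0xEDB88320.
    def make_crc32_table():
        table = []
        for n in range(256):
            c = n
            for _ in range(8):
                c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
            table.append(c)
        return table

    CRC32_TABLE = make_crc32_table()

    def crc32(data: bytes) -> int:
        crc = 0xFFFFFFFF  # initial value; inverted again at the end per the spec
        for byte in data:
            crc = (crc >> 8) ^ CRC32_TABLE[(crc ^ byte) & 0xFF]
        return crc ^ 0xFFFFFFFF

    # Sanity check against the well-known CRC-32 test vector:
    assert crc32(b"123456789") == 0xCBF43926

The faster software variants (slicing-by-N, braiding) mostly attack the serial dependency in that loop, since each step chains on the previous crc value.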

Yeah, I think it's a tough one for some people. Case in point: my parents.

My father has always had a million hobbies, and his work was what was preventing him from fully exploring them. He's taken to retirement like a fish to water.

My mother, on the other hand (still working at 73), has, like most academics, been very dedicated to her work, and her main hobby outside of work has been hiking.

I'm a little worried that she'll struggle a bit to adapt to retirement.

