Have you tried it? More than once? I’m getting massive productivity gains with C...

stefan_ · 2025-05-05T22:02:55 1746482575

I use it all the time, multiple times daily. But the discussion is not being very honest, particularly for all the things that are being bolted on (agent mode, MCP). Like just upstream people dunk on others for pointing out that maybe giving the model an API call to read webpages isn't quite turning LLM into search engines. Just like letting it run shell commands has not made it into a full blown agent engineer.

I tried it again just now with Claude 3.7 in Cursors Agent/Compose (they change this stuff weekly). Write a simple C++ TensorRT app that loads an engine and runs inference 100 times for a benchmark, use this file to source a toolchain. It generated code with the old API & a CMake file and (warning light turns on) a build script. The compile fails because of the old API, but this time it managed to fix it to use the new API.

But now the linking fails, because it overwrote the TRT/CUDA directories in the CMakeLists with some home cooked logic (there was nothing to do, the toolchain script sets up the environment fully and just find_package would work).

And this is where we go off the rails; it messes with the build script and CMakeLists more, but still it can not link. It thinks hey it looks like we are cross-compiling and creates a second build script "cross-compile.sh" that tries to use the compiler directly, but of course that misses things that the find_package in CMake would setup and so fails with include errors.

It pretends its a 1970 ./configure script and creates source files "test_nvinfer.cpp" and "test_cudart.cpp" that are supposed to test for the presence of those libraries, then tries to compile them directly; again its missing directories and obviously fails.

Next we create a mashup build script "cross-compile-direct.sh". Not sure anymore what this one tried to achieve, didn't work.

Finally, and this is my favorite agent action yet, it decides fuck it, if the library won't link, why don't we just mock out all the actual TensorRT/CUDA functionality and print fake benchmark numbers to demonstrate LLMs can average a number in C++. So it writes, builds ands runs a "benchmark_mock.cpp" that subs out all the useful functionality for random data from std::mt19937. This naturally works, so the agent declares success and happily updates the README.md with all the crap it added and stops.

This is what running the lawnmower over the flower bed means; you have 5 more useless source files and a bunch more shell scripts and a bunch of crap in a README that were all generated to try and fail to fix a problem it could not figure out, and this loop can keep going and generate more nonsense ad infinitum.

(Why could it not figure out the linking error? We come back to the shitty bolted on integrations; it doesn't actually query the environment, search for files or look at what link directories are being used, as one would investigating a linking error. It could of course, but the balance in these integrations is 99% LLM and 1% tool use, and even context from the tool use often doesn't help)

tptacek · 2025-05-05T23:40:23 1746488423

It's really weird for me to see people talking about using LLMs in coding situations in a frame where "agents" (we're not even at MCP yet!) are somehow an extra. People discussing the applicability of LLMs to programming, and drawing conclusions (even if only for themselves) about how well it works, should be experienced with a coding agent.

mountainriver · 2025-05-06T17:24:50 1746552290

LLMs aren't good at deep ML problems yet. We know this, its what MLE-bench is for. That doesn't mean they aren't good at other coding problems