...OR, if you are developing a PCB, you have the design data, the pick-and-place data, and the Gerber data.
Any combination of these gives you the position of everything to within micrometers.
This is not a new problem. PCB testing has been solved a billion times over; the world has had bed-of-nails testers and flying probe testers for four decades.
We have an 8-finger flying probe machine at our facility. Literally all we do is load the board and load the design data. It identifies points of interest, learns the fiducials, and we let it do a characterization run. We then have engineering review the resulting data and just let it fly afterwards.
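To give a sense of how little magic is involved: once the machine has measured two fiducials, mapping every nominal test point from the design data into machine coordinates is a closed-form similarity transform. A minimal sketch, with illustrative coordinates and names (not our machine's actual software):

```python
# Fiducial-based alignment sketch: solve for the rotation, uniform scale,
# and translation that map nominal CAD coordinates to measured machine
# coordinates, then apply it to every probe target.

def fit_similarity(nom, meas):
    """nom, meas: two (x, y) pairs each. Returns complex (a, b)
    such that measured = a * nominal + b."""
    n1, n2 = complex(*nom[0]), complex(*nom[1])
    m1, m2 = complex(*meas[0]), complex(*meas[1])
    a = (m2 - m1) / (n2 - n1)   # rotation + scale in one complex number
    b = m1 - a * n1             # translation
    return a, b

def map_point(a, b, pt):
    z = a * complex(*pt) + b
    return (z.real, z.imag)

# Fiducials in mm: nominal from design data, measured by the camera.
fid_nom  = [(5.000, 5.000), (95.000, 55.000)]
fid_meas = [(5.042, 4.981), (95.013, 55.060)]  # board slightly shifted/rotated

a, b = fit_similarity(fid_nom, fid_meas)
test_points = [(12.700, 34.290), (50.800, 10.160)]  # pads from pick-and-place data
print([map_point(a, b, p) for p in test_points])
```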
None of this requires AI.
But nowadays any linear regression qualifies as AI, so imma go slap a label on it.
Perhaps the argument isn't about the ethics of security research, but rather the divide between those who can afford non-free software licenses and those who ethically or circumstantially can't.
You'd see the same thing in 1990s full-disclosure debates, where people trying to create a social/cultural argument against vulnerability research would throw this kind of stuff against the wall just to see what would stick. It's either good to know about vulnerabilities in the code you rely on or it isn't.
Yes, of course. It's a bloody shame some of those tools are inaccessible to the poor, to the not-poor ("f* your stupid payment system that doesn't connect to my bank"), to software freedom enthusiasts, and possibly others.
For myself, software freedom isn't just an ethical issue but also a practical necessity.
No, you're right. I initially thought you were wrong, but it is sus.
My intuition for "a priori" ran something along the lines of: even if you had the entire source code in your head at once, there are limits to reasoning about it. Computability is one hard result. You also have to interact with the real world on a wide variety of hardware systems, or even just a wide variety of systems if you create an API; how do you reason past the abstraction boundary reliably without actually having tests, interacting with systems, and getting feedback? Not really possible unless LLMs control everything. For the more philosophical questions (such as "is our 'correct' actually the right thing?") we can grant the easy case where everybody's in consensus; the "easier" problems show up either way.
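A toy illustration of the computability limit: whether the loop below terminates for every positive n is the Collatz conjecture, which is still open, so no known amount of purely a priori reasoning about the source settles even this tiny function.

```python
def collatz_steps(n: int) -> int:
    """Count steps until n reaches 1 under the Collatz map.
    Nobody has proven this terminates for every positive integer,
    so static reasoning alone can't certify it never loops forever."""
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps

print(collatz_steps(27))  # 111 -- fine empirically, unproven in general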
But getting to the point of "understanding, in principle, every piece of Linux" is pretty ill-defined and practically doesn't seem possible for a single LLM or a human. It also seems ripe for smuggling in whatever implicit premises you want to swing the issue either way.
But personally I (and many other people) have seen late-2025 models get extremely good, and that is precisely because they actually started doing deep tooling: actually running and testing their code. I was not getting nearly as much value out of them (still a decent amount of value!) prior to the tooling explosion; not even MCPs were good. It was when they started aggressively spawning subshells and executing live tests. But I guess the a priori / a posteriori split isn't really useful here?
Yeah, maybe you are right. But is doing math and reasoning about Turing machines a priori? If so, then it seems plausible to me that reasoning about a codebase (without running it) is also ‘a priori’.
If your project requires solving a tricky algorithmic problem, is the AI system able to solve that part, or do you have to give it the solution?
I haven't yet tried to solve truly complex algorithmic problems.
Generally speaking, if the problem is common, the model has likely already been trained to solve it.
If it's truly complex and/or specific to my needs, I can try using a reasoning model to think through a solution before moving on to implementation.
I use the agent to conduct research, find resources to understand the complexity, best practices, feedback, etc., and to write a Markdown analysis file on the topic.
Then I can use this file as a basis to precisely define what I want to do and brainstorm with the agent in thinking mode. The more the task is described and defined, the more accurate the result will be.
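For what it's worth, that two-step workflow (research into a Markdown analysis file, then implementation grounded on that file) can be sketched with the Anthropic Python SDK roughly like this. Model ID, prompts, and filenames are placeholders, not a prescription:

```python
# Sketch of the research -> analysis file -> implementation loop.
# Assumes ANTHROPIC_API_KEY is set in the environment.
import anthropic

client = anthropic.Anthropic()
MODEL = "claude-sonnet-4-20250514"  # illustrative model ID

def ask(prompt: str) -> str:
    msg = client.messages.create(
        model=MODEL, max_tokens=4096,
        messages=[{"role": "user", "content": prompt}],
    )
    return msg.content[0].text

# Step 1: have the model research the problem and write the analysis file.
analysis = ask("Research approaches to <problem>. Summarize trade-offs, "
               "best practices, and pitfalls as a Markdown analysis.")
with open("analysis.md", "w") as f:
    f.write(analysis)

# Step 2: use that file as grounding for the implementation pass.
plan = ask("Using this analysis, propose a precise implementation plan:\n\n"
           + analysis)
print(plan)
```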
> I don't use ChatGPT, but I've been using an agent with Claude Sonnet 4.
Are you using Sonnet 4.6?
> So this AI agent... It is much faster at writing code when given specific instructions. But it keeps losing context on architecture, and I can't really let it build complex things with interdependencies that build on each other.
I've only built small things (< 1000 lines) with the systems, so I might be missing this problem.
Is it better than you at building small self-contained things?
> And I get a bad feeling when I then wonder, how is this app doing what it does? Because my agent can't explain it, and I would be stupid to believe what it hallucinated, because it sounds really solid until you scratch the construction.
Do you ask it to generate test suites for the things that it builds?
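If not, it's worth asking for something like the sketch below alongside every feature. The function here is a hypothetical stand-in for whatever the agent built:

```python
# Sketch: ask the agent to emit tests alongside each feature.
# `parse_order` is a hypothetical stand-in for the agent's code.
import json
import pytest

def parse_order(raw: str) -> dict:  # hypothetical feature under test
    order = json.loads(raw)
    if order["qty"] < 0:
        raise ValueError("quantity must be non-negative")
    return order

def test_parses_well_formed_order():
    order = parse_order('{"id": 7, "qty": 2}')
    assert order["id"] == 7 and order["qty"] == 2

def test_rejects_negative_quantity():
    with pytest.raises(ValueError):
        parse_order('{"id": 7, "qty": -1}')
```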
> it would also be faster to build a catastrophic spaghetti-code nightmare if not used with great care.
I started working with this two weeks ago, so I'm learning as I go (or should I say, stumble and fall). Weird as it may sound, what I found so trustworthy at the beginning was how rational and logical it sounded, as if it really knew better, and I liked letting it run. Obviously it did not go so well, and I had to correct a lot.
But I am learning, what can I say?
And yes, I gave it many commandments like "thou shalt always test before releasing", and it sounded so convincing when it confirmed what an excellent idea that was, that I was at least surprised (imagine that) when something did not go as planned on prod because of, well, you know...