TZubiri's comments

It's not clear whether you are asking a question, proposing a new standard, or affirming an existing convention.

Why would X have offices in France? I'm assuming it's just to hire French workers? Probably a leftover from the pre-acquisition era.

Or is there any France-specific compliance that must be done in order to operate in that country?


X makes its money selling advertising. France is the obvious place to have an office selling advertising to a large European French-speaking audience.

Yes, Paris is an international capital and centrally located for Europe, the Middle East, and Africa. Many tech companies have sales offices there.

Nowadays there's a lot of FUTON bias in research. There's so much power in just hitting the streets or reaching out to your circle.

For the most part, you care the most about your circle, so if that isn't representative of the whole of society, it sounds like somebody else's problem. Who said all research needs to be perfect?


To explain this for anyone else who, like me, hadn't heard the term:

https://en.wikipedia.org/wiki/Open_access_citation_advantage

Full Text On the Net = FUTON.


This website is filled mostly with tech-founder-adjacent users, so my pitch is a bit specific to that market.

If you are a startup founder trying to develop a global product, I'm offering country-specific localization (Argentina currently): everything from language and dialect localization to integration of local payment methods and support channels through payment/support aggregators.

Want to dominate the world? Try bottom up, perfecting a single market, instead of top down, aiming everywhere and seeing what sticks.

Why not focus on the US? You can try both for sure, but a smaller country is less competitive and easier to get a big win in (niche domination). You won't make as much money (GDP per capita is about one third of the US's), but it's still profit, not VC-backed losses.

I work on a small fixed fee + commission basis. The goal is to only hit it big if you do too, and live on rice otherwise.

If that sounds interesting, you can send me an email at localization@tomaszubiri.com, but ideally write me a message in this thread directly.

  Location: Argentina
  Remote: Yes
  Willing to relocate: No, can travel on B1, but strictly no work.
  Technologies: Backend/Sysadmin heavy, CPython mainly, can read and modify C#/Java/Node, and JS/TS/React frontends. Linux VM/Container based hosting. PSQL. Raw OpenAI API.
  Resume/CV: https://drive.google.com/file/d/1YtGhgZOstKibmWzyibCbK1ZBN3n1GCZr/view?usp=sharing
  Email: localization@tomaszubiri.com

Are you calling OpenAI a random vendor?

That's like calling Coca Cola a random beverage vendor


Yes, OpenAI is a random development tool vendor. In the same way Volkswagen is a random sausage vendor.

Do you drink your Coca Cola directly from the Coca Cola packaged bottle?

Or do you prefer to sip it in the cup of your choice and drink it from there? The same cup you use to drink Pepsi, Fanta, milk, and other beverages.


Ah, got it, you mean in the "development tools" category.

I'd say that such a category, first, is very small, and second, has almost no companies that exclusively offer development tools (JetBrains?). It's a product category where the competition is either individuals/OSS/academia or tech companies that release their dev tooling as a side quest.

For the even more specific category of "LLM development tools" or "agentic coding", OpenAI is the first of its kind; the term VibeCoding emerged from a Karpathy tweet one year ago, and back then it was just ChatGPT through the chat interface.

It wasn't a coincidence either: they explicitly train their models on code production, mainly out of a need to do useful tool calls and to do even simple tasks like multiplying a couple of numbers, but it grew into its own product category, starting with the supervised Cursor and Windsurf, then the autonomous Devin, then back to supervised with Claude Code/Codex.

So yeah, I wouldn't say it's a random vendor in that narrow sense of the specific product. But I get that it's a random vendor if you zoom out and think of a "development tools" category. It's subjective, but I think the nascent field that's clearly changing and hitting trillion-dollar market size is a bit more important than a field that only ever had a single dedicated company in it (JetBrains?).


I don't think anyone is sleeping on it.

It's at the top of most leaderboards on lmarena.ai.


Friendship ended with OpenAI, Now Anthropic is my best friend


Learning about tech folklore is the best part of Hacker News; there's stuff you can't learn from books or tutorials (well, maybe you could, but you're unlikely to reach it on your own).

Is this the "beginning of the end" for OpenAI?

That, and if Nvidia backs out of their $100B promise, it may not be the death knell but it would certainly be a step backward for OpenAI.

https://www.wsj.com/tech/ai/the-100-billion-megadeal-between...


Yes, the product's secret sauce is out and it's becoming a commodity.

But OpenAI is still innovating with new subcategories, and even in cases where it did not innovate (Claude Code came first and OpenAI responded with Codex), it outdoes its competitors. Codex is being widely preferred by the most popular vibecode devs, notably Moltbook's dev, but also Jess Fraz.

In terms of pricing, OAI has by far the most expensive product, so it's still positioned as the quality option. To give an example, most providers have a three-tier price for API calls:

Anthropic: $1/$3/$5 (per million output tokens)
Gemini: $3/$12 (two tiers)
OpenAI: $2/$14/$168

So the competitors are mainly competing on price in the API category.

To give another data point, Google released multimodal (image input) models only 1 or 2 months ago; this has been in ChatGPT for almost a year now.


The background is alternating between cyan and black, which is very distracting. Not sure if that's on purpose.

I use --dry-run when I'm coding and I control the code.

Otherwise it's not very wise to trust the application on what should be a deputy responsibility.

Nowadays I'd probably use OverlayFS (or just Docker) to see what the changes would be, without ever risking the original FS.


How do you easily diff what changed between Docker and host?

The way OverlayFS works is that there's a base directory. And then there's an overlay directory that only contains the changes. Docker is based on OverlayFS.

There are two main ways overlays are used. First, at build time: each line/command generates a new overlay on top of the previous base, so when you do something like

  FROM debian
  RUN apt-get update

it creates a base from the debian image, and then creates an overlay that only contains the changes introduced by apt-get update.

If you use docker inspect on the image, you get JSON showing exactly where the overlay directories are; you just need to navigate to the overlay directory.

Second: at runtime. [Assuming you are not using volumes (and if you do use volumes, just make sure each volume starts out empty instead of sharing your host files)] OverlayFS is used for runtime file changes as well: the last image layer is used as a base, and every file changed during runtime is added to the runtime overlay. That filesystem won't be deleted if you only stop the container; the runtime files will still be present, and you can reach them by docker inspecting the containers and then navigating the overlay directory as you would any other directory.
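
For a running or stopped container, that boils down to something like this (a sketch; "myapp" is a hypothetical container name, and the GraphDriver paths assume the overlay2 storage driver):

  docker diff myapp        # A/C/D lines: files added, changed, deleted vs. the image
  docker inspect myapp --format '{{ json .GraphDriver.Data }}'
  # prints LowerDir/UpperDir/MergedDir/WorkDir; UpperDir on the host holds only
  # what the container changed at runtime, so you can diff or copy it directly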

You can also use OverlayFS directly: as far as I recall, you just use mount and umount while specifying the overlay filesystem type and special parameters for the lower (base) and upper (overlay) directories. Chaining a stack of overlays is a bit more complex, but it's the same interface.
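
Something like this, give or take (a minimal sketch; the directory names are made up, and the actual mount options are lowerdir/upperdir/workdir):

  mkdir -p base upper work merged
  sudo mount -t overlay overlay \
      -o lowerdir=$PWD/base,upperdir=$PWD/upper,workdir=$PWD/work \
      $PWD/merged
  touch merged/newfile      # write through the merged view...
  diff -r base upper        # ...and upper/ holds exactly the delta
  sudo umount merged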


Thanks!

With all due respect to Stallman, you can actually study binaries.

The claim Stallman would make (after scolding you for an hour for saying "Open Source" instead of "Free Software") is that closed (proprietary) software is unjust. But in the context of security, the claim would be limited to Free Software being capable of being secure too.

You may be able to argue that Open Source reduces risk in threat models where the manufacturer is the attacker, but in any other threat model, security is an advantage of closed source. It's automatic obfuscation.

There are a lot of advantages to Free Software; you don't need to make up new ones.


This. Closed source doesn't stop people from finding exploits, in the same way that open source doesn't magically make people find them. The Windows kernel is proprietary and closed source, but people constantly find exploits in it anyway. What matters is that there is a large audience that cares about auditing. OTOH, if Microsoft really wanted to sneak in a super hard to detect spyware exploit, they probably could - but so could the Linux kernel devs. Some exploits have been openly sitting in the Linux kernel for more than a decade despite everyone being able to audit it in theory. Who's to say they weren't planted by some three-letter agency who coerced a developer? Relying on either approach is pointless anyway. IT security is not a single means to all ends. It's a constant struggle between safety and usability at every single level, from raw silicon all the way to user-land.

It's weird to me that it's 2026 and this is still a controversial argument. Deep, tricky memory corruption exploit development is done on closed-source targets, routinely, and the kind of backdoor/bugdoor people conjure in threads about E2EE are much simpler than those bugs.

It was a pretty much settled argument 10 years ago, even before the era of LLVM lifters, but post-LLM the standard of care practice is often full recompilation and execution.


> in any other threat model, security is an advantage of closed source

I think there's a lot of historical evidence that doesn't support this position. For instance, Internet Explorer was generally agreed by all to be a much weaker product from a security perspective than its open source competitors (Gecko, WebKit, etc).

Nobody was defending IE from a security perspective because it was closed source.


This comment comes across as unnecessarily aggressive and out of nowhere (Stallman?); it's really hard to parse.

Does this rewording reflect its meaning?

"You don't actually need code to evaluate security, you can analyze a binary just as well."

Because that doesn't sound correct?

But that's just my first pass, at a high level. Don't wanna overinterpret until I'm on surer ground about what the dispute is. (i.e. don't want to mind read :) )

Steelman for my current understanding is limited to "you can check if it writes files/accesses network, and if it doesn't, then by definition the chats are private and it's secure", which sounds facile. (Presumably something is being written somewhere for the whole chat thing to work; it can't be pure P2P because someone's app might not be open when you send.)


https://www.gnu.org/philosophy/free-sw.html

Whether the original comment knows it or not, Stallman greatly influenced the very definition of Source Code, and the claim being made here is very close to Stallman's freedom to study.

>"You don't actually need code to evaluate security, you can analyze a binary"

Correct

>"just as well"

No, of course analyzing source code is easier and analyzing binaries is harder. But it's still possible ("feasible" is the word used by the original comment).

>Steelman for my current understanding is limited to "you can check if it writes files/accesses network, and if it doesn't, then by definition the chats are private and its secure",

I didn't say anything about that? I mean, those are valid tactics as part of a wider toolset, but I specifically said binaries because they map one-to-one with the source code. If you can find something in the source code, you can find it in the binary and vice versa. Analyzing file accesses and networks, or runtime analysis of any kind, is mostly orthogonal to source code/binary static analysis, the only difference being whether you have a debug map to the source code or to the machine code.

This is a very central conflict of Free Software. What I want to make clear is that Free Software refuses to study closed source software not because it is impossible, but because it is unjustly hard. Free Software never claims it is impossible to study closed source software; it claims that source code access is a right, and its adherents prefer to reject closed source software altogether, and thus never need to perform binary analysis.


Binaries absolutely don't map one-to-one with source code. Compilers optimize out dead code, elide entire subroutines to single instructions, perform loop unrolling and auto-vectorization, and many many more optimizations and transformations that break exact mapping.

That is true, but I don't think I ever said that binaries map one-to-one with source code.

I was referring to source-code-to-binary maps: files that map binary locations to source code locations. In C (gcc/gdb) these are debug objects; they are also used by gdb-style debuggers like Python's pdb and Java's jdb. They also exist in JS/TS when using minifiers or React (source maps), so that you are able to debug in production.
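
To make the C case concrete (a tiny sketch; the file name and address are illustrative):

  gcc -g -O2 -o demo demo.c     # -g emits DWARF debug info mapping machine code to demo.c
  addr2line -e demo -f 0x1149   # resolves a code address back to a function and demo.c:line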


I was with you until you somehow claimed obfuscation can improve security, against all historical evidence even pre-computers.

Obscurity is a delay tactic which raises the time cost associated with an attack. It is true that obscurity is not a security feature, but it is also true that increasing the time cost associated with attacking you is a form of deterrent. If you are not at the same time also secure in the conventional sense, then it is only buying you time until someone puts in the effort to figure out what you are doing and owns you. And you had better have a plan for when that time comes. But everyone needs time, because bugs happen, and you need that time to fix them before they are exploited.

The difference between obscurity and a secret (password, key, etc.) is the difference between less than a year to figure it out and a year or more to figure it out.

There is a surprising amount of software out there with obscurity preventing some kind of "abuse" and in my experience these features are not that strong, but it takes someone like me hours to reverse engineer these things, and in many cases I am the first person to do that after years of nobody else bothering.


This is a tired trope. Depending exclusively on obfuscation (security by obscurity) is not safe. Maintaining confidentiality of things that could aid in attacks is absolutely a defensive layer and improves your overall security stance.

I love the Rob Joyce quote that explained why TAO was so successful: "In many cases we know networks better than the people who designed and run them."


I think you are conflating:

Is an unbreakable security mechanism

with

Improves security

Anything that complicates things for an attacker improves security, at least grossly. That said, there might be counter-effects that make it a net loss or net neutral.


I think "manufacturer is the attacker" is precisely the threat people are most worried about.

And yes you can analyze binary blobs for backdoors and other security vulnerabilities, but it's a lot easier with the source code.


Explain how you detect a branched/flagged sendKey (or whatever it would be called) call in the compiled WhatsApp iOS app.

It could be interleaved in any of the many analytics tools in there too.

You have to trust the client in E2E encryption. There's literally no way around that. You need to trust the client's OS (and in some cases, other processes) too.


>Explain how you detect a branched/flagged sendKey (or whatever it would be called) call in the compiled WhatsApp iOS app.

Vastly easier than spotting a clever bugdoor in the source code of said app.


Putting it all on the table: do you agree with the claim that binary analysis is just as good as source code analysis?

Binary analysis is vastly better than source code analysis; reliably detecting bugdoors via source code analysis tends to require an unrealistically deep knowledge of compiler behavior.

Empirically it doesn't look like there's a meaningful difference, does it?

Not having the source code hasn't stopped people from finding exploits in Windows (or even hardware attacks like Spectre or Meltdown). Having source code didn't protect against Heartbleed or log4j.

I'd conclude it comes down to security culture (look how things changed after the Trustworthy Computing initiative, or OpenSSL vs LibreSSL) and "how many people are looking" -- in that sense, maybe "many eyes [do] make bugs shallow" but it doesn't seem like "source code availability" is the deciding factor. Rather, "what are the incentives" -- both on the internal development side and the external attacker side


I don't agree with "vastly better", but it's arguable in both direction and magnitude. I don't think you could plausibly argue that binary analysis is "vastly harder".

Nono, analyzing binaries is harder.

But it's still possible. And analyzing source code is still hard.


What’s the state of the art of reverse engineering source code from binaries in the age of agentic coding? Seems like something agents should be pretty good at, but haven’t read anything about it.

Nothing yet; agents analyze code, which is textual.

The way they analyze binaries now is by using the textual interfaces of command-line tools, and the tools used are mostly the ones supported by foundation models at training time; mostly you can't teach them new tools at inference, they must be supported at training. So most providers are focused on the same tools and benchmark against them, and binary analysis is not in the zeitgeist right now; it's about production more than understanding.


The entire MCP ecosystem disagrees with your assertion that “you can’t teach it new tools at inference.” Sorry you’re just wrong.

Nono, you of course CAN teach tool use at inference, but it's different than doing so at training time, and the models are trained to call specific tools right now.

Also, MCP is not an agent protocol; it's used in a different category. MCP is used when the user has a chatbot, sends a message, gets a response. Here we are talking about the category of products we would describe as Code Agents, including Claude Code and ChatGPT Codex, and the specific models that are trained for use in such contexts.

The idea is that of course you can tell it about certain tools at inference, but in code production tasks the LLM is trained to use string-based tools such as grep, and not language-specific tools like Go To Definition.

My source on this is Dax, who is developing an open source clone of Claude Code called OpenCode.


Claude code and cursor agent and all the coding agents can and do run MCP just fine. MCP is effectively just a prompt that says “if you want to convert a binary to hex call the ‘hexdump’ tool passing in the filename” and then a promise to treat specially formatted responses differently. Any modern LLM that can reason and solve math problems will understand and use the tools you give it. Heck I’ve even seen LLMs that were never trained to reason make tool calls.
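
On the wire, that tool is just a schema handed to the model alongside the prompt; roughly something like this in the OpenAI-style function format (a sketch, reusing the hypothetical hexdump tool from above):

  # a rough sketch of a custom tool definition in the OpenAI-style function format;
  # the "hexdump" tool is the hypothetical example from above
  tools = [{
      "type": "function",
      "function": {
          "name": "hexdump",
          "description": "Convert a binary file to hex",
          "parameters": {
              "type": "object",
              "properties": {"filename": {"type": "string"}},
              "required": ["filename"],
          },
      },
  }]
  # the model answers with a tool call naming "hexdump"; the client runs it and
  # feeds the output back as a tool message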

You say they’re better with the tools they’re trained on. Maybe? But if so not much. And maybe not. Because custom tools are passed as part of the prompt and prompts go a long way to override training.

LLMs reason in text. (Except for the ones that reason in latent space.) But they can work with data in any file format as long as they’re given tools to do so.


Here's the specific source on this matter:

https://youtu.be/VsTbgYawoVc?si=6ZE83umppNCz9h-a&t=1021

"The models today, they are tuned to call specific tools. We've played with a lot of tools, you can hand it a bunch of tools it's never seen before and it just doesn't call them. There's something to the post-training process being catered to certain sets of tools. So anthropic, cloud 4, cloud3.7 before that, those models are the best at calling tools from a programming standpoint. They'll actually keep trying and going for it. Other models like Gemini2.5 can be really good, but it doesn't really call tools very eagerly. So we are in the state right now where we kind of have to provide the set of tools that the model expects.

I don't think that'll always be the case, but we've given it a bunch of LSP tools. I've played with giving, say, giving it 'go to definition' and 'find references', and it just doesn't use them. I mean you can get it to use them if you ask it to, but it doesn't default to kind of thinking that way, I think that'll change."

He then goes on to theorize that it's the system prompt, so open models like Llama, where you can customize the system prompt, might have an advantage. (I think API models still have a prebaked prompt, not sure.) Additionally, even when you control the prompt, he argues there's a (soft) limit to the number of tools the model can handle.

Personally, I think a common error with LLMs is conflating what is technically possible with what works in practice. In this case the argument is that custom tools and MCPs are possible, but limited in the sense that you often need to explicitly tell the model to use such a tool, and you can only have a small number of custom tools. When you compare that to system-prompt-specified tools, and to tools in the training set that the model is fine-tuned for, it's a whole different category: the native tools are just capable of way more autonomy.

A similar error I've seen is conflating context length with the capacity to remember. That a model has a 1M token window means that it could remember something, but it would be a categorical mistake to claim or depend on the model remembering stuff in a 1M token conversation.

There's a lot of nuance in these discussions.


Another common mistake today is to observe one LLM failing to do something in a single situation, and to generalize that observation to "LLM's are incapable of doing this thing." Or "they're not good at this kind of thing" which is what you're repeating here. This logic underlies a lot of AI skepticism. Sure you and they aren't skeptics and acknowledge this will get better. But I think you're over-indexing on a specific problem they observed. Plus to blame the LLM when they haven't optimized the system prompt is IMHO quite silly - it's kind of like "did you read the instructions you were giving it?". What I think they should say is "I tried this and it didn't work super well out of the box. I'm sure there's some way to fix it, but I haven't found it yet." Instead of blaming the model intrinsically.

In contrast, I've seen coding agents figure out extremely complex systems problems that are clearly outside of their training set - by using tools and interacting with complex environments, and reasoning and figuring it out.

Plus, "tools" can be multi-layered. You give an agent a "bash" tool, and voila, it has access to every piece of software ever written. So I don't think any of these arguments apply in the slightest to the question of de-compiling code.


>"What’s the state of the art of reverse engineering source code from binaries in the age of agentic coding?"

This is the original comment I was responding to; we were talking about state-of-the-art agentic models, so I'm not generalizing to other scenarios.

>Sure you and they aren't skeptics and acknowledge this will get better

I think this is a common bipartisan trap where you lose a lot of nuance. And it's imprecise; you don't know whether I'm a skeptic or not. It's like reading a nuanced opinion and trying to figure out whether the author is a Republican so you can agree or a Democrat so you can disagree.

>Plus to blame the LLM when they haven't optimized the system prompt is IMHO quite silly - it's kind of like "did you read the instructions you were giving it?"

I think the context here is that when using agentic tools like Claude Code, you don't control the system prompt. You could write your own prompts and use naked API calls, but that's always more expensive (because the subscription is subsidized), and I'm not sure what the quality of that is.

The bottom line is that API calls, where you can fully control the system prompt, are more expensive. And using your OpenAI/Anthropic subscription has a fixed cost. So in that context they don't control the system prompt.

Even in cases where you could control the system prompt and use the API, there's the fact that some models (the state of the art) are fine-tuned for specific tools, so they have a bias towards fine-tuned tool use. The claim is not that they are "incapable of doing X"; it's that there's a bias towards the usage that was known at train time or fine-tune time, versus at inference, which is much weaker. Nuance.

>Instead of blaming the model intrinsically.

Again, I'm not blaming or being a skeptic here, just analyzing the state of the art and its current weaknesses. It's likely that these things are going to be improved in the next generation; this is going to move fast. If you conflate any criticism of the tools with "skepticism", you are going to miss the nuance.

>In contrast, I've seen coding agents figure out extremely complex systems problems that are clearly outside of their training set - by using tools and interacting with complex environments, and reasoning and figuring it out.

Yeah, for sure, I'll give you a concrete example on this point where we agree. I made a model download a webdriver for a browser and taught it to use the webdriver to open the site, take screenshots, and evaluate how it looks visually, in addition to actually clicking buttons and navigating it. This is a great improvement when the traditional approach is just to generate frontend code and trust that it works (which, to be fair, sometimes works great, but you know, it's better if it can see it).
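
The harness was roughly this shape (a sketch with illustrative names and URL, not the actual code):

  # a tool that opens the dev site, screenshots it, and clicks around for the model
  from selenium import webdriver
  from selenium.webdriver.common.by import By

  driver = webdriver.Chrome()                  # assumes a local Chrome/chromedriver
  driver.get("http://localhost:3000")          # hypothetical dev server
  driver.save_screenshot("home.png")           # the image the model evaluates visually
  driver.find_element(By.CSS_SELECTOR, "button[type=submit]").click()
  driver.quit()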

And it works, until it doesn't and I have to remind it that it can do that. It's just a bias. If they had trained the model with WebDriver tool access, the model would use it much more (and perhaps they are already doing that and we will see it in the next model).

The main thesis is that instructions taught at train time work better than those taught at fine-tune time, which in turn are stronger than those given at inference. To be very specific: during inference, tool use is much more likely immediately after the tool is mentioned; it might be stronger and more consistent in the system prompt (but there it competes with other system prompt instructions and is still inference-based). To say nothing of the cost of the extra inference tokens, compared to essentially free training/fine-tune biases. I don't think anyone disagrees that stuff you teach the model during training has better quality and lower cost than stuff you teach at inference.

I think playing around with logit biases is an underrated way to increase and control the frequency of certain tool calls, but it doesn't seem to be used much in this generation of vibecode tools; the interface is almost entirely textual (with some /commands starting to surface). Maybe the next generation will have the option to configure specific parameters instead of relying entirely on textual prompting.
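
On the raw API that would look something like this (a sketch; the token IDs are placeholders you'd look up with the model's tokenizer):

  from openai import OpenAI

  client = OpenAI()
  resp = client.chat.completions.create(
      model="gpt-4o-mini",
      messages=[{"role": "user", "content": "Check how the page renders."}],
      # positive values nudge a token up; -100 effectively bans it
      logit_bias={"12345": 5, "67890": -100},
  )
  print(resp.choices[0].message.content)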


Agents are sort of irrelevant to this discussion, no?

Like, it's assuredly harder for an agent than having access to the code, if only because there's a theoretical opportunity to misunderstand the decompile.

Alternatively, it's assuredly easier for an agent because given execution time approaches infinity, they can try all possible interpretations.


Agents meaning an AI iteratively trying different things to try to decompile the code. Presumably in some kind of guess and check loop. I don’t expect a typical LLM to be good at this on its first attempt. But I bet Cursor could make a good stab at it with the right prompt.

Cursor is a bit old at this point; the state of the art is Claude Code and its imitators (ChatGPT Codex, OpenCode).

Devin is also going very strong, but it's a bit quieter and growing in enterprises (and I'm pretty sure it uses Claude Opus 4.5 and possibly Claude Code itself). In fact, Clawdbot/Moltbot/OpenClaw was itself created with Devin.

The big difference is the autonomy these tools have (Devin more than Claude Code). Cursor was meant to work in an IDE, and that was a huge strength during the 12 months when the models still weren't strong enough to work autonomously, but they are getting to the point where that's becoming a weakness. Tools like Devin trade slower acceleration for a higher top speed. My chips are on Devin.


I think there’s a good possibility that the technology that is LLMs could be usefully trained to decode binaries as a sort of squint-and-you-can-see-it translation problem, but I can’t imagine, eg, pre-trained GPT being particularly good at it.

I've been working on this; the results are pretty great when using the fancier models. I have successfully had gpt5.2 complete fairly complex matching decompilation projects, but also projects with more flexible requirements.

What have you managed to decompile? Did you do it with a coding agent?

I've been decompiling and patching a wide range of software using Codex with ida-pro-mcp and radare2 for generic targets, and various language-specific tools for .NET and Java, for example. IDA is heavily scriptable, so the LLM usually ends up interacting in rather interesting ways, but generally extremely effectively.
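
For a generic target, the kind of session the agent drives looks roughly like this (a sketch; the binary name is a placeholder, and everything after the first line is typed at the r2 prompt):

  r2 -AA target.bin     # open the binary and run full analysis
  # then, inside the r2 shell:
  afl                   # list discovered functions
  pdf @ main            # disassemble main
  pdc @ main            # rough C-like pseudo-decompilation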

I'm not paying for the tokens I use, so I just choose whatever is the most performant model OpenAI offers. I find LLMs to be highly capable, struggling very little even against fairly obnoxious obfuscation.

My use cases have ranged from malware analysis to adding new features to complicated EOL enterprise software without access to the source code.

I've done a lot of manual reverse engineering. In many cases you can genuinely 100x your productivity using these tools. Tasks like matching decompilation are an especially good fit for LLMs.

