On first principles it would seem that the "harness" is a myth. Surely a model like Opus 4.6/Codex 5.3, which can reason about complex functions and data flows across many files, wouldn't trip up over the top-level function signatures it needs to call?
I see a lot of evidence to the contrary though. Anyone know what the underlying issue here is?
I did read the article quite enthusiastically and my practical experience confirms the same. Sure, the difference is more subtle. But my point was that an "agent", whether human or AI, can be a lot more productive with better tools. This guy found a better screwdriver than the most commonly used one. That's amazing, and nothing in "first principles" denies that a better tool harness would mean better/faster/more correct AI agents.
If you agree that current LLMs (Transformers) are naturally very susceptible to context/prompt, then you can go ask agents for a "raw harness dump" "because I need to understand how to better present my skills and tools in the harness", and you may see how the harness impacts model behavior.
The model's generalized "understanding" and "reasoning" is the real myth, and it's what makes us take a step back and offload the process to deterministic computing and harnesses.
I wanted a terminal feel (dense/sharp) + being able to comment directly on plans and outputs. It's MIT, no cloud, all local, etc.
It includes all the details for function runs and some other nice-to-haves, fully built on Claude Code.
Particularly we found planning + commenting up front reduces a lot of slop. Opus 4.6-class models are really good at executing an existing plan to a T. So quality becomes a function of how much you invest in the plan.
I guess if I had to do it, I'd reject pushes without the requisite commit to entire/checkpoints/v1. I think their storage strategy is a bit clunky for that, but it can be done. I might look to do something more like the way jujutsu colocates its metadata. I don't think this particular implementation detail makes too much of a difference, though. I got along just fine in a regulated environment by setting a policy and enforcing it before git existed. Ideally, we'd do things for a good reason, and if you can't get along in that world, then it's probably not the right job for you. Sometimes you've got to get the change controls in order before we can mainline your contributions because we have to sustain audits. I don't think this is about forcing people to do something that they otherwise wouldn't do if you told them that it's a requirement of the job.
100%. Day one is to ship the basic capability, which many of us have already vibe-coded... Day two is all the enterprise stuff to make big companies trust AI coding more. That could unlock a lot of revenue. This isn't random at all.
Explain how you detect a branched/flagged sendKey (or whatever it would be called) call in the compiled WhatsApp iOS app?
It could be interleaved in any of the many analytics tools in there too.
You have to trust the client in E2E encryption. There's literally no way around that. You need to trust the client's OS (and in some cases, other processes) too.
Binary analysis is vastly better than source code analysis; reliably detecting bugdoors via source code analysis tends to require an unrealistically deep knowledge of compiler behavior.
Empirically it doesn't look like there's a meaningful difference, does it?
Not having the source code hasn't stopped people from finding exploits in Windows (or even hardware attacks like Spectre or Meltdown). Having source code didn't protect against Heartbleed or log4j.
I'd conclude it comes down to security culture (look how things changed after the Trustworthy Computing initiative, or OpenSSL vs LibreSSL) and "how many people are looking" -- in that sense, maybe "many eyes [do] make bugs shallow" but it doesn't seem like "source code availability" is the deciding factor. Rather, "what are the incentives" -- both on the internal development side and the external attacker side
I don't agree with "vastly better", but it's arguable in both direction and magnitude. I don't think you could plausibly argue that binary analysis is "vastly harder".
Where do you draw the line tho? How many kilobytes, and how much future maintenance work, is avoiding a potential slight visual inconsistency with a radio button worth? Is it worth losing the X amount of people who have a bad network connection?
Use this approach everywhere and the actual content of the page (you know: the stuff people came for) suffers.
All I can think about is a quote by world-famous video artist Nam June Paik: "When too perfect, lieber Gott böse" ("God gets mad when it's too perfect"; the original isn't exactly a full sentence and mixes English and German).
Based on the profits of many webapps, there is no line. What engineers here forget is that they are often not the targeted consumer. The hypothetically perfect website doesn't sell as well as a colorful fat chonker does. It is like fast food: not everyone cares about farm-to-table.
> It is like fast food: not everyone cares about farm-to-table
I mean, a "colorful fat chonker" website is literally the opposite of fast food: it's slower to arrive, and focuses way too much on appearances.
In this analogy, the website using these ridiculous abstractions is more like Salt Bae or whatever idiotic trend has replaced him. All glitz, zero substance, slower, and for no apparent reason.
The fast food equivalent is stuff like the Google home page: it doesn't validate, it's actively harmful to you, the community, and the planet, but it's immensely popular.
Everyone always says slower and bloat and bad etc etc but it is all relative. Not everyone is an eng who scoffs at waiting another 100ms.
I do like your analogy tho. It is better. Most people want that trendy experience or fast food. Still, both exist because the market demands it be so despite how much it tilts a subset.
I worked in first-level IT support, and I think most people don't even consciously consider it like that. They read the news on that page. That page changes. A lot has to happen to piss them off enough to make them leave. They habitually click away fifty windows a day without reading them anyway.
But people do notice if something just works on a subconscious level and that colors their perception of your project/brand/page or whatever. Even my totally tech-illiterate father actively complains about junk interfaces like the one at Temu. But he goes there for the sweet deals. I just wonder if it wouldn't work out better for them if the page was snappy and allowed a person to visit more product pages.
And one mistake you make is to think you need a megabyte of javascript to create a junk look. You can easily do that with HTML and CSS alone, including animations and all.
The way I see it, the causal arrow points the other way: successful sites tend to get bloaty, but they did not get successful because of it; they got successful despite it.
And by bloaty I don't mean the page doing a lot. Bloaty means using an intricate Rube Goldberg machine to do, in the end, very basic things. Like displaying a popup, which can be done with a single line of JavaScript, but is for some reason done using an amount of code that, if printed, would fill a veritable heavyweight book.
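For concreteness, the one-liner being alluded to here can be the platform's native `<dialog>` element. It's browser-only, so it's quoted as strings below rather than executed:

```typescript
// A native popup needs no library: pair a <dialog> element with one line of
// script. Quoted as strings because this snippet runs outside a browser.
const markup = `<dialog id="hi">Hello!</dialog>`;
const oneLiner = `document.querySelector("dialog").showModal();`;

console.log(`${markup}\n<script>${oneLiner}</script>`);
```

`showModal()` gives you focus trapping and a backdrop for free, which is exactly the kind of thing the heavyweight libraries reimplement.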
This is objectively not true. If it were, the path of least resistance would mean everyone uses the option that is fastest and best.
It takes far less effort to implement the bad way. I think people take their own skill for granted. Maybe you can but most others cannot. Maybe they will learn or maybe they are happy to put food on the table and go home at 5.
When I say "implement" I mean the big pile of code in the library. I do not believe making that entire custom mechanism was easier. There's so much to it.
Everyone else following along and merely using it I blame less, but they shouldn't have picked such a bloated library.
Under all of the framework complexity that specific look is still achieved with CSS. In fact, you could rip out the CSS they use with very little modification and pair it with a ~five-line React component that doesn't require any third-party imports.
Everything in styles.css in that example maps to the vanilla input, so you just have to move them around a bit. Should work at least as well as theirs across browsers, because it's vanilla inputs and the same CSS.
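As a sketch of how little is needed (written framework-free here instead of as a React component; the class names are hypothetical stand-ins for the rules in that styles.css, not the library's actual API), the "component" collapses to a vanilla input plus classes:

```typescript
// Hypothetical sketch: a "styled radio" is just a vanilla <input> with CSS
// classes attached. Class names below are made-up stand-ins for the
// library's stylesheet rules.
function radio(name: string, value: string, label: string): string {
  return (
    `<label class="radio-label">` +
    `<input type="radio" class="radio-input" name="${name}" value="${value}">` +
    `<span>${label}</span>` +
    `</label>`
  );
}

const html = radio("color", "red", "Red");
console.log(html);
```

Since the output is a plain native input, browser behavior (keyboard, focus, form submission) comes along untouched.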
> - Make sure it looks the exact same across all browsers
> How doable is it with vanilla css?
It's not doable with your fancy frontend framework and your 20 imports and your ten thousand lines of typescript.
"Make sure it looks the exact same across all browsers" is, and always has been, fundamentally at odds with how the web is intended to work.
How well does this shadcn crap render in Arachne? Ladybird? NetSurf? Links? Dillo? Netscape 3? The latest version of Chrome with user styles applied?
When you say "exactly the same", I assume you mean that the design only uses black and white, because some people might have black and white monitors, right? But you're also going to use amber-on-black because some people might have amber screen monitors, right? How do you plan on ensuring it looks exactly the same on a braille terminal?
Maybe you think I'm being silly. Because nobody uses monochrome monitors in 2026, right? So it's safe to ignore that and put an asterisk next to "exactly the same" (And also just forget that e-ink is a thing that exists).
(Just like how it was safe in 2006 to assume people would always have 800x600 or bigger displays, and nobody would ever come along using a screen with, say, 480×320 resolution)
What measures have you taken to ensure that your colours appear exactly the same across a bunch of different types/brands of monitors that render colours differently? Or, perhaps we should just add another asterisk next to "exactly the same"?
I could go on.
How many asterisks is acceptable before "exactly the same" isn't a thing anymore?
If "exactly the same on all browsers" is one of your goals, you are wrong. If your designer tells you that's what they want, they are wrong. If you ever tell a client that's what you're providing, you are wrong.
I think accessibility is one area where some of these component libraries can be helpful, as they automatically include a11y features that might otherwise be ignored.
Displaying the same thing on every monitor to the degree that monitor allows is well-defined. The browser may not be able to show some colors and the browser may decide to display things differently on purpose, but it's perfectly reasonable to want to unambiguously express what you _want_ the browser to display.
> Displaying the same thing on every monitor to the degree that monitor allows is well-defined.
In this case the website will not appear the same on every browser. Most browsers have a zoom function that the user controls, which is an accessibility feature. This changes how the website renders on the page.
Off-topic note: I read the website and a few pages of the docs, and it's unclear to me what I can use Lightpanda for safely. Like, say I wanted to swap it in as the engine for Playwright: what are the tradeoffs? What things are implemented, what isn't?
Thanks for the feedback, we will try to make this clearer on the website. Lightpanda works with Playwright, and we have some docs[1] and examples[2] available.
Web APIs and CDP specifications are huge, so this is still a work in progress. Many websites and scripts already work, while others do not; it really depends on the case. For example, on the CDP side, we are currently working on adding an Accessibility tree implementation.
Maybe you should recommend a recipe for configuring playwright with both chromium and lightpanda backends so a given project can compare and evaluate whether lightpanda could work given their existing test cases.
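One way such a recipe could look: switch the backend on an environment variable and run the same suite against both. The env var names and the default ws:// endpoint below are assumptions for illustration, not anything from Lightpanda's docs:

```typescript
// Sketch of a backend switch so one Playwright suite can run against either
// bundled Chromium or a Lightpanda CDP server. BROWSER / LIGHTPANDA_WS and
// the default endpoint are made-up names, not documented configuration.
function cdpEndpoint(env: Record<string, string | undefined>): string | null {
  if (env.BROWSER !== "lightpanda") return null; // null → use chromium.launch()
  return env.LIGHTPANDA_WS ?? "ws://127.0.0.1:9222";
}

console.log(cdpEndpoint({ BROWSER: "lightpanda" }));
console.log(cdpEndpoint({}));
```

A test setup would then pick `await chromium.connectOverCDP(endpoint)` when an endpoint is returned and `await chromium.launch()` otherwise, and diff which specs pass under each backend.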
I think it's really more of an alternative to JSDom than it is an alternative to Chromium. It's not going to fool any websites that care about bots into thinking it's a real browser in other words.
Given certificate issuance basically ended up being "do you control the DNS for this domain", I feel like all of it could've been so much simpler if it was designed like that from day one.
While I love Let's Encrypt it feels so silly to use a third party to verify I can generate a Cloudflare API key (even .well-known is effectively "can you run a webserver on said dns entry").
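For reference, the DNS-01 flow really is just "prove you control DNS": per RFC 8555, you publish a TXT record derived from the CA's token. A minimal sketch (the token and thumbprint values are made-up placeholders):

```typescript
import { createHash } from "node:crypto";

// DNS-01 (RFC 8555 §8.4): the CA hands you a token; you publish a TXT record
// at _acme-challenge.<domain> whose value is
// base64url(SHA-256("<token>.<accountKeyThumbprint>")), and the CA checks
// that the record resolves before issuing the certificate.
function dns01Record(domain: string, token: string, thumbprint: string) {
  const value = createHash("sha256")
    .update(`${token}.${thumbprint}`)
    .digest("base64url");
  return { name: `_acme-challenge.${domain}`, type: "TXT" as const, value };
}

// Token and thumbprint here are placeholders, not real ACME values.
const rec = dns01Record("example.com", "tok123", "thumbprintXYZ");
console.log(rec.name);
```

So the third party isn't verifying the API key itself, just the observable effect: that you can make the right record appear under the domain.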
I'm surprised that's not illegal, and I think states will pass laws to fix this.
In my area, an &Pizza is $12 on their App, $19 on Doordash (delivery or pickup). A Chipotle burrito is $9.50 vs $12.35 on doordash delivery (plus every addon is a $1 more expensive).
You can easily pay an extra $4/$5 (30%) per item you order on there.
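Checking those markups against the prices quoted above (prices as reported in this comment, not independently verified):

```typescript
// Percentage markup of the delivery-app price over the direct price.
const markupPct = (direct: number, delivery: number): number =>
  ((delivery - direct) / direct) * 100;

console.log(markupPct(9.5, 12.35).toFixed(0)); // Chipotle: 30
console.log(markupPct(12, 19).toFixed(0));     // &Pizza: 58
```

So the Chipotle figure matches the ~30% claim, and the &Pizza markup is actually well above it.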
Why would it be illegal? If you think of it as Doordash buying the pizza and then reselling it to you, there'd be no reason not to expect a mark up. You're allowed to price discriminate between different market segments, so even if we pitch Doordash as merely a third party delivery offering, restaurants could still charge Doordash customers more than those that come into their storefront.
It should be illegal because these services market a subscription to you claiming the benefit of zero fees and free delivery, which is a lie. You are being secretly charged through a higher menu price, none of which is shown to you as a customer.
I can't count how many friends I have had to explain this to who don't understand they are paying 20-30% more even after getting "free delivery" than if they just ordered directly through the restaurant.
Also, Doordash does not have "zero fees" for orders when you pay for DashPass, they have "reduced" fees. I do absolutely hate the practice of "Taxes & Fees" being a line item and only when you click into it do you see that the taxes are minimal and most of that is the platform fees.
I'm not sure how UberEats/etc handle it but it's absolutely crazy how much of a markup there is to order through Doordash vs going to pick it up when you factor in Restaurant Upcharge + Doordash Fees + Tip. It's easy to have an $8 item suddenly cost $20 or more total out-of-pocket when all is said and done.
I don't know that it should be illegal. I think the argument would be that it's deceptive. DoorDash, for some customers, claims there is no service fee: it's "free." What they're really saying is: the delivery itself may be free, but the cost of the overall service is hidden, obfuscated in the menu prices, and without doing some research at the restaurant, you'll likely not notice.
One could argue it's best for the consumer to very clearly understand how much more they're paying. If not a service fee, here is our aggregate food markup, in plain sight. Transparency, in other words. Let's not borrow any ideas from the healthcare system.
> If you think of it as Doordash buying the pizza and then reselling it to you
Isn't this basically impossible to do legally in the U.S.? Wouldn't you run into trouble both with IP law and food safety laws around reselling prepared foods?
IP law, no, for the same reason nothing stops me from reselling the Ralph Lauren shirt I bought as a Ralph Lauren shirt, so long as I make no pretenses of being Ralph Lauren and make no modifications to it. It's the same IP-protected good; I'm just reselling it.
Food safety? There might be some restrictions related to food handling, but to my understanding they're mostly pretty rote food handling safety training stuff that I'd hope delivery companies provide anyway.
It's the opposite — you're legally protected to resell anything you buy and the seller can't stop you. I'm not sure if food has any caveats, but in general, IP law cannot stop you from reselling an item.
This has been happening for a good while. There are loads of instances of food delivery companies creating unauthorized websites for restaurants with a phone number and url owned by the delivery company. They are literally buying the food and reselling it to you at a markup.
What I personally dislike about this is that it hides the cost of Doordash. It's not intuitive that the prices of items are silently higher on Doordash: it's not like online retailers having different prices for the same SKU; it's the same restaurant. I'd prefer the overhead to show up as its own line item, rather than obscuring the actual cost of the service. I have a feeling fewer people would choose to use Doordash as often if they realized just how much more things cost through it. (Not everyone, but there are a lot of people who really do just do it for convenience, and they could just drive and pick up their own takeout.)
You have a point, but I just think it's less intuitive for consumers. Manufacturers often don't even do direct sales, so the only "canonical" price is the MSRP, which is just that, a suggestion. Consumers go shopping at Walmart or Amazon, they don't go "shopping" at Doordash: the menu they're seeing on Doordash is the restaurant menu. In some cases, it is the only online menu that some restaurants even have. To me it is not terribly intuitive that these prices differ.
There is another analog for this, too, though: some retailers indeed would have more or less expensive prices for the same thing when ordering online versus in-store. I think the argument that it isn't unprecedented is pretty solid.
Despite it not being entirely unprecedented, I'd still prefer to see this practice ended for food delivery, so it's easier to see the true overhead of these services. It really does feel a bit manipulative the way it is right now.
Without regulation, "the market" wouldn't care about a lot of things. It's actually a good thing that a small minority of people hold the line for people who don't have time to care about issues like this kind of manipulation!
I’m pretty sure DoorDash is the one who increases the price on their end, not the business. And what’s more, they don’t separate the addition out. It’s rolled in to the cost of the item.
I’d be very curious what the conversation is between them. I highly doubt DoorDash negotiates with every restaurant on their platform and wouldn’t be surprised to discover they just tack it on independently. I could see that raising some interesting questions.
All of this is predicated on “ifs” and assumptions, so feel free to throw it out. Just kind of musing here lol
> I’m pretty sure DoorDash is the one who increases the price on their end, not the business
That is not correct. Doordash takes a 20-30% commission on each sale, so businesses preemptively increase the prices to offset that. They're not forced to and doordash isn't doing it for them. But, you know, they're still effectively "forced" to if their in-store prices don't have great margins to begin with...
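The gross-up is a simple division: to net the in-store price after a commission, the listed price has to be the in-store price divided by one minus the rate. A quick check with illustrative rates:

```typescript
// To net the same revenue after a platform commission, a restaurant has to
// list at inStore / (1 - rate). Rates here are illustrative examples.
const listPrice = (inStore: number, rate: number): number =>
  inStore / (1 - rate);

console.log(listPrice(10, 0.2).toFixed(2)); // 20% commission → 12.50
console.log(listPrice(10, 0.3).toFixed(2)); // 30% commission → 14.29
```

Which lines up with the 30%-ish menu markups people report seeing on the platform.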
Most of that money goes towards the driver, last I checked in on unit economics. It costs quite a bit of money to pay a person to go to the restaurant, wait around, and then bring it to you — far more than the "delivery fee" that you see and that customers would pay.
Customers are cheap and they're (partly) to blame. My theory is that Amazon conditioned people to view delivery as a free commodity and pizza places who had delivery baked into their model cemented it.
So if Doordash listed a delivery fee that covered their true cost of delivery, customers would balk. So they instead have to find creative ways to get enough. Maybe it's changed and Doordash cracked the secret, but when I'd looked into it years ago these companies barely got by — many of them actually losing money.
With pizza delivery you typically (should) tip the driver $5+ ($10+ for larger orders), so idk if that really tracks specifically, but I do largely agree that part of it is people being cheap for one reason or another.
I know people who drove for DD and they roughly earn minimum wage ~$15/hr. You can easily deliver 2 orders in an hour. So I don't really buy that either.
I’m not sure I’m understanding your comment exactly so if this response is off let me know: I’m talking about traditional calling pizza in, not app delivery. At least when I was growing up that’s what we typically tipped.
I think the most reasonable approach would be to mandate that these food delivery services cannot take a cut from the payment going to restaurants, but instead must charge it fully to the customer. So if the restaurant has a menu price of 10, then when someone orders delivery, 10 out of the payment goes to the restaurant. And the delivery service is free to add its margin on top, be it 30% or 50%.
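Under that proposed rule, a quote would decompose into visible parts instead of a padded menu price. A sketch with illustrative numbers:

```typescript
// Sketch of the proposed rule: the restaurant's menu price flows through
// untouched, and the delivery service's margin appears as its own line item.
function deliveryQuote(menuPrice: number, margin: number) {
  const serviceFee = menuPrice * margin;
  return { toRestaurant: menuPrice, serviceFee, total: menuPrice + serviceFee };
}

console.log(deliveryQuote(10, 0.5)); // restaurant keeps 10, service adds 5
```

The customer pays the same or more, but the overhead is in plain sight rather than buried in the menu.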
Doordash and other companies like it take a good chunk of the margins of those stores for the privilege of delivering the items. 15% is not unheard of.