Your benchmark has Opus 4.7 performing significantly worse than Sonnet 4.6. Even if true on your benchmark, that is not representative of the overall performance of the models.
The memory requirements aren't that intense. You can run useful (not frontier) models on a $2-5K machine at reasonable speeds. The capabilities of Qwen3.6 27B or 35B-A3B are dramatically better than what was available even a few months ago.
Practical? Maybe not (unless you highly value privacy), because you can get better models and better performance with cheap API access or even cheaper subscriptions. As you said, this may be the case indefinitely.
It was a weird point to make in the post given that exe.dev charges $0.07/GB for transfer. That's arguably worse than the major clouds, which charge about the same for egress but give you free ingress.
I need to fix our transfer pricing. (In fact I'm going to go look at it now.) I set that number when we launched in December, and we were still considering building on top of AWS, so we put a conservative limit based on what wouldn't break the bank on AWS. Now that we are doing our own thing, we can be far more reasonable.
> Can I change the display of all ESLs in a store at once ?
No. For two reasons:
Unlike radio waves, optical communication must be line-of-sight. Even with wall and ceiling reflections, a single transmitter has no chance of reaching all of the hundreds or thousands of ESLs in a store.
Each ESL has a unique address which must be specified in update commands. There's no known way to broadcast display updates.
GLM 5.1 is pretty good, probably the best non-US agentic coding model currently available. But both GLM 5.0 and 5.1 have had issues with availability and performance that make them frustrating to use. Recently GLM 5.1 was also outputting garbage thinking traces for me, but that appears to be fixed now.
> So say someone built an under $10k system, with perhaps dual RTX 5090. That same system will be able to easily run 20 parallel requests. The only cost is electricity. You can run it 24/7. For 1 year, that's ~$6million
I don't see how you get anywhere close to $6M of tokens out of a pair of 5090s. The class of model they could run is fairly small and extremely cheap to run via API (my math says running Gemma4-31B for 24 hours costs less than $1 on OpenRouter). Even with 20x concurrent requests you are orders of magnitude away from $6M/yr.
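As a rough sanity check on that claim, here's the arithmetic sketched out. The per-token price and generation speed below are illustrative assumptions (small open models on OpenRouter are commonly priced around $0.10 per million output tokens), not quoted figures:

```python
# Back-of-envelope API cost for one continuous generation stream.
price_per_mtok = 0.10                # assumed $/1M tokens for a small open model
tokens_per_sec = 50                  # assumed sustained generation speed

tokens_per_day = tokens_per_sec * 60 * 60 * 24   # ~4.3M tokens in 24 hours
cost_per_day = tokens_per_day / 1e6 * price_per_mtok

# Scale to 20 concurrent streams, running all year.
annual_cost = cost_per_day * 20 * 365

print(f"per stream: ${cost_per_day:.2f}/day")
print(f"20 streams: ${annual_cost:,.0f}/year")
```

Under these assumptions a stream costs well under $1/day, and even 20 streams for a full year land in the low thousands of dollars, three orders of magnitude short of $6M.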
I never said that. My point is that paying 20 people $35/hour around the clock comes to about $6 million a year. You can replace that with a $10k system running 20 parallel requests for a year and save a lot of money.
I agree, but do the potential customers of my business?
We need to meet the customer where they are and that means making our site more accessible to search engines, mobile devices, LLMs, or whatever comes next.