Your benchmark has Opus 4.7 performing significantly worse than Sonnet 4.6. Even if true on your benchmark, that is not representative of the overall performance of the models.

Yes, Opus 4.7 fast (no reasoning) did a worse job than Sonnet 4.6 high (with reasoning) according to Gemini 3.1 Pro's evaluation.

Your table doesn't indicate reasoning vs non-reasoning, or the reasoning level.

When nothing is noted, it's max reasoning (xhigh in Copilot Chat in VS Code, where available).

The models not available on Copilot were tested through opencode (max reasoning), and DeepSeek v4 was tested through Cline (also with max reasoning).


There are yoga rooms in terminals 1, 2, and 3.

I'm using it via OpenCode Go, which claims to only use Zero Data Retention providers.

How much you trust any particular provider's claim to not retain data is subjective though.


The memory requirements aren't that intense. You can run useful (not frontier) models on a $2-5K machine at reasonable speeds. The capabilities of Qwen3.6 27B or 35B-A3B are dramatically better than what was available even a few months ago.
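
For a rough sense of the memory side, here's a back-of-envelope sketch (the constants are illustrative assumptions, not measured numbers for these models):

    # Approximate VRAM to hold a model's weights plus some KV-cache headroom.
    def vram_gb(params_b: float, bits_per_weight: float, overhead_gb: float = 2.0) -> float:
        return params_b * bits_per_weight / 8 + overhead_gb

    print(vram_gb(27, 4))   # ~15.5 GB at 4-bit: fits one 24 GB consumer GPU
    print(vram_gb(27, 16))  # ~56 GB unquantized: needs workstation-class hardware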

Practical? Maybe not (unless you highly value privacy), because you can get better models and better performance with cheap API access or even cheaper subscriptions. As you said, that may be the case indefinitely.


> The capabilities of Qwen3.6 27B or 35B-A3B are dramatically better than what was available even a few months ago.

Yes, a lot better, but still terribly unreliable and far less capable than the big unquantized models.


It was a weird point to make in the post given that exe.dev charges $0.07/GB for transfer. That's arguably worse than the major clouds, which charge about the same for egress but give you free ingress.
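
Back-of-envelope, assuming transfer is billed in both directions at that rate (the traffic volume and the ~$0.09/GB cloud egress rate are illustrative assumptions):

    # Hypothetical month: 5 TB out, 5 TB in.
    egress_gb, ingress_gb = 5_000, 5_000
    exe_dev = (egress_gb + ingress_gb) * 0.07        # $0.07/GB, both directions
    big_cloud = egress_gb * 0.09 + ingress_gb * 0.0  # ~$0.09/GB egress, free ingress
    print(exe_dev, big_cloud)                        # 700.0 vs 450.0 dollars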

Author here.

I need to fix our transfer pricing. (In fact, I'm going to go look at it now.) I set that number when we launched in December, when we were still considering building on top of AWS, so we picked a conservative price that wouldn't break the bank there. Now that we're doing our own thing, we can be far more reasonable.


Love the attitude!! Well done, and good luck with all this. I sent you an email offering help, if any is needed/welcome.

Google's naming might be misleading: currently, 3.1 Flash Image outperforms the available Pro version (3.0 Pro) on most benchmarks: https://deepmind.google/models/model-cards/gemini-3-1-flash-...

From the upstream project:

> Can I change the display of all ESLs in a store at once?

No. For two reasons:

Unlike radio waves, optical communication must be line-of-sight. Even with wall and ceiling reflections, a single transmitter has no chance of reaching all of the hundreds or thousands of ESLs in a store.

Each ESL has a unique address, which must be specified in update commands. There's no known way to broadcast display updates.
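
In practice that means updating a store is a per-label loop, one addressed frame at a time. A minimal sketch of the idea (the names and frame format are invented for illustration, not taken from the project):

    # Hypothetical sketch: every update command carries one label's unique
    # address, so updating N labels means N separate transmissions, each of
    # which only reaches labels in that transmitter's line of sight.
    class Transmitter:
        def send(self, address: int, payload: str) -> None:
            print(f"frame -> ESL {address:#06x}: {payload}")

    def update_labels(tx: Transmitter, labels: dict[int, str]) -> None:
        for address, text in labels.items():
            tx.send(address, text)  # one addressed frame per ESL; no broadcast

    update_labels(Transmitter(), {0x1A2B: "$4.99", 0x1A2C: "$2.49"})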


GLM 5.1 is pretty good, probably the best non-US agentic coding model currently available. But both GLM 5.0 and 5.1 have had availability and performance issues that make them frustrating to use. Recently GLM 5.1 was also outputting garbage thinking traces for me, but that appears to be fixed now.

Use them via DeepInfra instead of z.ai. No reliability issues.

https://deepinfra.com/zai-org/GLM-5.1

Looks like fp4 quantization now, though? Last week it was showing fp8. Hm.


DeepInfra's implementation of it is not correct. Thinking is not preserved, and they're not responding to the issue I submitted about it.

I also regularly see DeepInfra slow to an absolute crawl; I've actually gotten more consistent performance from z.ai.

I really liked DeepInfra, but something doesn't seem right over there at the moment.


Damn, yeah, that sucks. I played with it again earlier and it did seem to slow down.

It's frankly a bummer that there doesn't seem to be a better serving option for GLM 5.1 than z.ai, which has reliability and cost issues.


> So say someone built an under $10k system, with perhaps dual RTX 5090. That same system will be able to easily run 20 parallel requests. The only cost is electricity. You can run it 24/7. For 1 year, that's ~$6 million

I don't see how you get anywhere close to $6M of tokens out of a pair of 5090s. The class of model they could run is fairly small and extremely cheap to run via API (my math says running Gemma4-31B for 24 hours costs less than $1 on OpenRouter). Even with 20x concurrent requests you are orders of magnitude away from $6M/yr.
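
Rough numbers to show the gap (throughput and price are deliberately generous assumptions here, not measurements):

    # Even generous assumptions land ~3 orders of magnitude short of $6M/yr.
    seconds_per_year = 365 * 24 * 3600
    tok_per_sec = 2_000        # assumed aggregate across 20 parallel requests
    price_per_m = 0.10         # assumed $ per 1M output tokens, small open model
    tokens = tok_per_sec * seconds_per_year       # ~63B tokens/year
    print(f"${tokens / 1e6 * price_per_m:,.0f}")  # ~$6,300/year, not $6,000,000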


I never said that. My point is that paying 20 people $35/hour around the clock is about $6 million a year. You can replace that with a $10k system running 20 parallel requests for a year and save a lot of money.
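
The arithmetic, assuming $35 means $35/hour:

    print(20 * 35 * 24 * 365)  # 6,132,000 -> ~$6M/year in labor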

I agree, but do the potential customers of my business?

We need to meet the customer where they are, and that means making our site more accessible to search engines, mobile devices, LLMs, or whatever comes next.

