Hacker News | Oras's comments

If these models reach the quality of Opus 4.5, then the DGX could be a good alternative for serious dev teams running local models. It is not that expensive, and the time to ROI is short.

Memory bandwidth is the biggest L on the DGX Spark; it's half that of my 2023 MacBook, and that's the biggest tok/sec bottleneck.

I agree. The problem is the noise ratio, not how the platform was implemented.

My test for image models is asking them to create an image showing chess openings. Both this model and Banana Pro are so bad at it.

While the image looks nice, the actual details are always wrong, such as pawns in the wrong locations, missing pawns, etc.

Try it yourself with this prompt: Create a poster to show opening game for Queen's Gambit to teach kids to play chess.
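For reference, the position the poster should show after 1.d4 d5 2.c4 can be pinned down exactly in FEN (the standard board notation), so the details the models keep getting wrong are checkable. A stdlib-only Python sketch (the `fen_to_board` helper is mine, not from any chess library):

```python
def fen_to_board(fen: str) -> dict:
    """Map square names ('e4') to piece letters from a FEN placement field."""
    board = {}
    placement = fen.split()[0]
    for rank_idx, row in enumerate(placement.split("/")):
        rank = 8 - rank_idx          # FEN lists ranks from 8 down to 1
        file_idx = 0
        for ch in row:
            if ch.isdigit():
                file_idx += int(ch)  # digits encode runs of empty squares
            else:
                board[f"{'abcdefgh'[file_idx]}{rank}"] = ch
                file_idx += 1
    return board

# Position after 1.d4 d5 2.c4 (the Queen's Gambit)
QG_FEN = "rnbqkbnr/ppp1pppp/8/3p4/2PP4/8/PP2PPPP/RNBQKBNR b KQkq c3 0 2"
board = fen_to_board(QG_FEN)

# Exactly the details the generated posters keep getting wrong:
assert board["d4"] == "P" and board["c4"] == "P"            # white pawns advanced
assert board["d5"] == "p"                                   # black pawn on d5
assert "d2" not in board and "c2" not in board              # vacated squares empty
assert sum(p.lower() == "p" for p in board.values()) == 16  # no missing pawns
print("Queen's Gambit position verified")
```

Every square in the prompt's target position is determined, which is what makes the misplaced and missing pawns so easy to spot.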


It almost nailed it for me (two squares have both white and black color). All pieces and the position look correct.

What move? Whose turn is it? Declined or accepted? Garbage in, garbage out.

In some cases I would agree with this, but image model releases, including this one, are beginning to incorporate and market a thinking step. It is not a reach at this point to expect the model to fill in unspecified details while still delivering a faithful and accurate representation of your request. A model can be accurate while navigating your lack of specificity.

Kasparov vs Karpov ‘87 Olympiad. Move 6

What do you mean? Parent clearly describes the Queen's Gambit: 1.d4 d5 2.c4. There is no room for ambiguity here.

The King's Indian Defense would be a better prompt, as "Queen's Gambit" can now refer to, e.g., some scene from the Netflix series.

Incredible, powerful, but I couldn't believe how fast I hit the limits compared to Opus 4.6. They removed Opus 4.6 completely from CC. I would prefer it with the previous limits.

That's not how you keep your customers. None of these agents has a moat; I moved away from Cursor when they started doing what Anthropic is doing now, and never went back, even though I had been a paying customer since the start.


You can just use the model parameter to bring it back

I find it odd that none of OpenAI's models was used in the comparison, but Z.ai's GLM 5.1 was. Is GLM 5.1 really that good? It is crushing Opus 4.5 in these benchmarks; if that is true, I would have expected to read many articles on HN about people flocking from CC and Codex to use it.

GLM 5.1 is pretty good, probably the best non-US agentic coding model currently available. But both GLM 5.0 and 5.1 have had issues with availability and performance that make them frustrating to use. Recently GLM 5.1 was also outputting garbage thinking traces for me, but that appears to be fixed now.

Use them via DeepInfra instead of z.ai. No reliability issues.

https://deepinfra.com/zai-org/GLM-5.1

Looks like FP4 quantization now, though? Last week it was showing FP8. Hm.


Deepinfra's implementation of it is not correct. Thinking is not preserved, and they're not responding to my submitted issue about it.

I also regularly see DeepInfra slow to an absolute crawl. I've actually gotten more consistent performance from Z.ai.

I really liked DeepInfra, but something doesn't seem right over there at the moment.


Damn. Yeah, that sucks. I did play with it earlier again and it did seem to slow down.

It's frankly a bummer that there's seemingly no better serving option for GLM 5.1 than z.ai, which seems to have reliability and cost issues.


Yes, GLM 5.1 is that good. I don't think it is as good as Claude was in January or February of this year, but it is similar to how Claude runs now, perhaps better, because I feel like its performance is more consistent.

In fact, it is appreciated that Qwen is comparing to a peer. Several engineers I know and I are trying GLM. It's legit. Definitely not the same as Codex or Opus, but cheaper and "good enough". I basically ask GLM to solve a problem, walk away for 10-15 minutes, and the problem is solved.

Cheaper is quite subjective. I just went to their pricing page [0], and the cost savings relative to performance don't sell it well (again, personal opinion).

CC has limited capacity for Opus, but it is fairly good for Sonnet. With Codex, I've never had issues with hitting my limits, and I'm only a Pro user.

https://z.ai/subscribe


I've been using GLM 5.1 for the last two weeks as a cheaper alternative to Sonnet, and it's great, probably somewhere between Sonnet and Opus. It's pretty slow though.

This is what kills it for me… The long thinking blocks can make a simple task take 30 minutes.

GLM 5.1 is the first model I've found good enough to spring for a subscription for other than Claude and Codex.

It's not crushing Opus 4.5 in real-life use for me, but it's close enough to be near-interchangeable with Sonnet for a lot of tasks, though some of the "savings" are eaten up by it seemingly using more tokens for tasks of similar complexity (I don't have enough data yet, but I've pushed ~500m tokens through it so far).


GLM-5 is good, like really good, especially if you take pricing into consideration. I paid $7 for 3 months, and I get more usage than with CC.

They have difficulty supplying their users with capacity, but in an email they pointed out that they are aware of it. During peak hours I experience degraded performance, but I am on their lowest-tier subscription, so I understand if my demand is not prioritized during those hours.


Where are you getting 3 months for $7?

They had a Christmas deal that ended January 31.

If you only look at open models, GLM 5.1 is the best performance you can get on the Pareto frontier.

https://arena.ai/leaderboard/text?viewBy=plot&license=open-s...


I've been using it through OpenCode Go and it does seem decent in my limited experience. I haven't done anything which I could directly compare to Opus yet though.

I did give it one more complex task that quite impressed me. I had a local setup with Tiltdev, K3S, and a pnpm monorepo which was failing to run the web application dev server; GLM correctly figured out that it was a container image build cache issue after inspecting the containers etc., and corrected the Tiltfile and build setup.


Most HN commenters seem to be a step behind the latest developments, and sometimes miss them entirely (Kimi K2.5 is one example). Not surprising, as most people don't want to put in the effort to sift through the bullshit on Twitter to figure out the latest opinions. Many people here will still prefer the output of Opus 4.5/4.6/4.7; nowadays this mostly comes down to the aesthetic choices Anthropic has made.

Not just aesthetics though. From time to time I implement the same feature with CC and Codex just to compare results, and I have yet to find Codex making better decisions, or even matching the completeness of the feature.

For more complicated stuff, like queries or data comparison, Codex always seems behind for me.


Yeah GLM’s great for coding, code review, and tool use. Not amazing at other domains.

I use it and think its intelligence compares favorably with OpenAI and Anthropic workhorses. Its biggest weakness is its speed.

Maybe they decided OpenAI serves a different market, hence comparing only with companies focused on dev tooling: Claude, GLM.

Haven’t you heard about Codex?

It's an SKU from OpenAI's perspective; the broader goal and vision is (was) different. Look at Claude and GLM: both are 95% committed to dev tooling: the best coding models, coding harnesses; even their Cowork is built on top of Claude Code.

I'm not sure how this makes sense when Claude models aren't even coding-specific: Haiku, Sonnet, and Opus are the exact same models you'd use for chat or (with the recent Mythos) bleeding-edge research.

Anthropic's models and training data are optimized for coding use cases; this is the difference.

OpenAI, on the other hand, has separate models optimized for coding, GPT-x-codex; Anthropic doesn't have this distinction.


But they detect it under the hood and apply a similar "variant", as API results are not the same as in Claude Code (someone documented this before).

Would be nice to see the ratio of OpenClaw stars

99% stars from Claws themselves

Data suggest different outcomes, there was always a way to standardise interfaces, from Twitter bootstrap, all the way to shadcn.

Not everyone is looking for unique design; 70% of the web is still using WordPress. I would say the majority prefers familiarity and appreciates uniqueness.


> Not everyone is looking for unique design; 70% of the web is still using WordPress. I would say the majority prefers familiarity and appreciates uniqueness.

Most people using WordPress customise it with many of the thousands of plugins available, though, and those plugins create menu items everywhere.


This would work on people too: you see fake info/text/videos daily, and many people believe them.

LLMs do not think; why is this still hard to understand? They just spit out whatever data they were trained on.

I feel this kind of article is aimed at people who hate AI and just want to be comfortable within their own bias.


The journals the scientist submitted to got a fake university, explicitly fake people, references to The Simpsons and Star Trek, etc.

Most doctors would not believe that, and would also treat any new eye disease they'd never seen in real life with scepticism.


LLMs will need to develop a notion of trustworthiness. Interesting that part of learning isn't just learning, but also learning what to learn and how much value to put on data that crosses your path.

To me, the problem is the blast radius.

All of us are slightly wrong about things, but not all of us are treated as oracles of correct information the way Opus, ChatGPT, etc. are.


you're confusing LLMs with humans

Not massively sure I am

Journals? The article says the paper was uploaded to two preprint servers.

Sorry, even worse then

I got confused because a journal referenced them:

> The experiment's reach has now spread into the published medical literature. The bixonimania research has been cited by a handful of researchers, including a study that appeared in Cureus, a journal published by Springer Nature, the publisher of Nature, by researchers at the Maharishi Markandeshwar Institute of Medical Sciences and Research in Mullana, India (S. Banchhor et al. Cureus 16, e74625 (2024); retraction 18, r223 (2026)). (Nature's news team is editorially independent of its publisher.)


> majority of users on this planet don't use AI agents like that

Source?


Common sense. Most users are not running Claude Code or an on-device coding agent.

They're using ChatGPT, Gemini, or Claude on the web.


But I downloaded Claude.exe /s

It looks so promising, but the first thing that came to my mind is that these models are mostly trained on the default CLI output; would compressing it mess with the output of these models?
