Hacker News | alifeinbinary's comments

I'm developing software in this area right now, so I try a lot of the new models. They're not even close for coding tasks. It basically comes down to 26B parameters vs 1T parameters, plus quantisation and smaller context sizes; there's no comparison. However, for agentic work, tool calling, and text summarisation, local LLMs can be quite capable. Workloads that run as background tasks, where you're not concerned about TTFB, cold starts, tok/s, etc., are where local AI is useful.

If you have an M-series processor then I would recommend that you ditch Ollama because it performs slowly. We get double or triple the tok/s using omlx or vmlx, respectively, though vmlx doesn't have extensive support for some models like gpt-oss.


Kimi K2.5 (as an example) is an open model with 1T params. I don't see a reason it has to be local for most use cases; the fact that it's open is what's important.

That is just idealism. Being "open" doesn't get you any advantage in the real world. You're not going to meaningfully compete in the new economy using "lesser" models. The economy does not care about principles or ethics. No one is going to build a long-term business that provides actual value on open models. They can try. They can hype. And they can swindle and grift and scalp some profit before they become irrelevant. But it will not last.

Why? Because what was built with an open model can be sneezed into existence by a frontier model run via a first-party API, with the best-practice configurations the providers publish in usage guides that no one seems to know exist.

The difference between the best frontier model (gpt-5.4-xhigh or opus 4.6) and the best open model is vast.

But that is only obvious when your use case is actually pushing the frontier.

If you're building a CRUD app, or the modern equivalent of a TODO app, even a lemon can produce that nowadays, so you will assume open has caught up to closed because your use case never required frontier intelligence.


A model with open weights gives you a huge advantage in the real world.

You can run it on your own hardware, with perfectly predictable costs and predictable quality. You don't have to worry about how many tokens you use, whether your subscription limits will be hit at the most inconvenient moment (forcing you to wait until they reset), whether the token price will be increased, whether your subscription limits will be decreased, or whether your AI provider will swap the model for a worse one, and so on.

Moreover, no matter how good a "frontier model" may be, it can still produce worse results than a lesser model when the programmer who manages it does not also have "frontier intelligence". Freed from the constraints of a paid API, you may be able to use an AI coding assistant in much more efficient ways, exactly as when time-sharing access to powerful mainframes was replaced by the unconstrained use of personal computers.

When I was very young I lived through the transition from using a mainframe remotely to using my own computer. I certainly do not want to return to that straitjacket style of work.


The vision has been that the open and/or small models, while 8-16 months behind, would eventually reach sufficient capabilities. In this vision, not only do we have freedom of compute, we also get less electricity usage. I suspect long-term the frontier mega models will mainly be used for distillation, like we see from Gemini 3 to Gemma 4.

first session with gemma4:31b looks pretty good; it may actually be up to coding tasks at gemini-3-flash levels

you can tell gemma4 comes from gemini-3


I built a steganography app that embeds encrypted messages into images.

It lives here: https://stegg.alifeinbinary.com

It's as much an art project as it is a programming project. The images it generates are visual representations of binary code translated from the text you enter. If you enable encryption, the text is encrypted before being encoded. You can download the image and send it to someone along with the password, and they'll be able to decrypt it by uploading it to the app. Or you can post it to the timeline and send them the link. All messages are truly private: no raw text is sent to the server.

It's not vibe-coded; I made it with TypeScript and React. The app has a link to the GitHub repo if you want to look under the hood.
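For anyone curious how the text-to-pixels idea works in principle, here's a toy sketch in pure Python (not the app's actual code, and the function names are made up): each character becomes eight bits, and each bit becomes a visibly filled or empty cell in a grid.

```python
def text_to_bits(text):
    """Turn a UTF-8 string into a flat list of 0/1 bits."""
    return [int(b) for ch in text.encode("utf-8") for b in f"{ch:08b}"]

def bits_to_text(bits):
    """Inverse: regroup bits into bytes and decode back to text."""
    data = bytes(int("".join(map(str, bits[i:i + 8])), 2)
                 for i in range(0, len(bits), 8))
    return data.decode("utf-8")

def render(bits, width=8):
    """Render the bit stream as a visible grid: 1 -> filled, 0 -> dot."""
    rows = [bits[i:i + width] for i in range(0, len(bits), width)]
    return "\n".join("".join("#" if b else "." for b in row) for row in rows)

bits = text_to_bits("hi")
assert bits_to_text(bits) == "hi"  # round-trips losslessly
print(render(bits))
```

A real app would pack the bits into pixel colours rather than ASCII art, but the encode/decode round trip is the same idea.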


The "as much of an art project, as an applied cryptography exercise" take resonates a lot!

Just earlier this week I released https://github.com/kirushik/paternoster (and even won Berlin Hack and Tell with it), and it totally came from the idea: "in that state-enforced Max messenger there's so much surveillance that you can only praise the authorities and pray in there. What if there was a way to hide messages in the text of Church Slavonic prayers?"

I've even added TTS (where supported) to it, just for the giggles of getting the "TRUMP" dictionary through it.

It's still a pretty solid X25519+AES-GCM encrypted messaging design under the hood, and I'm happy with it, but it's still a bit of an afterthought tbh...


Steganography is (hopefully invisibly) hiding information in an image, not creating an image that so obviously encodes information.

I really like LM Studio when I can use it under Windows, but for people like me with an Intel Mac + AMD GPU, Ollama is the only option because it can (unofficially) leverage the GPU using MoltenVK, aka Vulkan. We're still testing it, hoping to get the Vulkan support into the main branch soon. It works perfectly for single GPUs, but some edge cases when using multiple GPUs are unsupported until upstream support from MoltenVK comes through. But yeah, I agree, it wasn't cool to repackage Georgi's work like that.

Agreed. I block Reddit, Instagram, Facebook, and Twitter on my phone and work computers to avoid impulsive doomscrolling on those time vampires. I never would have seen this announcement. Reddit should not be viewed as a channel for corporate communications.

All those parameters and it still won't answer questions about Tiananmen Square in 1989... :(

It will. The web chat has censorship features, but the model you can download doesn't.

I'm in the same boat. I bought mine back in 2021 and honestly I don't regret my decision. It's my main software development and music production computer, plus every Sunday night I get to play Counter-Strike with the boys by dual-booting into Windows. I'm able to service, repair, and upgrade it myself, and one day when I'm ready to move on I'll use it as my home server. The crazy thing is that my next upgrade will most likely be going back to a MacBook Pro, because the Thunderbolt connectivity will be able to handle the Blackmagic 4-camera broadcast capture card and NVMe PCIe storage card that are in my Mac Pro right now through some external enclosure.

The only real drawback I've experienced with the Mac Pro has been the lack of support for large language models on the AMD GPU, due to Apple's lacklustre Metal drivers, but I've been working with a couple of other developers to port a MoltenVK translation layer to Ollama that enables LLMs on the GPU. We're trying to get it into the main branch since testing has gone well.

One thing a lot of commenters in this thread are overlooking is that this is the death knell for repairable and upgradable computing on the Mac, which is super disappointing.


Studios are repairable. Upgradability is being deprecated, however, and I'm not sure that's bad for Apple. It may not be bad for the end user either: external TB/USB peripherals might have a longer life, transferred between computers, than an internal PCIe version, and a larger market, since they will work with any Mac.

Hi, I read through your blog post and website but couldn't find a GitHub link, so I gather this project is closed source. Is that likely to change in the future? For something as important as a terminal, considering all the secrets it comes into contact with, I would want to audit it before blindly installing it on my system. Furthermore, neither your website nor anywhere else in your online presence contains any identifiable information about you, the creator: what country you are located in, a LinkedIn, etc. There's no name attached and no face to the name. How can we trust this?

How do we know this isn't honeypot software produced by an adversarial state actor trying to conduct industrial espionage or siphon secret keys, databases, and file systems? You're expecting a lot of trust from potential users but making no effort to earn it beyond your blog post outlining how you made it, which looks suspicious if I'm being honest.

Why have you chosen to protect your anonymity and keep the project closed source?


Thanks for the reply. No GitHub link because yes, it is closed source (although free to use). That won't change in the future, as I'm following the Sublime Text model.

I can completely understand your concern; however, many of the tools we use these days are closed source (Warp, Cursor, Sublime Text, Termius).

I am the creator, Jefferson Hale. Here is my linkedin: https://linkedin.com/in/jeffyaw

I'm in the United States. Lake Arrowhead, CA to be precise.


I appreciate your response and I hope you can understand my initial concern. Looking forward to trying this out :)

They just need to create a GUI interface with Visual Basic and see if they can track the IP address.

I use my Mac for film scoring and music production, so I have a long-standing practice of keeping my operating system one major version behind for stability reasons. If you want to do the same, and at the same time avoid those annoying Tahoe update notifications, simply enable beta updates for macOS 15 in Settings. I don't imagine I'll ever update to Tahoe because I dislike the UI so much, but honestly macOS 15 is rock solid and it looks great; I'd be very happy sticking with it until EOL for this machine.


While the M5 is impressive, its capacity to compete against frontier models like Opus, Sonnet, etc. is off by a couple of orders of magnitude. Even the most sophisticated open-source models you can realistically run locally top out at 100-200 billion parameters, whereas Claude and OpenAI's models are north of a trillion parameters.

While SoC is definitely the future of local AI, even if you get an M5 that's jacked to the tits, you still won't be able to store the entire model in unified memory on top of the OS and whatever other applications you have running. 128GB is the upper limit for unified memory on the M5, which on paper could support a model like gpt-oss:120b, but still with a nerfed context size, and quantised at that. Furthermore, a maxed-out MacBook Pro M5 Max costs between $8-10k depending on your storage option, screen size, etc., so we can safely assume the M5 Ultra will cost even more. There's also no guarantee that the Ultra will offer double the unified memory; it may only offer more cores, but cores aren't the current bottleneck for local AI, memory is.
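The memory arithmetic here is easy to sanity-check with a back-of-envelope sketch (the 20% overhead factor for KV cache and runtime buffers is my assumption, not a measured number):

```python
def model_memory_gb(params_billion, bits_per_weight, overhead=1.2):
    """Rough unified-memory estimate for loading a model:
    weights only, plus ~20% overhead for KV cache, activations,
    and runtime buffers."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8 * overhead
    return bytes_total / 1e9

# A ~120B-parameter model at different quantisation levels:
for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{model_memory_gb(120, bits):.0f} GB")
```

At 16-bit weights a 120B model needs roughly 288 GB, far beyond 128GB of unified memory; only around 4-bit quantisation (~72 GB) does it fit alongside the OS, which is why the context size takes the hit.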

If you consider what you'd be paying above and beyond what you'd otherwise need, local AI adds $5-6k to the price tag at a minimum. That equates to five years' worth of a Claude Code subscription! Even if you shouldered that cost with your NFT fortunes, you likely wouldn't achieve performance parity with CC.

I'm just as excited as you are about the future of local AI, and I am actively working in this space every day to improve it, but we're still a long way off from being able to match model size, context size, tokens/sec, TTFB, etc. A single H100 is so overpowered that a data centre hosting thousands of them can be expected to remain unrivalled.

The area where I'm having success with local AI is pairing local models with supportive technologies like databases to compensate for smaller context sizes. There are still many inroads to be made in this area, which bodes well for the future of local AI as models become more efficient and sophisticated.
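As a concrete (hypothetical) illustration of that pattern, here's a minimal retrieval step: score stored chunks against a query and pack only the best-matching ones into a small context budget before prompting the local model. All names and the scoring scheme are made up for the sketch.

```python
def score(chunk, query):
    """Naive relevance score: count of query words present in the chunk."""
    query_words = set(query.lower().split())
    return sum(1 for word in chunk.lower().split() if word in query_words)

def build_context(chunks, query, budget_words=50):
    """Greedily pack the highest-scoring chunks into a word budget,
    so a small-context local model only sees relevant material."""
    picked, used = [], 0
    for chunk in sorted(chunks, key=lambda c: score(c, query), reverse=True):
        n = len(chunk.split())
        if used + n <= budget_words:
            picked.append(chunk)
            used += n
    return "\n".join(picked)

docs = [
    "the database stores user rows",
    "unrelated cooking notes here",
    "database index speeds queries up",
]
print(build_context(docs, "database rows", budget_words=10))
```

In practice you'd replace the word-overlap score with embeddings or full-text search in an actual database, but the shape is the same: retrieve, rank, trim to budget, then prompt.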

