I've got a 128 GiB unified memory Ryzen AI Max+ 395 (aka Strix Halo) laptop.
Trying to run LLMs somehow makes 128 GiB of memory feel incredibly tight. I frequently hit OOMs running models that push the limits of what this machine can fit; I need to leave more memory free for the system than I was expecting. I'd hoped to run quantized models up to ~100 GiB, leaving 28 GiB for system memory, but it turns out I need to leave more room for context and overhead. ~80 GiB quantized seems like a better ceiling, given that I'm not running headless: a desktop environment, browser, IDE, compilers, etc. are all running alongside the model.
And the memory bandwidth limitation when running these models is real! 10B active parameters at 4-6 bit quants feels usable but slow; much more than that and it really starts to feel sluggish.
So this can fit models like Qwen3.5-122B-A10B, but it's not the speediest and I had to use a smaller quant than expected. Qwen3-Coder-Next (80B, 3B active) feels quite good on speed, though not quite as smart. Still trying out models; Nemotron-3-Super-120B-A12B just came out, but it looks like it'll be a bit slower than Qwen3.5 while not offering any more performance, though I do really like that they've been transparent in releasing most of its training data.
There's been some very recent ongoing work in some local AI frameworks on enabling mmap by default, which can obviate some RAM-driven limitations, especially for sparse MoE models. Running with mmap and too little RAM still comes with severe slowdowns, since read-only model parameters have to be shuttled in from storage as they're needed. But for hardware with fast enough storage, and especially for models that "almost" fit in the page cache, this can be a huge unblock at negligible cost, especially if it enables further unblocks like adding extra swap for the KV cache and long context.
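A minimal Python sketch of the mechanism (not any particular framework's loader; the file name and sizes here are made up): mapping a file reserves address space without reading anything, and pages are faulted in from the page cache or storage only when a slice is actually touched, which is why a sparse MoE model can run even when its whole weight file doesn't fit in RAM.

```python
import mmap
import os
import tempfile

# Simulate "model weights" on disk: a 64 MiB file full of zeros.
path = os.path.join(tempfile.mkdtemp(), "weights.bin")
with open(path, "wb") as f:
    f.truncate(64 * 1024 * 1024)

# mmap the file read-only: nothing is read from disk at this point,
# the mapping just reserves address space.
with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # Only the pages backing this slice get faulted in.
    expert_slice = mm[4096:8192]
    print(len(expert_slice))  # 4096
    mm.close()
```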
Most workstation-class laptops (e.g. Lenovo P-series, Dell Precision) have 4 DIMM slots and you can get them with 256 GB (at least, before the current RAM shortages).
There's also the Ryzen AI Max+ 395 that has 128GB unified in laptop form factor.
Only Apple has the unique dynamic allocation though.
Yep, I have a 13" gaming tablet with the 128 GB AMD Strix Halo chip (Ryzen AI Max+ 395, what a name). Asus ROG Flow Z13. It's a beast; the performance is totally disproportionate to its size & form factor.
I'm not sure what exactly you're referring to with "Only Apple has the unique dynamic allocation though." On Strix Halo you set the fixed VRAM size to 512 MB in the BIOS, and you set a few Linux kernel params that enable dynamic allocation to whatever limit you want (I'm using 110 GB max at the moment). LLMs can use up to that much when loaded, but it's shared fully dynamically with regular RAM and is instantly available for regular system use when you unload the LLM.
I configured/disabled RGB lighting in Windows before wiping and the settings carried over to Linux. On Arch, install & enable power-profiles-daemon and you can switch between quiet/balanced/performance fan & TDP profiles. It uses the same profiles & fan curves as the options in Asus's Windows software. KDE has native integration for this in the GUI in the battery menu. You don't need to install asus-linux or rog-control-center.
For local AI: set VRAM size to 512 MB in the BIOS, add these kernel params:
Pages are 4 KiB each, so 120 GiB = 120 x 1024^3 / 4096 = 31457280
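For reference, the same arithmetic as a shell one-liner, plus where the resulting values go (the param names match the `ttm` options mentioned elsewhere in this thread; the exact file you edit depends on your bootloader):

```shell
# 120 GiB of GTT expressed in 4 KiB pages:
pages=$((120 * 1024 * 1024 * 1024 / 4096))
echo "$pages"   # 31457280

# One-time setup: append to the kernel command line and reboot, e.g.
#   ttm.pages_limit=31457280 ttm.page_pool_size=31457280
```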
To check that it worked: `sudo dmesg | grep "amdgpu.*memory"` will report two values. VRAM is what's set in the BIOS (minimum static allocation). GTT is the maximum dynamic quota. The default is 48 GB of GTT, so if you're running small models you don't even need to do anything; it'll just work out of the box.
LM Studio worked out of the box with no setup: just download the AppImage and run it. For Ollama you just `pacman -S ollama-rocm` and `systemctl enable --now ollama`, and it works. I recently got ComfyUI set up to run image-gen & 3D-gen models and that was also very easy, took <10 minutes.
I can't believe this machine is still going for $2,800 with 128 GB. It's an incredible value.
You may wanna see whether OpenRGB can configure the RGB. You could even do some fun stuff like changing the color once a training run finishes.
I use OpenRGB to turn off all the RGB crap on my desktop machine. Unfortunately you have to leave it running, and it takes a constant 0.5% of CPU. I wish there were a "norgb" program that would simply turn off RGB everywhere and not use any CPU while doing it.
Really appreciate this response! Glad to hear you are running Arch and liking it.
I've been a long-time Apple user (and long-time user of Linux for work + part-time for personal), but have been trying out Arch and hyprland on my decade+ old ThinkPad and have been surprised at how enjoyable the experience is. I'm thinking it might just be the tipping point for leaving Apple.
I just did!
Warmly encouraging you to try it out!
Managed to put Omarchy on an external SSD on my old 2019 MacBook Pro; rarely booting into macOS now. It's been a long time since I enjoyed using a computer SO MUCH!
> Only Apple has the unique dynamic allocation though.
What do you mean? On Linux I can dynamically allocate memory between CPU and GPU. Just have to set a few kernel parameters to set the max allowable allocation to the GPU, and set the BIOS to the minimum amount of dedicated graphics memory.
Maybe things have changed, but the last time I looked at this it was only max 96 GB to the GPU. And it isn't dynamic in the sense that you still have to tweak the kernel parameters, which requires a reboot.
On Strix Halo you can get at least 120 GB to the GPU (out of 128 GB total); I'm using this configuration.
Setting the kernel params is a one-time initial setup thing. You have 128 GB of RAM; set the max VRAM to 120 or whatever. The LLM will use as much as it needs and the rest of the system will use as much as it needs. Fully dynamic, with real-time allocation of resources. Honestly, I literally haven't thought about it since setting those kernel args a while ago.
So: "options ttm.pages_limit=31457280 ttm.page_pool_size=31457280", reboot, and that's literally all you have to do.
Oh, and even that is only needed because the AMD driver defaults to something like 35-48 GB max VRAM allocation. It is fully dynamic out of the box; you're only configuring the max VRAM quota with those params. I'm not sure why they chose that number for the default.
You do have to set the kernel parameters once to set the max GPU allocation, I have it set to 110 GiB, and you have to set a BIOS setting to set the minimum GPU allocation, I have it set to 512 MiB. Once you've set those up, it's dynamic within those constraints, with no more reboots required.
On Windows, I think you're right, it's max 96 GiB to the GPU and it requires a reboot to change it.
I literally just ran into this myself with my spouse. She is ready to upgrade her M1 MacBook Air and thinks she doesn’t need more RAM because everything is “in the cloud”. Hopefully 8GB is enough RAM for the next 5 years or so...
I'm all for fewer bus stops, but how do you make it equitable for people who can't walk longer distances because they're disabled or have an underlying health condition? Run a separate paratransit line?
In European cities this is mitigated by having low-floor buses and stops with level boarding to support mobility scooters and wheelchairs. There are also dedicated taxis available for people with disabilities (possibly subsidised). Over a long term this is also a self-regulating problem. Elderly people and services/businesses for them take into account availability of public transit when choosing properties.
Buses are mass transit. The real goal isn't serving poor people, but moving people with higher throughput than is possible with individual cars (a single bus fits ~50 people). If you make bus lines slow and fail to attract significant numbers of passengers by forcing buses to serve every whatabout case, you're making them fail at their primary goal.
You can't make half-pregnant public transit. If you have a congested city, and just add nearly empty buses sitting in traffic and blocking lanes at every intersection, it will be strictly worse for everyone. OTOH if you can make buses an attractive option, then each bus can take 30+ cars off the road, leaving room for dedicated bus lanes, more buses, resulting in faster and more regular service.
I would agree with and extend your remarks that we also have problems where traffic patterns and geography don't match political boundaries and transit is traditionally locally run and locally budgeted.
So in the USA you end up with scenarios where it takes 20 minutes to drive 20 miles, but a bus would take four legs with three transfers across three separate city bus companies; figure at least three hours each way. And again, per your "mass transit" point, you can't expect taxpayers in my city to provide a special bus run into my neighboring city, much less the city next to that one.
This results in people being very happy indeed to pay the financial and environmental costs of car ownership to avoid sitting in a bus for six hours of daily commute.
There are also interesting social issues: if you're late, it's a personal failing, even if you take mass transit. I recall a friend at work getting fired because the bus was late too many times. Oh well, should have bought a car. The feeling of not being in control is made worse by crime rates. No one will sneak up on my wife and stab her in the neck in her car, but it certainly happens on buses, and no one cares if it happens, depending on local race relations. None of the other passengers on the bus even cared, for racial reasons. It's pretty messed up here.
It's easy for the public in general to advise others to do inconvenient or career-ending or life-threatening things to "save the planet" or whatever, but I wouldn't do it, and I'd certainly never let my wife or kids do it, so we own cars and avoid public transit at all costs. Not taking that advice has been pretty nice so far.
The answer is to keep the same number of stops but run two or more vehicles simultaneously. Or open more doors. Or expedite fares.
The authors get mixed up equating the count of marked stops with dwell time. Running leapfrogging vehicles, or numerous other strategies, reduces dwell time because one vehicle boards passengers while the other disembarks them at any given stop, or vice versa.
In fact, I’d argue bus fare gates, steps, 1-door loading and traffic signal/stop interactions are far more significant than stop count.
> The answer is to keep the same number of stops but run two or more vehicles simultaneously.
How exactly does that help? If you're suggesting buses alternate stops, leapfrogging each other, that will cause a lot of confusion, especially in tourist-heavy cities.
In 2022, according to the transit system's annual report, the suburban quarter-million-person city I live in has ten routes operating about 12 hours per day, and average weekday service consumed is 1556 UPT: 1556 people step aboard the system and toss coins in the fare jar or pay with the app. UPT means they're not tracking transfers, and essentially 100% of trips require a transfer, so the real number of people served daily is closer to 778 than to 1556, but we'll run the optimistic numbers. Each of the ten hourly routes is about 4 miles long. So the overall system drives 12 hours * 10 routes * 4 miles * 5280 feet/mile = ~2.5 million feet per day; divide that by 1556 passengers per day and that's a pax every ~1628 feet driven on an average day.
So if we had a bus stop every 800 feet, on average half the stops would be empty and passed by. If that level of use were causing too much congestion and slowdown at stops, and we ran two buses out of phase, pax would arrive at the same rate, so each bus would pick up a pax every 3000+ feet driven. So if we had bus stops every 500 feet to keep people happy, on average each bus would drive right by about 5 out of 6 empty stops, which seems reasonable and would not result in unusual delays or congestion. Also the bus would pass by every half hour instead of every hour, which would probably increase ridership a lot.
So if the only labor expense were the $23/hr driver, and we pay 10 drivers on 10 routes to drive twelve hours, that's $23/hr * 10 routes * 12 hours. If everything except driver labor were free, we'd spend $2760 per day to transport 1556 people, or about $1.77 per trip (assuming diesel is free, buses never wear out, etc.). If we doubled the number of buses, that would be $5520 of driver labor to move 1556 people per day, or $3.55 per pax trip. On the one hand, the actual annual total "OE per UPT", counting weekends and maintenance and office people and dispatchers etc., is $13.94 according to the annual report, so an extra $1.77 would seem cheap; but the bus does not run for free, and the total expense of doubling the runs might cost as much as an extra $14 per pax trip.
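The back-of-envelope numbers above check out; here they are as a quick sketch (all figures are the ones quoted in the comment):

```python
# Driver-labor-only cost model from the comment above.
feet_per_day = 12 * 10 * 4 * 5280   # hours/day * routes * miles/route * ft/mile
riders = 1556                        # weekday unlinked passenger trips (UPT)

print(feet_per_day)                  # 2534400, i.e. ~2.5 million feet
print(feet_per_day // riders)        # 1628 feet driven per boarding

labor = 23 * 10 * 12                 # $23/hr * 10 routes * 12 hours
print(labor)                         # 2760
print(round(labor / riders, 2))      # 1.77 $/trip with one bus per route
print(round(2 * labor / riders, 2))  # 3.55 $/trip with doubled service
```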
The costs don't really matter if the taxpayers want it as a luxury bragging feature of the city. Everyone wants everyone else to use it, even though no one would be caught dead actually using it. My point being that the adult fare is $2 but adults don't ride; it's mostly elderly and disabled riders at the $1 fare, so a profit (loss) ratio of (28 - 1)/28 with two buses per route isn't much worse than (14 - 1)/14 with one bus per route.
Maybe another way to look at the analysis: in my city, if the stops are more than 1600 feet apart, there will be multiple people per stop and that would "slow things down", whereas a much smaller spacing like 400 feet would mean the bus mostly just speeds by empty stops.
No one can seem to explain why we can't have infinite bus stops. How about every stop sign is a bus stop? The bus has to stop anyway. Artificial scarcity to drive down ridership, I suppose.
Great idea. Would it be possible to add my own custom tshark one-liners under "weird stuff"? For example, sometimes I find myself troubleshooting TCP retransmission issues that are specific to proprietary applications and may not be relevant for everyone else to have by default.
As an aside, I'd been thinking about something similar to this tool for a while, after seeing this post (https://news.ycombinator.com/item?id=46723990) where someone was using Claude to troubleshoot a PCAP. It made me think it would be nice to have a collection of tshark one-liners to quickly weed out any weird stuff right off the bat. I would assume that would be a lot more performant than using an LLM, and more scalable if you have large PCAP files.
Absolutely. Maybe the best way to do this would be some kind of recipe store from which the user can run (fuzzy-matched?) tshark one-liners. I'd love your thoughts on what the easiest/quickest integration would be.
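For flavor, here's what such recipes could look like for the retransmission case mentioned above, using tshark's built-in analysis filters (the capture file name is a placeholder):

```shell
# List TCP retransmissions in a capture:
tshark -r capture.pcap -Y tcp.analysis.retransmission

# Count retransmissions per 1-second interval:
tshark -r capture.pcap -q \
  -z io,stat,1,"COUNT(tcp.analysis.retransmission)tcp.analysis.retransmission"
```

A recipe store would mostly be a mapping from a plain-language description ("show retransmissions") to a filter expression like these.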
This is a great idea, thanks. I built an IPv6 only webhost in Digital Ocean a while ago as a learning exercise and it’s been sitting idle. Making a personal portal sounds like a fun project.
I just use “New Tab Redirect” in Chrome so I can make new tabs default to google.com. The home page only applies to the initial browser window/tab. It’s pretty silly.
This has been the worst downgrade for me. Response times have skyrocketed. The minimum time to response is now a few seconds slower and the responses continue to be low quality. The speaker seems to have even lost some functionality where it says "I can't do that yet" for a thing it mostly could already do previously. If I could tell anyone with Google Home devices one thing, it would be to not 'upgrade' to Gemini.
Google update sucks. Not only is it slower + generally dumber, but their AI alignment has made it refuse to answer very normal questions (I got scolded by my speaker for asking about the hours of the nearest liquor store)
No experience with Google yet. Amazon still has more work to do making their tool-use calls bulletproof, but I've been able to search YouTube naturally and "open the first video" (this is still rudimentary, it's not perfect yet). Pretty good success moving around the Fire TV app with voice (open, exit), and reasonably good at switching live channels. Really fun with Amazon Music: "pull up a Taylor Swift album, but acoustic only, from the 2010s ...", stuff like that. It's great, and I expect it to become rather ... perfect in time.
Other things:
- Great for todo/reminders with timers
- "Hey Alexa, turn my lights on at 5 everyday, close them at 12"
- Not great at controlling Prime Video yet, can search it, but not great yet at all. Expecting this to be perfect at some point as well.
They also had this as an option to pay at Amazon Fresh, which seemed odd to me. You needed to use your phone for the QR code anyway, and they charged the credit card on file in your Amazon account.