There is a pretty intense campaign to shift the Overton window around coding, and anchoring is used repeatedly.
When Jensen Huang says you need to spend $500k on tokens per developer per year, he knows it will be perceived as bullshit, but by setting such a high anchor he's subtly making spending $0 seem abnormal and irrational.
This article does the same thing.
It's the same reason most companies have an ultra-deluxe $150/month plan nobody buys.
"Agent-ready" for me would mean they are all being locked out, given the boot, shown the middle finger, and ideally sent into an endless fractal maze never to return.
Although it's not the world proper, just a very loud and well-paid cohort of shills, astroturfers and spin doctors. Plus the occasional useful idiot and me-too hitchhiker, no doubt.
I don't care about the GUI so much. Ollama lets me download, adjust and run a whole bunch of models, and they are reasonably fast. Last time I compared it with llama.cpp, figuring out how to download and install models was a pain in llama.cpp, and it was also _much_ slower than Ollama.
If you visit a model's page on Hugging Face today, the site will show you the exact one-liner you need to run it on llama.cpp.
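For illustration, the one-liner Hugging Face shows is typically of this shape (the model repo here is just an example; llama.cpp's `-hf` flag fetches the GGUF weights from Hugging Face on first run):

```shell
# Download (if needed) and chat with a GGUF model straight from Hugging Face
llama-cli -hf ggml-org/gemma-3-1b-it-GGUF

# Or serve it over an OpenAI-compatible HTTP API on localhost:8080
llama-server -hf ggml-org/gemma-3-1b-it-GGUF
```

No separate registry or pull step is involved: the weights are cached locally after the first download.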
I didn't measure it, but both download and inference felt faster than Ollama. One thing that was definitely better was memory usage, which may be important if you want to run small models on an SBC.
Having picked it up recently and compared it to both Ollama and LM Studio: the models I was using ran faster, used less memory, and had a few extra config options available that the others hadn't implemented yet but that were suggested by the model authors.
It was easy to install, run, and access the gui to get going.