There is so much confusion on this topic. Please don't spread more of it; the answers are just a quick google away. To spell it out:
1) AI companies make money on the tokens they sell through their APIs. At my company we run Claude Code by buying Claude Sonnet and Opus tokens from AWS Bedrock. AWS and Anthropic make money on those tokens. The unit economics are very good here; estimates are that Anthropic and OpenAI have a gross margin of 40% on selling tokens.
2) Claude Code subscriptions are probably subsidized somewhat on a per-token basis, for strategic reasons (Anthropic wants to capture the market). Even this is complicated, though: the usage distribution is such that Anthropic is making money on some subscribers while subsidizing the ultra-heavy-usage vibe coders who max out their subscriptions. If they lowered the cap, most subscribers still wouldn't max out and the plans could become profitable, but that would probably upset a lot of the loudest ultra-heavy-usage influencer types. (A rough numeric sketch of points 1 and 2 is below the list.)
3) The biggest cost AI companies have is training new models. That is the reason AI companies are not net profitable. But that's a completely separate set of questions from what inference costs, which is what matters here.
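To make the arithmetic in points 1 and 2 concrete, here's a back-of-the-envelope sketch. Every number in it (prices, costs, the usage split, the cap) is a made-up placeholder, not Anthropic's or AWS's actual figures; it only shows how a ~40% per-token margin can coexist with a flat-rate plan that loses money on its heaviest users, and what a lower cap would do.

```python
# Back-of-the-envelope sketch of the unit economics described above.
# Every number here is a made-up placeholder (NOT Anthropic's or AWS's
# actual prices, costs, or usage data).

# --- 1) API token sales ---
price_per_m_tokens = 10.00  # hypothetical sell price per 1M tokens, $
cost_per_m_tokens = 6.00    # hypothetical inference cost per 1M tokens, $

gross_margin = (price_per_m_tokens - cost_per_m_tokens) / price_per_m_tokens
print(f"API gross margin: {gross_margin:.0%}")  # -> 40%

# --- 2) Flat-rate subscriptions with a skewed usage distribution ---
subscription_price = 100.00  # hypothetical monthly plan price, $
subscribers = [
    # (share of subscribers, millions of tokens used per month)
    (0.70, 2),    # most people barely use it
    (0.25, 10),   # moderate users
    (0.05, 400),  # the ultra-heavy tail that maxes out the plan
]

avg_cost = sum(share * m_tok * cost_per_m_tokens for share, m_tok in subscribers)
print(f"Average inference cost per subscriber: ${avg_cost:.2f}")               # -> $143.40
print(f"Average margin per subscriber: ${subscription_price - avg_cost:.2f}")  # -> $-43.40

# --- 3) The same distribution with a lower cap ---
cap_m_tokens = 30  # hypothetical monthly cap, millions of tokens
capped_cost = sum(
    share * min(m_tok, cap_m_tokens) * cost_per_m_tokens
    for share, m_tok in subscribers
)
print(f"Average cost with a {cap_m_tokens}M-token cap: ${capped_cost:.2f}")    # -> $32.40
```

In this toy split, 95% of subscribers never come near the 30M cap, so lowering it only touches the heavy tail; that's the "lower the cap" argument in point 2.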
Without training new models, existing models will become more and more out of date until they are no longer useful, regardless of how cheap inference is. Training new models is part of the cost basis and can't be hand-waved away.
Only if you're relying on the models to recall facts from their training set. Intuitively, at sufficient complexity it's the model's ability to reason that is critical, and its answers can be kept up to date with RAG.
Unless you mean out of date == no longer SOTA reasoning models?
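Concretely, the RAG part of that argument is just "fetch current docs at query time and put them in the prompt." A minimal sketch, assuming a placeholder call_model() in place of a real LLM API and a toy keyword-overlap retriever in place of an embedding index; frobnicate-lib and its doc snippets are invented for illustration.

```python
# Minimal retrieval-augmented generation (RAG) sketch: instead of relying
# on facts frozen into the model's weights, fetch current documentation at
# query time and include it in the prompt. `call_model` is a stand-in for
# whatever API/SDK you actually use; the "retrieval" here is naive keyword
# overlap, where a real system would use an embedding index.

DOCS = [  # hypothetical, up-to-date documentation snippets
    "frobnicate-lib 3.0 (2025): the connect() helper now requires a timeout argument.",
    "frobnicate-lib 3.0 release notes: the legacy open_sync() call was removed.",
    "Unrelated note: the project logo was redesigned.",
]

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank docs by naive keyword overlap with the query and return the top k."""
    q_words = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: len(q_words & set(d.lower().split())), reverse=True)
    return ranked[:k]

def call_model(prompt: str) -> str:
    """Placeholder for an actual LLM API call (Bedrock, the Anthropic SDK, etc.)."""
    return f"<model answer grounded in a prompt of {len(prompt)} chars>"

def answer(question: str) -> str:
    context = "\n".join(retrieve(question, DOCS))
    prompt = (
        "Answer using only the documentation below.\n\n"
        f"Documentation:\n{context}\n\nQuestion: {question}"
    )
    return call_model(prompt)

print(answer("How do I call connect() in frobnicate-lib 3.0?"))
```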
'ability to reason' implies that LLMs are building a semantic model from their training data, whereas the simplest explanation for their behavior is that they are building a syntactic model (see Plato's Cave). Thus without new training they cannot 'learn', RAG or no RAG.
If you're using the models to assist with coding—y'know, what this thread is about?—then they'll need to know about the language being used.
If you're using them for particular frameworks or libraries in that language, they'll need to know about those, too.
If training becomes uneconomical, new advances in any of these will no longer make it into the models, and their "help" will get worse and worse over time, especially in cutting-edge languages and technologies.