Hacker News | thebeas's comments

The "infinite context soon" concern comes up a lot — but even at 1M+ tokens, agents still hit limits on long enough tasks, and cost scales linearly with context size.

The compression models are the product, not the proxy. The gateway is open-source because it's the distribution layer. Anthropic, Codex, and others are iterating on this too — but each only for their own agent. We're fully agent-agnostic and solely focused on compression quality, which is itself a hard problem that needs dedicated iteration.

Try it out and let us know how to make it better!


We do both:

We compress tool outputs at each step, so the prompt cache isn't invalidated mid-run. Once we hit 85% of the context window, we preemptively trigger a summarization step and load that summary when the context window actually fills up.
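Roughly, the loop looks like this. A minimal sketch only: `on_tool_result`, `compress`, `summarize`, `count_tokens`, and the toy `CONTEXT_LIMIT` are all hypothetical stand-ins, not the project's actual API.

```python
# Per-step compression with a preemptive summary (illustrative only).
CONTEXT_LIMIT = 1_000   # toy window size, in "tokens"
PRECOMPACT_AT = 0.85    # trigger summarization early

def on_tool_result(state, result, count_tokens, compress, summarize):
    """Compress each tool output as it arrives; pre-summarize at 85% full."""
    state["messages"].append(compress(result))  # only the newest message is
                                                # touched, so cached prefixes
                                                # of the conversation survive
    used = sum(count_tokens(m) for m in state["messages"])
    if state["summary"] is None and used >= PRECOMPACT_AT * CONTEXT_LIMIT:
        # Build the summary early, while there is still headroom.
        state["summary"] = summarize(state["messages"])
    if used >= CONTEXT_LIMIT:
        # Window full: swap in the precomputed summary, no model call here.
        state["messages"] = [state["summary"]]
    return state
```

Here `state` is just `{"messages": [...], "summary": None}`; a real gateway would also reset the stale summary after each swap.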


> we preemptively trigger a summarization step and load that when the context-window fills up.

How does this differ from auto compact? Also, how do you prove that yours is better than using auto compact?


For auto-compact, we do essentially the same thing Anthropic does, but at an 85%-filled context window. Then, when the window is 100% filled, we pull in this precompacted summary and append the 15% of messages accumulated since. This lets compaction run instantly.
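The timing is the point: the expensive summarization call happens back at the 85% mark, so hitting 100% is just list concatenation. A sketch under that assumption, with hypothetical `count_tokens` and `summarize` helpers:

```python
def compact(messages, limit, count_tokens, summarize):
    """Walk the transcript; summarize at 85%, splice at 100%.

    Illustrates the timing only; helper names are made up.
    """
    summary, tail_start, used = None, None, 0
    for i, msg in enumerate(messages):
        used += count_tokens(msg)
        if summary is None and used >= 0.85 * limit:
            summary = summarize(messages[: i + 1])  # the slow call, done early
            tail_start = i + 1                      # everything after this is
                                                    # the "accumulated 15%"
        if used >= limit:
            # Hot path is pure list work: precomputed summary + recent tail.
            return [summary] + messages[tail_start : i + 1]
    return messages
```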


That's why we give the model the chance to call expand() in case it needs more context. We know it's counterintuitive, so we'll add benchmarks to the repo soon.

From our observations, performance depends on the task and the model itself, and the effect is most visible on long-running tasks.


How does the model know it needs more context?


We provide the model with a tool we call expand(), which lets it retrieve the full, uncompressed output when it needs more context.

We append a note directly into the compressed outputs, so the model knows exactly where the lines were removed from.
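Concretely, something like the following. The marker format, the in-memory store, and the function names are illustrative assumptions, not the real protocol:

```python
# Sketch of an expand() tool over elided tool-output spans.
_STORE = {}  # elided outputs, keyed by an id the model can reference

def compress_output(output_id, lines, keep=5):
    """Keep the head/tail of a long output; append an expand() marker."""
    if len(lines) <= 2 * keep:
        return lines                      # short enough, leave untouched
    _STORE[output_id] = lines             # stash the full output for later
    elided = len(lines) - 2 * keep
    marker = (f"[... {elided} lines elided; "
              f"call expand(id={output_id!r}) to retrieve them ...]")
    return lines[:keep] + [marker] + lines[-keep:]

def expand(output_id):
    """Tool exposed to the model: return the full, uncompressed output."""
    return _STORE[output_id]
```

The inline marker is what tells the model *where* content was removed and how to get it back, which is the part the parent comment is asking about.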


Presumably in much the same way it knows it needs to use tool calls to reach its objective.


I'd argue not: with tool calls, the model always has a description of what each tool can be used for. There's plenty of intermediate but still important information that could be compacted away, and unless there's a logical reason to go looking for it, the model doesn't know what it doesn't know.



