Hacker News | ACCount37's comments

I don't see much similarity? Unless you're looking at self-distillation in general and not just this use of it.

How not?

I think the analogy is actually pretty specific to this paper, not just self-distillation in general.

During sleep your brain replays experiences, but in a noisy and distorted form. The replays are often incoherent as narratives (dreams are weird). But the consolidation still works because the value isn't in the narrative coherence, it's in the activation patterns at each moment. Important pathways get strengthened, weak ones get pruned. Section 4.4 of this paper is what makes the connection click. They cranked training temperature to 2.0 with no truncation. 62% of the sampled outputs had no extractable code: coherent Python that devolves into multilingual gibberish halfway through. The model still improved (+5.7pp pass@1).

This makes no sense if you think the model is learning from good code examples. But it makes a lot of sense if you think of it as the model replaying its own knowledge back to itself in a noisy/distorted form, and the replay process strengthening what matters (sharp distributions at "lock" positions where one token is correct, broad distributions at "fork" positions where multiple approaches work) while pruning what doesn't (distractor tails). The model doesn't learn anything new. It just wakes up performing better because what it already knew got cleaned up.

How is this comment not at number 1??


This is a property of self-distillation.

Self-distillation shifts the behavior of the model towards that of the model + steering. As such, you don't strictly "need" the tokens to be in-domain for it to work. The logits are a vessel for transferring the steering into the model's internals.

The tokens can be gibberish. What transfers isn't whether they're gibberish or not, but how the flavor of model predictions, if given gibberish, differs from that of an unsteered version of itself.

In this specific case, the behavioral difference comes from the "temperature-shifted, truncated samples" in the "teacher" sampling strategy, and it is that difference that is internalized by the "student" model.
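To make "temperature-shifted, truncated" concrete, here is a minimal sketch of how such a teacher distribution differs from the plain model distribution at a single token position. The logits, the temperature of 1.5, and the top-p value of 0.9 are all made-up numbers for illustration, and top-p is only one possible truncation rule; none of this is taken from the paper.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def truncate_top_p(probs, top_p):
    """Keep the highest-probability tokens until their mass reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    out = [0.0] * len(probs)
    for i in kept:
        out[i] = probs[i] / mass
    return out

# Hypothetical logits: two plausible tokens and a three-token distractor tail.
logits = [4.0, 3.5, 1.0, 0.5, 0.0]

base = softmax(logits)  # unsteered model
teacher = truncate_top_p(softmax(logits, temperature=1.5), top_p=0.9)  # steered "teacher"

for i, (b, t) in enumerate(zip(base, teacher)):
    print(f"token {i}: base={b:.3f} teacher={t:.3f}")
```

With these numbers the two weakest tail tokens end up with zero probability under the teacher, while the two leading candidates sit closer together than in the base distribution. That gap between the two columns is the "behavioral difference" the student would be trained to absorb.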


I think we’re agreeing. The point of the sleep parallel is exactly that the content doesn’t matter, and it’s the filtering process that does the work. Brains replay noisy, sometimes incoherent patterns during sleep and the value is in how that replay reshapes connection weights, not in whether the replay is accurate. That’s the same principle you’re describing with the steering signal.

I.e., sleep replays don’t need to replay Tuesday’s meeting accurately. They just need to activate the relevant pathways so that the strong ones fire and the weak ones don’t. The pattern of what fires versus what doesn’t is the signal. The “content” of the dream is basically irrelevant.


Not very. LLMs derive a lot of their capability profile from the sheer scale.

LLMs have something that's not entirely unlike the "g factor" in humans - a broad "capability base" that spans domains. The best of the best "coding LLMs" need both good "in-domain training" for coding specifically and a high "capability base". And a lot of where that "base" comes from is: model size and the scale of data and compute used in pre-training.

Reducing the model scale and pruning the training data would result in a model with a lower "base". It would also hurt in-domain performance - because capabilities generalize and transfer, and pruning C code from the training data would "unteach" the model things that also apply to code in PHP.

Thus, the pursuit of "narrow specialist LLMs" is misguided, as a rule.

Unless you have a well-defined, fixed bar that, once cleared, makes the task solved, with no risk of scope adjustment, no benefit from any future capability improvements above that bar, and enough load to justify the engineering costs of training a purpose-specific model, a "strong generalist" LLM is typically a better bet than a "narrow specialist".

In practice, this is an incredibly rare set of conditions to be met.


It's more complicated than that. Small specialized LLMs are IMO better framed as "talking tools" than generalized intelligence. With that in mind, it's clear why something that can, e.g., look at an image and describe things about it or accurately predict weather, then converse about it, is valuable.

There are hardware-based limitations on the size of LLMs you can feasibly train and serve, which impose a limit on the amount of information you can pack into a single model's weights, and on the amount of compute per second you can get out of that model at inference time.

My company has been working on this specifically because even now most researchers don't seem to really understand that this is just as much an economics and knowledge problem (cf Hayek) as it is "intelligence"

It is much more efficient to strategically delegate specialized tasks, or ones that require a lot of tokens but not a lot of intelligence, to models that can be served more cheaply. This is one of the things that Claude Code does very well. It's also the basis for MoE and some similar architectures with a smarter router model serving as a common base between the experts.
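A rough sketch of the delegation argument, assuming a router that simply picks the cheapest model clearing a task's capability bar. The model names, capability scores, prices, and task sizes are all invented for illustration and do not describe any real product.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Model:
    name: str
    capability: int                 # hypothetical 1-10 score
    usd_per_million_tokens: float   # hypothetical serving price

# Hypothetical model lineup.
MODELS = [
    Model("small-specialist", capability=3, usd_per_million_tokens=0.20),
    Model("large-generalist", capability=9, usd_per_million_tokens=10.00),
]

def route(required_capability):
    """Pick the cheapest model whose capability clears the task's bar."""
    eligible = [m for m in MODELS if m.capability >= required_capability]
    return min(eligible, key=lambda m: m.usd_per_million_tokens)

def cost(model, tokens):
    """Serving cost in USD for the given number of tokens."""
    return model.usd_per_million_tokens * tokens / 1_000_000

# (description, required capability, tokens): token-heavy but easy vs. short but hard.
tasks = [
    ("summarize logs", 2, 5_000_000),
    ("design migration", 8, 200_000),
]

routed = sum(cost(route(cap), tok) for _, cap, tok in tasks)
all_large = sum(cost(MODELS[1], tok) for _, _, tok in tasks)
print(f"routed: ${routed:.2f}, everything on the large model: ${all_large:.2f}")
```

In this toy setup nearly all of the savings come from the token-heavy, low-difficulty task, which is the case the comment singles out. A real router would also have to estimate the capability bar, which is the hard part and is not modeled here.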


No. There's no "answer" really.

They use self-distillation to shift the output distribution of the model towards that of the same model, but running with different temperature/truncation settings in sampling.

This effectively "folds" the logit tail truncation behavior into the model itself.

Not entirely unlike a few "model controlled sampling settings" things I've seen in what it does, but different in execution.
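A toy illustration of that "folding", under heavy assumptions: a single token position, a student that is just a vector of logits, and a teacher distribution whose tail has already been truncated to zero. The numbers are made up, and the update is plain gradient descent on cross-entropy rather than the paper's actual training setup. In expectation, fine-tuning on tokens sampled from the teacher pushes toward the same target.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def tail_mass(probs):
    """Probability the model puts on the two distractor tokens (indices 3 and 4)."""
    return probs[3] + probs[4]

# Hypothetical numbers for one token position.
student_logits = [4.0, 3.5, 1.0, 0.5, 0.0]
teacher_probs = [0.54, 0.39, 0.07, 0.0, 0.0]  # tail already truncated by the sampler

tail_before = tail_mass(softmax(student_logits))

learning_rate = 0.5
for _ in range(500):
    student_probs = softmax(student_logits)
    # For softmax + cross-entropy, the gradient w.r.t. the logits is (student - teacher).
    student_logits = [
        z - learning_rate * (p - q)
        for z, p, q in zip(student_logits, student_probs, teacher_probs)
    ]

tail_after = tail_mass(softmax(student_logits))
print(f"tail mass before: {tail_before:.4f}, after: {tail_after:.4f}")
```

After training, the student puts less mass on the tail even when sampled at temperature 1 with no truncation, which is the sense in which the sampler's behavior has moved into the weights.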


Traditional RAG is a poor fit for this generation of LLMs, because it doesn't fit the "agentic tool use" workflow at all.

Self-guided "grep on a filesystem" often beats RAG because it allows the LLM to run "closed loop" and iteratively refine its queries until it obtains results. A self-guided search loop is a superset of what methods like reranking try to do.

I don't think vector search and retrieval is dead, but the old-fashioned RAG is. Vector search would have to be reengineered to fit into the new agentic workflows, so that the advantages of agentic LLMs can compound with those of vector search - because in current day "grep vs RAG" matchups, the former is already winning on the agentic merits.
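For a sense of what "closed loop" means here, a minimal sketch: the search tool returns results, the caller looks at them, and the next query depends on what came back. The in-memory file table and the scripted query policy are hypothetical stand-ins. In a real agent the policy would be the LLM deciding its next query from the history.

```python
import re

# Hypothetical in-memory "filesystem": path -> file contents.
FILES = {
    "auth/session.py": "def refresh_token(user):\n    return issue_jwt(user)\n",
    "auth/jwt.py": "def issue_jwt(user):\n    return sign(user.id)\n",
    "docs/readme.md": "Login flow overview.\n",
}

def grep(pattern):
    """Return (path, line_number, line) for every line matching the regex."""
    rx = re.compile(pattern)
    hits = []
    for path, text in sorted(FILES.items()):
        for n, line in enumerate(text.splitlines(), start=1):
            if rx.search(line):
                hits.append((path, n, line.strip()))
    return hits

def search_loop(propose_query, max_steps=5):
    """Propose a query, inspect the results, refine, and stop at the first hit."""
    history = []
    for _ in range(max_steps):
        query = propose_query(history)
        if query is None:
            break
        hits = grep(query)
        history.append((query, hits))
        if hits:
            break
    return history

def scripted_policy(history):
    """Stand-in for the LLM: broaden the query each time the last one found nothing."""
    candidates = ["renew_token", r"refresh.*token", "token"]
    return candidates[len(history)] if len(history) < len(candidates) else None

history = search_loop(scripted_policy)
for query, hits in history:
    print(query, "->", hits)
```

The first guess misses and the second one lands. One-shot retrieval has no equivalent of that second attempt, which is the asymmetry the comment is pointing at.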

"Optimize grep-centric search" is a surprisingly reasonable stopgap in the meanwhile.


Probably because their air defenses were too busy getting shot to shit.

There were a lot of Iranian AA losses in the opening phase of this war. The US went to town on anything that looked remotely like AA to secure the sky for themselves, and has operated with ever-increasing impunity since.

Between advanced ISR, stealth, ECM and stand-off munitions, US has a lot of tools to make the lives of AA crews into a living hell.

It's unclear what happened here exactly. It might be a "straggler" SAM that wasn't destroyed in the strikes, might be the US going in too aggressively and putting reduced survivability airframes within an area that wasn't sufficiently cleared, might be an Iranian adaptation not unlike the "SAMbushes" seen in Ukraine.

I don't see it as a sign that Iran is somehow reconstituting its AA capabilities though.


> It's unclear what happened here exactly.

It really isn't. A huge portion of Iran's air defenses are designed for road-mobility and pop-up attacks instead of long-term point defense, encompassing hundreds of launchers total: https://en.wikipedia.org/wiki/List_of_equipment_of_the_Islam...

Military strategists long warned that air campaigns flying over South Iran would have to contend with passively-guided SAMs and MANPADS on their way to Tehran. There are hundreds of road-accessible caves in the Zagros range that cannot be inspected via satellite. They inherently present a risk to overflights unless they are occupied on the ground first; it's common knowledge why Kohgiluyeh and Fars are so dangerous.


Considering that this is the first jet shot down in the entire war? "High mobility SAMs" alone completely fails to explain it.

They had high mobility SAMs for the entire war, with nothing to show for it. Something else must have happened there.


Not the first jet hit, and definitely not the first aircraft downed. The F-35 incident was widely suspected to be a Qaem 118 missile, which fits the bill for road mobile multispectral SAM perfectly. More than a dozen drones were downed too, and even the rescue helicopters are evading air defenses according to CENTCOM.

> They had high mobility SAMs for the entire war, with nothing to show for it.

This is certainly something to show for it. Iran's air defenses are not like Israel's or Qatar's, they don't have the money or security to build expensive anti-ballistic layers for air defense. These smaller, road-mobile systems are intended to exploit an overextending enemy, and for that purpose they're apparently working quite well.


I wouldn't count 2-3 downed jets across many thousands of sorties "working quite well", no.

I don't think a single USAF officer is seeing the silver lining while the Iran conflict continues to escalate.

These people are professionals, they go to school to study REDFOR tactics and get court-martialed when their missions go sideways. They are not looking at the SEAD situation of southern Iran with uncertainty as to how this happened. You are the only one that has voiced that confusion.


Iran's regime is a radical Islamic theocracy that has "Death to America" as a matter of policy, supports every other radical Islamist militia in the entire Middle East region, and tried to build nukes after being told, repeatedly, not to build nukes.

I don't know about you, but the idea of a radical Islamic theocracy and a well known source of Middle East instability having nukes doesn't sit well with me. As far as reasons to invade countries go, this alone would make for a damn good one.


If a button existed that magically turned Iran into a secular-ish democracy(-ish) like Turkey then, yes, I would expect the President of the United States to press it.

No such button exists, and it's increasingly clear that this war will leave the entire world far worse off while further entrenching the current Iranian regime.


"Far worse off" how exactly? "Entrenching" how exactly?

Iranian regime wasn't doing that well even when it wasn't actively bombed. And "rally around the flag" only goes so far in a country that has been killing protestors by the thousands.

I don't see this war ruining Iran's regime overnight as is. But if it comes with a sustained effort to pressure Iran, or a ground operation to topple the regime directly, it well might.


> "Far worse off" how exactly? "Entrenching" how exactly?

Hardliners and the IRGC have significantly more power than before, and the few moderates that remain have much less political capital and are at much greater risk of being purged.

If Iran doesn't win significant concessions that the sucker-punch attacks will never be repeated again[1], they are guaranteed to sprint towards the minimum viable nuke.

1. Bibi will refuse, obviously, and America's capacity to leash him is questionable.


"Moderates" in Iran were consecutively dismantled and purged for decades. A country that has moderates providing a meaningful counterbalance to hardliners doesn't kill protestors by thousands.

Pre-war, the situation was bad enough that dropping bombs on Iran's key decision-makers might have actually made the government more moderate on average. Not that it matters much. "More moderate" in context of Iran's government isn't anywhere near "moderate" either way.


The "Nothing ventured, nothing lost" attitude would make a lot more sense if the region would go back to the status quo ante after the "excursion."

Here's an idea I heard put forth because Iran is asking for a great power guarantee against future incidents like this.

Have China be the guarantor, build military bases, and put Iran under its nuclear deterrence umbrella. Iran can be assured they won't be bombed, the West can be assured they won't have nukes. (In theory; I largely assume the CCP will not aid in their construction or let them have nukes under such an arrangement.)

Thing is, all the little countries are looking at what happened to Ukraine (who gave up their nukes), Iran (who has not gotten them yet), and North Korea (who has them). They're looking and thinking, if I had nukes, I probably wouldn't be the target of regime change.


Why would China agree to that? It's an insane proposition for them. "You have to put bases in a country where you have no strategic reason to do so, and in addition, you agree that if that country is attacked then you have to nuke the US, guaranteeing your own destruction."

They want a base in the Middle East and they have many reasons to be there. Oil is one of them: they actually get it from there. As Trump says (today), the US does not have any need for their oil, so in that sense China has more reason to be there.

Mutually Assured Destruction has worked for 75 years, and China is aggressively expanding their stockpiles. Would the US or Israel risk a war with China over Iran if they get the assurances from the Chinese they will keep Iran on a tight leash?

> Why would China agree to that?

Ultimately the aim is to displace the US as the world hegemon. Having bases across the world is what hegemons do.


Half-price oil?

So Iran would sell its main natural resource at half-price forever to pay for China to keep bases on its territory? That also doesn’t seem plausible.

Realistically there is no amount that Iran would be willing to pay and that China would be willing to accept for China to essentially agree to be responsible for the defense of Iran. It’s a non-starter.


Does it matter? US is a net oil exporter, and not exactly starved for Gulf oil. And every day the strait stays closed is a day other Gulf states have a very pressing reason to conflict with Iran. As if Iran didn't give enough of those to the entire region.

Iran isn't somehow able to exert infinite economic pressure forever. They can play the chaos monkey, but how much does it help them? Threats only work on those who cave in to them.


It does matter because oil is a global commodity; the fact that the US is a net exporter doesn't stop the prices from going up and other follow-on impacts to the global economy.

It means that the US isn't hit the hardest. There's no "we have to end the war this month or our country grinds to a halt". Just the slow grind of economic pressure that, I remind you, affects more countries than just the US - and many of them far more strongly.

US leadership can just say "this isn't enough to deter us" and proceed with the rest of the war however they want.


The Iranian regime is betting that they can outlast Donald Trump on this front. Trump's War is very unpopular and they don't care what the Iranian people think or suffer through.

> US is a net oil exporter, and not exactly starved for Gulf oil

I suggest not taking anything Trump says as the truth: https://xcancel.com/chrismartenson/status/203952370406177223...


Holy shit, that's really saying the quiet part loud.

“Does it matter?”

Yes, who cares about the rest of the world?

Nations shutting down, businesses shutting down, and all because the elected leader of America got involved in a war to avoid accusations of pedophilia.

And lest we forget, this is the nuclear superpower. Thank god there is no conspiracy theory about Nukes being useful so far. I have more faith that the administration will bend towards conspiracies than away from them.


I don't see Iran trying, and failing, to hold the world economy hostage as a reason to go against "no negotiations with terrorists".

If oil hits $200/barrel and inflation is double digits, people will have different priorities.

Pretty much.

US military is performing quite well. US political leadership is the questionable part of this war.

It would sure be nice if the White House gave a reason to believe that there's an actual plan for dismantling Iran's regime, or Iran's influence, that goes beyond "wing it".


Iran had one of the largest and most extensive integrated air defense networks in the world. US has been bombing Iran from day 0 of this war. Those are the air losses they took.

Being able to counter air defenses to this degree and operate with this level of impunity is a major SEAD/DEAD win.


"How many fuel launches" is the error margin.

If they get less performance or more mission payload, they can add tanker launches. If they get more performance or less mission payload, they can remove tanker launches.

People ran into "the design is 10% heavier than planned for unexpected engineering reasons and now we have to make hard choices" on space missions far less complex than a literal Moon landing. SpaceX has externalized the "hard choices" into the tanker count, pre-emptively.
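The arithmetic behind "tanker count as error margin" can be sketched as below. Every figure here is a made-up round number chosen only to show the shape of the trade; none of them are actual mission or vehicle parameters.

```python
import math

def tanker_launches(propellant_needed_t, delivered_per_tanker_t):
    """Tanker flights needed to deliver the required propellant (whole flights only)."""
    return math.ceil(propellant_needed_t / delivered_per_tanker_t)

# Hypothetical baseline: 1200 t of propellant needed, 100 t delivered per tanker flight.
baseline = tanker_launches(1200, 100)

# Design comes out heavier than planned and needs 10% more propellant.
heavier_design = tanker_launches(1320, 100)

# Tanker underperforms and delivers 10% less per flight.
weaker_tanker = tanker_launches(1200, 90)

print(baseline, heavier_design, weaker_tanker)
```

In both off-nominal cases the plan absorbs the miss by adding flights instead of cutting payload or redesigning hardware, at the cost of more launches and a longer campaign.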

The lunar orbit of Artemis is defined mainly by SLS/Orion's performance, or lack thereof. The specific NRHO was a Gateway choice, and might now be dead alongside it, but by itself, Orion can't get to low Lunar orbit. Which drives some peculiar design choices.


So many (perceived) problems with spaceflight and building moon bases and the like are solved by simply making the process and cost of launching faster, easier and cheaper; the problem that NASA has always had is that each launch, even with the reusable space shuttles, cost billions and took years of engineering, planning, etc. To the point where yesterday's launch was done with (what I perceive to be) salvaged parts where the engineering was done decades ago, because engineering something new would be too expensive and take too long.

Sure, don't fix what isn't broken and all - *nix tools are decades old too after all - but still.

