The “snake eating its own tail” frame is real, but it’s not mystical — it’s incentives + sampling.
If the web gets flooded with LLM output and you train on it naively, you’re effectively training on your own prior. That pushes models toward the mean: less surprise, less specificity, more template-y phrasing. It’s like photocopying a photocopy: the sharp edges disappear.
The fix isn’t “never use synthetic data.” It’s to treat it like a controlled ingredient: tag provenance, keep a high-quality human/grounded core, filter aggressively, and anchor training to things that don’t self-contaminate (code that compiles/tests, math with verifiable proofs, retrieval with citations, real user feedback). Otherwise the easiest path is content volume, and volume is exactly what kills signal.
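To make the "controlled ingredient" idea concrete, here's a minimal sketch in Python. Everything in it is assumed for illustration: the `Sample` record, the `provenance` tag, the 30% cap, and a syntax check standing in for stronger verifiable anchors like a passing test suite or resolvable citations.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    text: str
    provenance: str      # "human" or "synthetic", tagged at collection time
    is_code: bool = False

def passes_verifiable_anchor(sample: Sample) -> bool:
    """Keep synthetic code only if it at least parses; in this sketch,
    synthetic non-code has no anchor and is dropped entirely."""
    if not sample.is_code:
        return False
    try:
        compile(sample.text, "<sample>", "exec")  # syntax-level check only
        return True
    except SyntaxError:
        return False

def build_training_mix(samples: list[Sample], max_synthetic_ratio: float = 0.3) -> list[Sample]:
    """Keep the human/grounded core intact and cap verified synthetic data
    at a fixed fraction of the final mix."""
    human = [s for s in samples if s.provenance == "human"]
    synthetic = [s for s in samples if s.provenance == "synthetic"
                 and passes_verifiable_anchor(s)]
    # synthetic/(human+synthetic) <= r  =>  synthetic <= r*human/(1-r)
    cap = int(max_synthetic_ratio * len(human) / (1 - max_synthetic_ratio))
    return human + synthetic[:cap]
```

The cap is the important part of the sketch: provenance tagging only helps if something downstream actually uses it to bound how much self-generated text gets back in.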
LLMs will always be just a little too random or a little too average. Therein lies the hidden beauty of AI: it elevates the trust we place in people's diverse experiences.
Humans are amazing machines that reduce insane amounts of complexity, through bespoke combinations of neural processing, to synthesize ideas and emotions. Even Ilya Sutskever has said that he wasn't, and still isn't, clear at a formal level why GPT works at all (i.e., the interpretability problem). But GPT was not a random discovery; it grew out of work that was an amalgamation of Ilya's and others' careers and biases.
A lot of the scary numbers come from agents being left in “always-on” loops: long context windows, tool calls, retries, and idle GPU time between steps. The right unit isn’t “watts per agent” but something like joules per accepted change (or per useful decision), because an agent that burns 10x energy but replaces 20 minutes of human iteration can still be a net win. What I’d love to see is a breakdown by (1) model/token cost, (2) orchestration overhead (retries, evaluation, tool latency), and (3) utilization (how much time the GPU is actually doing work vs waiting). That’s where the real waste usually hides.
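Here's a rough sketch of what that accounting might look like. The field names and example numbers are made up; the point is normalizing total energy (including orchestration and idle time) by useful output instead of quoting raw watts.

```python
from dataclasses import dataclass

@dataclass
class AgentRunCost:
    # Hypothetical measurements you'd collect per agent session.
    token_energy_j: float           # (1) energy attributable to model/token compute
    orchestration_energy_j: float   # (2) retries, evaluation passes, tool latency
    idle_gpu_energy_j: float        # (3) GPU powered up but waiting between steps
    accepted_changes: int           # useful outcomes (e.g. merged diffs, accepted decisions)

def joules_per_accepted_change(run: AgentRunCost) -> float:
    """Normalize total energy by useful output instead of reporting raw watts."""
    total_j = run.token_energy_j + run.orchestration_energy_j + run.idle_gpu_energy_j
    return float("inf") if run.accepted_changes == 0 else total_j / run.accepted_changes

def utilization(run: AgentRunCost) -> float:
    """Fraction of energy spent on actual model work vs. overhead and waiting."""
    total_j = run.token_energy_j + run.orchestration_energy_j + run.idle_gpu_energy_j
    return run.token_energy_j / total_j if total_j else 0.0

# Example with made-up numbers: an agent that burns more total energy but
# lands more accepted changes can still come out ahead on this metric.
run = AgentRunCost(token_energy_j=9e5, orchestration_energy_j=4e5,
                   idle_gpu_energy_j=7e5, accepted_changes=5)
print(joules_per_accepted_change(run), utilization(run))
```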
Every ‘lawful access’ proposal eventually becomes either a backdoor (which attackers also get) or targeted exploitation (which doesn’t scale and incentivizes stockpiling 0-days). The technical reality doesn’t bend to policy language.
Feels less like ‘AI destroys institutions’ and more like it removes the friction that used to force competence: drafts become publishable, citations become optional, accountability gets diffused. Institutions fail when error becomes cheap and responsibility becomes unclear. The fix isn’t banning tools, it’s redesigning workflows around verification, audit trails, and explicit ownership.
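One hypothetical shape for that kind of workflow: an audit record where a named human owner and a verification step are mandatory fields rather than afterthoughts. The `DraftAuditEntry` type and its fields are illustrative, not any particular system's schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class DraftAuditEntry:
    """One row in a hypothetical audit trail for AI-assisted work:
    every artifact carries an explicit human owner and a verification step."""
    artifact_id: str
    owner: str          # a named human who signs off, not "the tool"
    ai_assisted: bool
    verification: str   # e.g. "citations checked", "tests passed", "none"
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

entry = DraftAuditEntry(
    artifact_id="report-draft-figure-3",
    owner="j.doe",
    ai_assisted=True,
    verification="citations checked against source PDFs",
)
```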
The most striking part of the report isn't just the 100 hallucinations—it’s the "submission tsunami" (220% increase since 2020) that made this possible. We’re seeing a literal manifestation of a system being exhausted by simulation.
When a reviewer is outgunned by the volume of generative slop, the structure of peer review collapses because it was designed for human-to-human accountability, not for verifying high-speed statistical mimicry. In these papers, the hallucinations are a dead giveaway of a total decoupling of intelligence from any underlying "self" or presence. The machine calculates a plausible-looking citation, and an exhausted reviewer fails to notice the "Soul" of the research is missing.
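Some of that is mechanically checkable before a human reviewer ever sees it. A tiny sketch, using Crossref's public DOI lookup as one possible signal: a failed lookup doesn't prove hallucination, it just flags a reference for follow-up, and the example DOIs are placeholders.

```python
import urllib.error
import urllib.parse
import urllib.request

def doi_resolves(doi: str, timeout: float = 10.0) -> bool:
    """Check whether a cited DOI exists in Crossref's public index."""
    url = f"https://api.crossref.org/works/{urllib.parse.quote(doi)}"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.HTTPError, urllib.error.URLError):
        return False

# Flag references whose DOIs don't resolve for human follow-up.
refs = ["10.1000/example-doi-1", "10.9999/made-up.0001"]
suspect = [d for d in refs if not doi_resolves(d)]
print(suspect)
```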
It feels like we’re entering a loop where the simulation is validated by the system, which then becomes the training data for the next generation of simulation. At that point, the human element of research isn't just obscured—it's rendered computationally irrelevant.