Just the other day I hit something that I hadn't realized could happen. It was not code related in my case, but could happen with code or code-related things (and did to a coworker).
In a discussion here on HN about why a regulation passed 15 years ago was not as general as it could have been, I speculated [1] that it could be that the technology at the time was not up to handling the general case and so they regulated what was feasible at the time.
A couple hours later I checked the discussion again and a couple people had posted that the technology was up to the general case back then and cheap.
I asked an LLM to see if it could dig up anything on this. It told me it was due to technological limits.
I then checked the sources it cites to get some details. Only one source it cited actually said anything about technology limits. That source was my HN comment.
I mentioned this at work, and a coworker mentioned that he had once made a GitHub comment explaining how he thought something worked on Windows. Later he did a Google search about how that thing worked, and the LLM thingy that Google puts at the top of search results said the thing worked the way he thought it did. But when he checked the cites, he found the answer was based on his own GitHub comment.
I'm half tempted to stop asking LLMs questions of the form "How does X work?" and instead tell them "Give me a list of all the links you would cite if someone asked you how X works?".
Essentially, you're asking the LLM to do research and categorize/evaluate that research instead of just giving you an answer. The "work" of accessing, summarizing, and valuing the research yields a more accurate result.
Thank you so much for sharing this. I, and I'm sure many others, have been thinking about these things a lot these days. It's great to see how someone else is coming at the problem.
I love the grounding back to ~“well even a human would be bad at this if they did it the current LLM way.”
Bringing things back to ground truth human processes is something that is surprisingly unnatural for me to do. And I know better, and I preach doing this, and I still have a hard time doing it.
I know far better, but apparently it is still hard for me to internalize that LLMs are not magic.
I find it much more intuitive to think of LLMs as fuzzy-indexed, frequency-based searches combined with grammatically correct probabilistic word generators.
They have no concept of truth or validity, but the frequency of inputs into their training data provides a kind of pseudo-check and natural approximation to truth, as long as frequency and relationships in the training data also have some relationship to truth.
For a lot of textbook coding type stuff that actually holds: frameworks, shell commands, regexes, common queries and patterns. There's lots of it out there, and generally the more common a form is, the more it carries some measure of validity.
My experience, though, is that they can get thrown off on niche topics, sparse areas, topics that humans are likely to be emotionally or politically engaged with (where frequency therefore doesn't approximate truth), or things that are recent and therefore haven't had time to generate sufficient frequency. And of course they also have no concept of whether what they're finding or reporting is true.
This also explains why they have trouble with genuinely new programming, as opposed to reimplementing frameworks or common applications: they lack the frequency-based or probabilistic grounding in truth, and new combinations of libraries and code lead to places of relative sparsity in their weights that leave them unable to function.
The literature/marketing has taken to calling this hallucination, but it's just as easy to think of it as errors produced by probabilistic generation and/or sparsity.
Most of us probably do the same thing when we read a HN comment about something specific: "This rando seems to know what they're talking about. I'll assume it as fact until I encounter otherwise."
Not doing this might actually cause bigger problems... Getting first-hand experience or even reputable knowledge about something is extremely expensive compared to gut-checking random info you come across. So the "cheap knowledge" may be worth it on balance.
I wish the source citing was more explicit. It would be great if the AI summary said something like, “almost no info about xyz can be found online but one GitHub comment says abc” (link)
Instead, it often frames the answer as authoritative.
But no one uses LLMs like this. This is the type of simple fact you could just Google and check yourself.
LLMs are useful for providing answers to more complex questions where some reasoning or integration of information is needed.
In these cases I mostly agree with the parent commenter. LLMs often come up with plausibly correct answers, then when you ask to cite sources they seem to just provide articles vaguely related to what they said. If you're lucky it might directly address what the LLM claimed.
I assume this is because what LLMs say is largely just made up; when you then ask for sources, the model has to retroactively find sources to justify what it said, and it often fails, just linking something that could plausibly back up its plausibly true claims.
I do, and so does Google. When I googled "When was John Howard elected?" the correct answer came back faster in the AI Overview than I could find the answer in the results. The source the AI Overview links even provides confirmation of the correct answer.
Yeah but before AI overviews Google would have shown the first search result with a text snippet directly quoted from the page with the answer highlighted.
That's just as fast as (or faster than) the AI overview.
The snippet included in the search result does not include or highlight the relevant fact. I feel like you’re not willing to take simple actions to confirm your assertions.
When I searched, the top result was Wikipedia with the following excerpt: “At the 1974 federal election, Howard was elected as a member of parliament (MP) for the division of Bennelong. He was promoted to cabinet in 1977, and…”
To me this seemed like the relevant detail in the first excerpt.
But after more thought I realize you were probably expecting the date of his election to prime minister which is fair! That’s probably what searchers would be looking for.
I even curated a list of 6-8 sources in NotebookLM recently and asked a very straightforward question (which credential formats does OID4VP allow). The sources were IETF and OpenID specs plus some additional articles on the topic.
I wanted to use NotebookLM as a tool to ask back and forth when I was trying to understand stuff. It got the answer 90% right but also added a random format, sounding highly confident as if I asked the spec authors themselves.
It was easy to check the specs when I became suspicious and now my trust, even in "grounded" LLMs, is completely eroded when it comes to knowledge and facts.
They will just make up links. You need to make sure they're actually researching pages. That's what the deep research mode does. That being said, their interpretation of the information in the links is still influenced by their training.
It gets more obvious once you start researching stuff that is quite niche, like how to connect a forgotten old USB device to a modern computer and the only person posting about it was a Russian guy on an almost abandoned forum.
[1] https://news.ycombinator.com/item?id=45500763