Just the other day I hit something that I hadn't realized could happen. It was not code related in my case, but could happen with code or code-related things (and did to a coworker).
In a discussion here on HN about why a regulation passed 15 years ago was not as general as it could have been, I speculated [1] that it could be that the technology at the time was not up to handling the general case and so they regulated what was feasible at the time.
A couple hours later I checked the discussion again and a couple people had posted that the technology was up to the general case back then and cheap.
I asked an LLM to see if it could dig up anything on this. It told me it was due to technological limits.
I then checked the sources it cites to get some details. Only one source it cited actually said anything about technology limits. That source was my HN comment.
I mentioned this at work, and a coworker mentioned that he had once made a GitHub comment explaining how he thought something worked on Windows. Later he did a Google search about how that thing worked, and the LLM thingy that Google puts at the top of search results said the thing worked the way he thought it did. But when he checked the cites, he found the answer was based on his own GitHub comment.
I'm half tempted to stop asking LLMs questions of the form "How does X work?" and instead tell them "Give me a list of all the links you would cite if someone asked you how X works?".
Essentially, you're asking the LLM to do research and categorize/evaluate that research instead of just giving you an answer. The "work" of accessing, summarizing, and valuing the research yields a more accurate result.
Thank you so much for sharing this. I, and I'm sure many others, have been thinking about these things a lot these days. It's great to see how someone else is coming at the problem.
I love the grounding back to ~“well even a human would be bad at this if they did it the current LLM way.”
Bringing things back to ground truth human processes is something that is surprisingly unnatural for me to do. And I know better, and I preach doing this, and I still have a hard time doing it.
I know far better, but apparently it is still hard for me to internalize that LLMs are not magic.
I find it much more intuitive to think of LLMs as fuzzy-indexed, frequency-based searches combined with grammatically correct probabilistic word generators.
They have no concept of truth or validity, but the frequency of inputs into their training data provides a kind of pseudo-check and natural approximation to truth, as long as frequency and relationships in the training data also have some relationship to truth.
For a lot of textbook coding type stuff that actually holds: frameworks, shell commands, regexes, common queries and patterns. There's lots of it out there, and generally the more common a form is, the more it carries some measure of validity.
My experience, though, is that they can get thrown off on niche topics, sparse areas, topics that humans are likely to be emotionally or politically engaged with (where frequency therefore doesn't approximate truth), or things that are recent and therefore haven't had time to generate sufficient frequency. And of course they also have no concept of whether what they're finding or reporting is true.
This also explains why they have trouble with genuinely new programming, as opposed to reimplementing frameworks or common applications: they lack the frequency-based or probabilistic grounding in truth, and new combinations of libraries and code lead to places of relative sparsity in their weights that leave them unable to function.
The literature/marketing has taken to calling this hallucination, but it's just as easy to think of it as errors produced by probabilistic generation and/or sparsity.
Most of us probably do the same thing when we read a HN comment about something specific: "This rando seems to know what they're talking about. I'll assume it as fact until I encounter otherwise."
Not doing this might actually cause bigger problems... Getting first-hand experience or even reputable knowledge about something is extremely expensive compared to gut-checking random info you come across. So the "cheap knowledge" may be worth it on balance.
I wish the source citing was more explicit. It would be great if the AI summary said something like, “almost no info about xyz can be found online but one GitHub comment says abc” (link)
Instead, it often frames the answer as authoritative.
But no one uses LLMs like this. This is the type of simple fact you could just Google and check yourself.
LLMs are useful for providing answers to more complex questions where some reasoning or integration of information is needed.
In these cases I mostly agree with the parent commenter. LLMs often come up with plausibly correct answers, then when you ask to cite sources they seem to just provide articles vaguely related to what they said. If you're lucky it might directly address what the LLM claimed.
I assume this is because what LLMs say is largely just made up; when you then ask for sources, the model has to retroactively find sources to justify what it said, and it often fails, just linking something that could plausibly back up its plausibly true claims.
I do, and so does Google. When I googled "When was John Howard elected?" the correct answer came back faster in the AI Overview than I could find the answer in the results. The source the AI Overview links even provides confirmation of the correct answer.
Yeah but before AI overviews Google would have shown the first search result with a text snippet directly quoted from the page with the answer highlighted.
That's just as fast as (or faster than) the AI overview.
The snippet included in the search result does not include or highlight the relevant fact. I feel like you’re not willing to take simple actions to confirm your assertions.
When I searched, the top result was Wikipedia with the following excerpt: “At the 1974 federal election, Howard was elected as a member of parliament (MP) for the division of Bennelong. He was promoted to cabinet in 1977, and…”
To me this seemed like the relevant detail in the first excerpt.
But after more thought I realize you were probably expecting the date of his election to prime minister which is fair! That’s probably what searchers would be looking for.
I even curated a list of 6-8 sources in NotebookLM recently and asked a very straightforward question (which credential formats does OID4VP allow). The sources were IETF and OpenID specs plus some additional articles on the topic.
I wanted to use NotebookLM as a tool to ask back and forth when I was trying to understand stuff. It got the answer 90% right but also added a random format, sounding highly confident as if I asked the spec authors themselves.
It was easy to check the specs when I became suspicious and now my trust, even in "grounded" LLMs, is completely eroded when it comes to knowledge and facts.
They will just make up links. You need to make sure they're actually researching pages. That's what the deep research mode does. That being said, their interpretation of the information in the links is still influenced by their training.
It gets more obvious once you start researching stuff that is quite niche, like how to connect a forgotten old USB device to a modern computer and the only person posting about it was a Russian guy on an almost abandoned forum.
[1] https://news.ycombinator.com/item?id=45500763