
> Would it be possible to eliminate ~90% of the current errors?

They aren't errors. When the machine generates the text "Next you'll tell me 2 + 2 = 5", there isn't an error in the math, because no math was attempted. It's a success in producing a predictable wording structure.
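
(Purely to illustrate what I mean, and assuming the Hugging Face transformers library with GPT-2 as a stand-in model - my own sketch, not anything from this thread: all the model ever does is assign probabilities to candidate next tokens. Nothing in it computes 2 + 2.)

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Score two candidate continuations; no arithmetic happens anywhere in here.
    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    prompt = "Next you'll tell me 2 + 2 ="
    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        next_token_logits = model(**inputs).logits[0, -1]
    probs = torch.softmax(next_token_logits, dim=-1)

    for candidate in (" 4", " 5"):
        token_id = tok(candidate)["input_ids"][0]
        print(candidate, float(probs[token_id]))

Both " 4" and " 5" just get a score; whichever one gets sampled, the procedure "worked".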

In fact, that example alone is a real success, since we English-speaking humans can take a broader meaning from that statement: you're going to tell me something nonsensical.

I don't get how you can say the first part of your comment and then follow it up with a complete misunderstanding in the second half. Is GPT that enthralling to our brains?



Well, if my phone's autocomplete took a very sane prediction and made it insane, I'd call that a failure. There are obvious success and failure conditions even in the most basic autocompletion. The same goes for 2 + 2.

Which is the meat of what we're discussing, right? We steer the autocompletion toward favorable results. We fine-tune how it predicts. All of this is done with the goal of a higher success rate and a lower failure rate under some set of conditions.

No?


I feel like you didn't actually read my comment, because you're not addressing what I said.

Both "2 + 2 = 4" and "2 + 2 = 5" are correct statements following the words "I don't think you understand [token]". There are no errors to fix.


There are errors, based on evaluation criteria. To say that one token following the next is the only criterion is not how we got here, is it? We clearly have much richer criteria for good and bad output from these LLMs than that.

Likewise, I cranked the temperature up for experimentation, and it produced gibberish. The randomness in which token gets chosen next was so heavy-handed that it literally couldn't even form words. That wouldn't be considered "good" by most people, would it? Would it be for you? Technically it's still one token after the next, but it's objectively worse at autocompletion than my phone's keyboard.
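
Roughly what cranking the temperature does, sketched with made-up scores instead of a real model (the numbers and token count are just illustrative):

    import numpy as np

    rng = np.random.default_rng(0)

    def sample_next(logits, temperature):
        # Divide the scores by the temperature, softmax them, then draw one token index.
        scaled = np.asarray(logits, dtype=float) / temperature
        probs = np.exp(scaled - scaled.max())
        probs /= probs.sum()
        return rng.choice(len(probs), p=probs)

    toy_logits = [4.0, 2.0, 0.5, -1.0]  # pretend scores for 4 candidate tokens
    for t in (0.2, 1.0, 10.0):
        print(t, [sample_next(toy_logits, t) for _ in range(20)])

At a low temperature the top-scoring token wins almost every time; at a high one the distribution flattens out and the picks become nearly random, which is exactly why the words stopped forming.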

My point is I think you can have both: a dumb autocomplete, but trained well enough that it's useful. That it doesn't try to say Obama is a cat, or that 2 + 2 = 5. Yes, there will be edge cases where the weights produce odd output - but that's the entire goal of LLM research, right? To see how far we can steer a dumb autocomplete into usefulness.

If your argument is that "that gibberish output is perfect, no errors found" because the program technically ran and the weights worked... well, I've got no reply. I can only assume you're referring to the foundational LLM, and I'm referring more to the training - a more holistic sense of "is the program working". But if you consider the gibberish working, then frankly most bad programs (crashing, etc.) would be "working" too - right? Because they're doing exactly what they're programmed to do.

Whether something is working, or free of errors, seems to be a matter of semantic, human interpretation. But I'm quite in the weeds here, heh.


> To say that one token following the next is the only criterion is not how we got here, is it?

You could just read the research about LLMs before saying stuff like this.

You don't even seem to grasp that "Obama is a cat" isn't gibberish as a statement. I'm not even trying to convince you that these programs are perfect, I'm just trying to make sure that you understand that these aren't categorical errors and the things you consider successes aren't even happening.


> I'm not even trying to convince you that these programs are perfect, I'm just trying to make sure that you understand that these aren't categorical errors and the things you consider successes aren't even happening.

Yeah, we're just talking past each other. I believe I understand what you're saying. I, on the other hand, am describing errors in UX.

Your point seems pedantic, tbh. Hopefully by now I've done something toward convincing you that, for the little I do "know" about these (admittedly not much), I understand they're nothing but pattern predictors. Token outputs based on token inputs. No intelligence. Yet you spend repeated replies that sound effectively like "Stop calling them errors!" when they are very clearly errors in the context of UX.

Your argument, if I understand correctly, is pointless, because the goal of the application is to have the LLM's predictions align with a human-centric worldview. Up is not down, and the LLM should not predict tokens that espouse otherwise. In that context, the LLM replying "Up is indeed down" would be an error. Yet you repeatedly argue that it's not an error.

In my view, your argument would be better spent saying: "The LLM application we strive for today is impossible. It will never be. It's snake oil. LLMs will never be reasonably and consistently correct by human interpretation."

I don't know if that's your view or not. But at least it wouldn't be talking past me about a point I'm not even making. My frame for this conversation was whether we can make token prediction align with human goals of accuracy. Saying that inaccuracies are not errors "categorically" isn't in line with my original question, as I see it at least. It's apples and oranges.


We're not talking past each other, you're just embarrassed.


Embarrassed about what, exactly? You seem hostile; I'm trying not to be.

I stand by everything I said.

> My hope is that even if it never goes beyond being an autocomplete; if we can improve the training dataset, help it not conflict with itself, etc - that maybe the autocomplete will be insanely useful.

I stand by my first post's summary, which is "never going past an autocomplete".

You're being pedantic, and struggling to move past the fact that something can be token prediction and still have successes and failures in the user's perception. Inaccuracies.

How you write software with such a mindset is beyond me.



