
>"Reasoning", however, is a feature that has been bolted on with a hacksaw and duct tape.

What do you mean by this? Especially for tasks like coding, where there is a deterministic correct-or-incorrect signal, it should be possible to train on that signal.
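For a rough idea of what that signal could look like, something along these lines (a hypothetical sketch, not anyone's actual pipeline; the file layout and the 1.0/0.0 scoring are assumptions):

    import os
    import subprocess
    import tempfile

    def code_reward(generated_code: str, test_code: str, timeout: int = 10) -> float:
        """Return 1.0 if the model's code passes the unit tests, else 0.0."""
        with tempfile.TemporaryDirectory() as d:
            path = os.path.join(d, "candidate.py")
            with open(path, "w") as f:
                f.write(generated_code + "\n\n" + test_code)
            try:
                result = subprocess.run(["python", path],
                                        capture_output=True, timeout=timeout)
            except subprocess.TimeoutExpired:
                return 0.0  # treat hangs as failures
        return 1.0 if result.returncode == 0 else 0.0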





It's meant in the literal sense, just with metaphorical hacksaws and duct tape.

Early on, some advanced LLM users noticed they could get better results by forcing the insertion of a word like "Wait," or "Hang on," or "Actually," and then letting the model run for a few more paragraphs. This increased the chance of the model noticing a mistake it had made.
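Roughly, the trick looks like this (a minimal sketch assuming a hypothetical generate() helper that returns the model's completion of a string; real setups differ in stop tokens and wording):

    # Forced-continuation trick: when the model stops, append "Wait," and
    # let it keep going so it gets a chance to re-check its own answer.

    def answer_with_second_look(prompt: str, generate, passes: int = 2) -> str:
        text = prompt
        for _ in range(passes):
            text += generate(text)   # let the model write until it stops
            text += "\nWait, "       # then force it to continue and reconsider
        text += generate(text)       # final continuation after the last "Wait,"
        return text[len(prompt):]    # return only the generated part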

Reasoning is basically this.


It's not just force-inserting a word. Reasoning is integrated into the model's training process.

Not into the core foundation model. The foundation model still only predicts the next token in a static way. The reasoning is tacked onto the InstructGPT-style fine-tuning step, and it's done through prompt engineering, which is the shittiest way a model like this could have been done, and it shows.
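To illustrate what "tacked on through prompt engineering" means, a hypothetical template (the tag names are made up; every lab uses its own delimiters):

    # Reasoning wired in at the prompt level rather than in the base model.
    SYSTEM = "Think step by step inside <think>...</think>, then give the final answer."

    def build_prompt(question: str) -> str:
        return f"{SYSTEM}\n\nUser: {question}\nAssistant: <think>"

    def user_visible_answer(completion: str) -> str:
        # The trace before </think> is normally hidden; only the rest is shown.
        return completion.split("</think>", 1)[-1].strip()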


