Hacker News

The pretraining might not matter here so much as the instruct fine-tuning.

The small GLM models were like 50-50 English-Chinese in pretraining but much more Chinese in instruct training. Had the same issue until they balanced that.