Hacker News

Yes!

I tried that pretty early on, and it's basically never good. It's described in the section: https://dnhkng.github.io/posts/rys/#the-beginning-of-llm-neu...



Since repeating layers x-y was useful for locating the block of 7 layers in the first place, I'd be incredibly curious what happens if, knowing that block of 7, you then iterate on repeating x-y within that block z times.

For example, for those 7 layers 1,2,3,4,5,6,7, does efficiency increase if you run 1,2,3,3,4,4,4,5,6,7 or perhaps 1,2,3,3,4,5,6,6,7, etc.? If only GPUs grew on trees.
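
To make the search space concrete, here is a minimal sketch of how such layer schedules could be enumerated. It only builds the execution order; it does not touch a model. The names `expand_schedule` and `candidate_schedules`, and the `max_repeat` / `max_extra` limits, are hypothetical and not taken from the linked post.

```python
from itertools import product


def expand_schedule(layers, repeats):
    """Return the execution order when layers[i] runs repeats[i] times in a row."""
    if len(layers) != len(repeats):
        raise ValueError("layers and repeats must have the same length")
    order = []
    for layer, count in zip(layers, repeats):
        order.extend([layer] * count)
    return order


def candidate_schedules(layers, max_repeat, max_extra):
    """Yield every schedule whose per-layer repeat count is in 1..max_repeat
    and which adds at most max_extra layer runs in total (hypothetical budget)."""
    for repeats in product(range(1, max_repeat + 1), repeat=len(layers)):
        if sum(repeats) - len(layers) <= max_extra:
            yield expand_schedule(layers, repeats)


block = [1, 2, 3, 4, 5, 6, 7]

# The two orderings mentioned in the comment above.
print(expand_schedule(block, [1, 1, 2, 3, 1, 1, 1]))
print(expand_schedule(block, [1, 1, 2, 1, 1, 2, 1]))
```

Without a budget the number of candidates grows as max_repeat to the power of the block length, which is why a cap like `max_extra` (or a lot of GPUs) is needed before evaluating each schedule.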


Yes, I have done these types of experiments; that's for the next post.

If you found two disjoint sections that seemed positive on their own, did you try looping both separately in the same model? Wondering how localized the structures are.



