
If you focus on just the matmuls (no CUDA, no general-purpose architecture, no InfiniBand, everything on one chip) and simply put input tokens in input registers and read output tokens from output registers, with the model baked into gates, you should be able to save some power. Not sure if it's 2x, 10x or 100x, but there are certainly gains to be had.
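To make the "tokens in, tokens out, weights fixed" idea a bit more concrete, here's a toy software sketch. It's only an illustration, not a hardware design: the weight values, the three-token vocabulary, and the names `EMBED`, `W`, `UNEMBED`, `matvec` and `next_token` are all made up, and the constants just stand in for values that would be fixed in the gates. It says nothing about the actual power savings.

```python
# Toy illustration only: these hypothetical constants stand in for weights
# that would be fixed in hardware rather than loaded from memory.
EMBED = ((1, 0), (0, 1), (1, -1))      # 3 token ids -> 2-dim vectors
W = ((2, -1), (0, 3))                  # one 2x2 "layer"
UNEMBED = ((1, 0, -1), (0, 1, 2))      # 2-dim vector -> 3 logits


def matvec(vec, mat):
    """Row vector times matrix using nothing but integer multiply-adds."""
    return tuple(
        sum(v * row[j] for v, row in zip(vec, mat))
        for j in range(len(mat[0]))
    )


def next_token(token_id):
    """Input token id in, output token id out; no weights are ever loaded."""
    hidden = matvec(EMBED[token_id], W)
    logits = matvec(hidden, UNEMBED)
    # Greedy pick: index of the largest logit.
    return max(range(len(logits)), key=logits.__getitem__)


if __name__ == "__main__":
    for tok in range(len(EMBED)):
        print(tok, "->", next_token(tok))
```

The whole "model" is a pure function of the input token and a handful of constants, which is the property you'd be exploiting if the multiply-adds were wired up directly instead of scheduled on a general-purpose device.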

