Hacker News
anemll, 9 months ago, on: Run LLMs on Apple Neural Engine (ANE)
Right. I was thinking about it: you still need batch prefill. However, Apple's Core ML Tools were failing on quantization of attention activations. With long context, prefill is still compute-bound.
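To make the compute-bound point concrete, here is a rough back-of-the-envelope sketch, not a measurement of the ANE or of any real model. It estimates arithmetic intensity (FLOPs per byte moved) for a single square linear layer. The function name, the layer shape, the 2-byte weights and activations, and the token counts are all illustrative assumptions.

```python
def arithmetic_intensity(tokens: int, d_model: int,
                         bytes_per_weight: float = 2.0,
                         bytes_per_act: float = 2.0) -> float:
    """FLOPs per byte for one d_model x d_model linear layer over `tokens` tokens.

    Hypothetical helper for illustration; it ignores attention, caches, and
    any hardware-specific behavior.
    """
    # One multiply-add per weight per token, counted as 2 FLOPs.
    flops = 2 * tokens * d_model * d_model
    # Weights are read once no matter how many tokens are in the batch.
    weight_bytes = d_model * d_model * bytes_per_weight
    # Activations: read the input and write the output for every token.
    act_bytes = 2 * tokens * d_model * bytes_per_act
    return flops / (weight_bytes + act_bytes)


# Assumed sizes: one token per step for decode, 512 tokens per prefill batch.
decode = arithmetic_intensity(tokens=1, d_model=4096)
prefill = arithmetic_intensity(tokens=512, d_model=4096)

print(f"decode:  {decode:.2f} FLOPs/byte")
print(f"prefill: {prefill:.2f} FLOPs/byte")
```

Under these assumptions, single-token decode does about 1 FLOP per byte, so it is dominated by reading weights, while a 512-token prefill batch reuses each weight across the batch and does a few hundred FLOPs per byte. That reuse is why prefill over a long context tends to be limited by compute rather than memory bandwidth.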