Hacker News
Add 500M tokens of context space to any LLM with <300ms latency (github.com/t8)
3 points by tatef 6 days ago | 1 comment
Hypura – A storage-tier-aware LLM inference scheduler for Apple Silicon (github.com/t8)
221 points by tatef 12 days ago | 86 comments
