ZYZ64738 6 days ago | on: How do I cancel my ChatGPT subscription?
> NTransformer High-efficiency C++/CUDA LLM inference engine. Runs Llama 70B on a single RTX 3090 (24GB VRAM) by streaming model layers through GPU memory via PCIe, with optional NVMe direct I/O that bypasses the CPU entirely.
untested: https://github.com/xaskasdf/ntransformer