
Wow, using the IGP for the parts that don't fit on the discrete GPU is a great idea.


Interestingly, MLC's web-llm runs much better on my iGPU than on my dGPU once the model size exceeds the dGPU's available VRAM. Llama 2 7B runs faster on the dGPU, but when I switch to Llama 2 13B, suddenly the iGPU outperforms it. I think it's because the iGPU can make effective use of shared system memory?
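A rough back-of-envelope check of why 13B spills where 7B fits. All the numbers here are illustrative assumptions, not measurements: ~4.5 bits per weight approximates a q4-style quantization with scales, a flat 1 GB lump covers KV cache and runtime buffers, and the 8 GB VRAM budget is hypothetical:

```python
def model_vram_gb(n_params_b: float, bits_per_weight: float = 4.5,
                  overhead_gb: float = 1.0) -> float:
    """Rough VRAM footprint: quantized weights plus a lump-sum overhead.

    bits_per_weight ~4.5 approximates a 4-bit quant with scaling factors;
    overhead_gb lumps together KV cache, activations, and driver buffers.
    These are illustrative assumptions, not measured values.
    """
    weights_gb = n_params_b * 1e9 * bits_per_weight / 8 / 1e9
    return weights_gb + overhead_gb

vram_gb = 8.0  # hypothetical dedicated VRAM on the dGPU

for name, params_b in [("Llama 2 7B", 7.0), ("Llama 2 13B", 13.0)]:
    need = model_vram_gb(params_b)
    verdict = "fits" if need <= vram_gb else "spills into shared RAM"
    print(f"{name}: ~{need:.1f} GB needed -> {verdict}")
```

Under these assumptions 7B needs roughly 5 GB and fits, while 13B needs over 8 GB and forces the driver to spill, which matches the observed cliff between the two model sizes.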


MLC has no partial GPU offloading. I assume the driver itself is spilling over into system RAM, and this is essentially a silent malfunction because the driver's spillover path is extremely slow.


Yeah. It might be possible in llama.cpp soon, but the Vulkan implementation may or may not be fast on the IGP.
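For context, llama.cpp's explicit offloading (the `-ngl`/`--n-gpu-layers` flag) avoids the silent driver spill by keeping only as many layers on the GPU as actually fit. A minimal sketch of picking that number, assuming (a simplification) that weight memory is spread evenly across layers and leaving some headroom for KV cache:

```python
def layers_that_fit(model_gb: float, n_layers: int, vram_gb: float,
                    reserve_gb: float = 1.0) -> int:
    """Estimate an -ngl value for llama.cpp-style partial offload.

    Assumes (simplification) the model's weight memory is spread evenly
    over its layers; real layers vary and the embedding/output weights
    differ. reserve_gb leaves headroom for KV cache and driver buffers.
    """
    per_layer_gb = model_gb / n_layers
    budget = max(vram_gb - reserve_gb, 0.0)
    return min(n_layers, int(budget / per_layer_gb))

# e.g. a ~7.3 GB 13B quant with 40 layers against 8 GB of VRAM:
ngl = layers_that_fit(7.3, 40, 8.0)
print(f"offload {ngl} of 40 layers on the GPU, run the rest on CPU")
```

The point of doing this explicitly rather than letting the driver spill is that the CPU-resident layers run through an optimized CPU path instead of the GPU stalling on PCIe reads from system memory.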



