
> It's not actually a 600B+ model. It's a mixture of experts.

Is this described in the paper, or was it inferred from the model itself?

Just curious, especially if the latter.



It's a 600B+ mixture of experts, and yes, it's described in the paper, on GitHub, etc.



