No. As far as I know, they haven't said anything about this. Neither did OpenAI about gpt-4-32k.

MosaicML did say something about MPT-7B-StoryWriter-65k+: https://www.mosaicml.com/blog/mpt-7b. They are using ALiBi (Attention with Linear Biases): https://arxiv.org/abs/2108.12409.
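For reference, the core of ALiBi is simple: instead of adding positional embeddings to the token inputs, each attention head gets a fixed, head-specific linear penalty added to its attention scores based on query-key distance. A minimal sketch, following the paper linked above (function names here are illustrative, not from any particular implementation):

```python
# Sketch of ALiBi (Attention with Linear Biases).
# Each head h of n_heads gets a slope from a geometric sequence,
# and the bias for query position i attending to key position j
# is -slope * (i - j), i.e. a penalty that grows linearly with distance.

def alibi_slopes(n_heads):
    # Per the paper, for n_heads a power of two, head h gets
    # slope 2**(-8 * (h + 1) / n_heads), e.g. 1/2, 1/4, ..., 1/256 for 8 heads.
    return [2 ** (-8 * (h + 1) / n_heads) for h in range(n_heads)]

def alibi_bias(seq_len, n_heads):
    # bias[h][i][j] = -slope_h * (i - j): zero for the current token,
    # increasingly negative the farther back key j is from query i.
    slopes = alibi_slopes(n_heads)
    return [
        [[-s * (i - j) for j in range(seq_len)] for i in range(seq_len)]
        for s in slopes
    ]

# The bias is simply added to the raw (pre-softmax) attention scores:
#   scores[h] = q @ k.T / sqrt(d_head) + bias[h]
# (with the usual causal mask on top). Because the bias depends only on
# relative distance, the model can extrapolate to sequences longer than
# those seen in training.
```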

I think OpenAI and Anthropic are using either ALiBi or proprietary advances of their own; both seem possible.



Interesting. Does the decision to use ALiBi have to be made before the model weights are first trained, or could these models have incorporated ALiBi, instead of or in addition to an alternative positional encoding method, after they were first trained?


The decision needs to be made before starting training. Maybe there is a clever way to add it after the fact, in the style of LoRA? First, that would be a different method in its own right (just as LoRA is); second, I can't see how to do it easily. But then, I only thought about it for a minute.



