It's true, the post lays out the details clearly, but a hands-on example can often make the concepts more tangible. Seeing it in action helps solidify understanding.
The post lays out the steps clearly, but implementing them often reveals unexpected challenges. It's usually more complicated in practice than it appears on paper.
This. I literally am asking for a step-by-step guide outlining every step (including an existing corpus that can be used on a consumer-grade laptop to train the model in under a week).
If the implementation details are clear, replicating the setup can be worthwhile. Sometimes seeing it in action helps to better understand the nuances.
Also because normal usage has predictable usage patterns, which allows them to optimise and predict costs. Flat rate pricing only makes sense in that regime.
Sure, until their "smart filters" start considering GCP-hosted websites as pre-verified and small self-hosted websites as malicious. You know, like they have been doing with email?
Chrome is big enough that a website owner can't afford a false positive on their malware list, just like they can't afford to have all their email end up in spam for all Gmail users.
Due to their near-monopoly, Google also has no incentive to avoid adding false positives to their blocklist - provided they don't accidentally block high-profile targets. And if a CxO is screaming over your shoulder that your website has been blocked, arguments about "false positives" aren't very compelling: they'll just demand you move off the "shitty basement provider" and switch to "proper hosting, like Google Cloud"...
Foundational:
- Pretraining
- Mid/post-training (SFT)
- RLHF or alignment post-training (RL)
And sometimes...
- Some more customer-specific fine-tuning.
Note that any supervised fine-tuning following the Pretraining stage is just swapping the dataset and maybe tweaking some of the optimiser settings. Presumably they're talking about this kind of pre-RL fine-tuning instead of post-RL fine-tuning, and not about swapping out the Pretraining stage entirely.
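To make the "same loop, different data" point concrete, here's a minimal framework-free sketch. The model, datasets, learning rates, and epoch counts are all toy illustrations (nothing here reflects a real LLM pipeline); the point is just that the pretraining call and the SFT call reuse the identical training function, differing only in dataset and optimiser settings:

```python
def train(weights, dataset, lr, epochs):
    """One generic supervised loop: the same code serves both stages."""
    for _ in range(epochs):
        for x, y in dataset:
            pred = weights["w"] * x + weights["b"]
            err = pred - y
            # Plain SGD on squared error; only data and lr change per stage.
            weights["w"] -= lr * 2 * err * x
            weights["b"] -= lr * 2 * err
    return weights

model = {"w": 0.0, "b": 0.0}

# "Pretraining": large generic dataset, higher learning rate (toy numbers).
pretrain_corpus = [(x, 2 * x + 1) for x in range(-5, 6)]
model = train(model, pretrain_corpus, lr=0.01, epochs=200)

# "SFT": the very same loop, smaller task-specific dataset, lower learning rate.
sft_pairs = [(x, 2 * x + 1.5) for x in range(0, 3)]
model = train(model, sft_pairs, lr=0.005, epochs=100)
```

In a real setup the "tweaked optimiser settings" would typically include a lower learning rate, a shorter schedule, and perhaps a different warmup, but the training code itself is unchanged.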