Hacker News | simsla's comments

Typical stages of training for these models are:

Foundational:

- Pretraining
- Mid/post-training (SFT)
- RLHF or alignment post-training (RL)

And sometimes...

- Some more customer-specific fine-tuning.

Note that any supervised fine-tuning following the Pretraining stage is just swapping the dataset and maybe tweaking some of the optimiser settings. Presumably they're talking about this kind of pre-RL fine-tuning instead of post-RL fine-tuning, and not about swapping out the Pretraining stage entirely.
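The point that SFT is "just swapping the dataset and maybe tweaking some of the optimiser settings" can be sketched in a toy way. This is a deliberately simplified, pure-Python illustration (all names, numbers, and the one-weight "model" are hypothetical, not any real training setup): both phases call the exact same loop, and only the corpus and learning rate differ.

```python
def train(model, dataset, lr, steps):
    """Toy training loop: nudge a single weight toward each example.
    Stands in for the full pretraining/SFT loop, which is structurally identical."""
    for _ in range(steps):
        for example in dataset:
            model["weights"] += lr * (example - model["weights"])
    return model

model = {"weights": 0.0}

# "Pretraining": large generic corpus, higher learning rate.
pretrain_corpus = [1.0, 2.0, 3.0]   # stand-in for web-scale text
model = train(model, pretrain_corpus, lr=0.1, steps=100)
w_pretrained = model["weights"]

# "SFT": same loop, swapped dataset, gentler optimiser settings.
sft_corpus = [2.5, 2.5]             # stand-in for curated instruction data
model = train(model, sft_corpus, lr=0.01, steps=10)
```

The takeaway is structural: nothing about the loop changes between the two stages, which is why pre-RL fine-tuning is a small delta on pretraining rather than a different kind of training.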


I think my experience as an interviewer has helped. If you ask non-leading questions, sycophancy doesn't come into play as much.

Instead of saying "are you sure?" or "shouldn't we do X instead?" you could say "give me the benefits and drawbacks of this compared to X".

Also, when you yourself are sure, give clear steer: "This overcomplicates A, let's do B instead."


For reference, I knew a guy who built bespoke cheats (less likely to get caught by ban waves) and he charged a few thousand per project.

I've never experienced this, but I guess I always respond with something like "No, [critique/steer]" or "Mostly fine, but [critique/steer]".

Choose to enable backups.


The blog post literally explains how to do so.


It's true, the post lays out the details clearly, but a hands-on example can often make the concepts more tangible. Seeing it in action helps solidify understanding.


The post lays out the steps clearly, but implementing them often reveals unexpected challenges. It's usually more complicated in practice than it appears on paper.


This. I literally am asking for a step-by-step guide outlining every step (including an existing corpus that can be used on a consumer-grade laptop to train the model in under a week).


If the implementation details are clear, replicating the setup can be worthwhile. Sometimes seeing it in action helps to better understand the nuances.


Also because normal usage has predictable usage patterns, which allows them to optimise and predict costs. Flat rate pricing only makes sense in that regime.


While I agree, if you need high profits to survive, you're not off to a great start as a nonprofit.


There is a financial incentive to make the search results worse. (More searches, more ads, more money.)

There is no incentive for adding false positives to lists of malicious websites.


Sure, until their "smart filters" start considering GCP-hosted websites as pre-verified and small self-hosted websites as malicious. You know, like they have been doing with email?

Chrome is big enough that a website owner can't afford a false positive on their malware list, just like they can't afford to have all their email end up in spam for all Gmail users.

Due to their near-monopoly, Google also has no incentive to avoid adding false positives to their blocklist, provided they don't accidentally block high-profile targets. And if a CxO is screaming over your shoulder that your website has been blocked, arguments about "false positives" aren't very compelling: they'll just demand you move off the "shitty basement provider" and switch to "proper hosting, like the Google Cloud"...


Permissions scoping


Then they attempt to download the missing tool or write a substitute from scratch. Am I the only one who runs into this??

