
One question is how much other factors really matter compared to the raw "intelligence" of the model--how good its completions are. You're not going to care very much about context window, prompting, or integrations if the output isn't good. It would be sort of like a car that has the best steering and brakes on the market, but can't go above 5 mph.


Or rather, more analogously, a self-driving car that has a range of 10,000 miles but sometimes makes mistakes while driving vs. a self-driving car with a range of 800 miles that never makes mistakes. Once you've had a taste of intelligence, it's hard to give up.

However, in many applications there is a limit on how intelligent you need the LLM to be. I have found I am able to fall back to the cheaper and faster GPT-3.5 to do the grunt work of forming text blobs into structured JSON within a chain that uses GPT-4 for the higher-level functions.
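A minimal sketch of that kind of two-tier chain. The model names and helper names here are my own assumptions for illustration, and the actual API call is left as a comment:

```python
import json

# Model names are assumptions for illustration; substitute whatever
# your provider currently offers.
CHEAP_MODEL = "gpt-3.5-turbo"
SMART_MODEL = "gpt-4"

def pick_model(step: str) -> str:
    # Route the mechanical "extract" step to the cheap model and
    # everything else (planning, reasoning) to the smart one.
    return CHEAP_MODEL if step == "extract" else SMART_MODEL

def extraction_messages(blob: str, fields: list) -> list:
    # Build a chat prompt asking for a single strict-JSON object.
    keys = ", ".join('"%s"' % k for k in fields)
    system = (f"Extract the keys {keys} from the user's text. "
              "Reply with one JSON object and nothing else.")
    return [{"role": "system", "content": system},
            {"role": "user", "content": blob}]

def parse_reply(reply: str) -> dict:
    # Models sometimes wrap the JSON in prose even when told not to;
    # defensively grab the outermost braces before parsing.
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object in reply")
    return json.loads(reply[start:end + 1])

# The network call itself (e.g. passing pick_model("extract") and
# extraction_messages(...) to your chat-completions client) is omitted.
```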


The big question on that for me is that there's a variety of "completion styles," and I'm curious how "universal" performance across them is. There are probably more than this, but a quick list that comes to mind:

* Text summary/compression

* Creative writing (fiction/lyrics/stylization)

* Text comparison

* Question-answering

* Logical reasoning/sequencing ("given these tools and this scenario, how would you perform this task")

IMO, for stuff like text comparison and question-answering, some combo of speed/cost/context size could make up for a lot, even if a model does a "worse" version of a task that's too slow, too expensive, or too context-limited in a different model.


We already see how that goes. Stackable LoRA, like in Stable Diffusion.


I don't know. While using Phind I regularly get annoyed by long prose that doesn't answer anything (yes, "concise" is always on). Claude seems to be geared directly toward solving stuff over nice writing.


I generally add this to my initial prompts to GPT-4: "From now on, please use the fewest tokens possible in all replies and provide brief and accurate answers."
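For what it's worth, a standing instruction like that can also be prepended programmatically as a system message so it applies to every turn. A sketch, where the wording follows the prompt above and the helper name is hypothetical:

```python
# Standing brevity instruction, prepended as a system message so it
# applies to the whole conversation rather than one user turn.
BREVITY = ("From now on, please use the fewest tokens possible in all "
           "replies and provide brief and accurate answers.")

def with_brevity(messages):
    # Hypothetical helper: put the instruction ahead of the real turns.
    return [{"role": "system", "content": BREVITY}] + list(messages)
```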


Strongly agree. They are ordered by how much I think they generally will lead to users choosing one model over the other.

Intelligence is the most important dimension by far, perhaps an order of magnitude or more above the second item on the list.


On that note, can anyone speak to how Anthropic (or other labs) is doing at catching up to OpenAI on pure model intelligence/quality of completions? Is anyone else approaching GPT-4? I've only used GPT-based tools, so I have no idea.


The best Claude model is closer to GPT-4 than to GPT-3.5.



