
One question is how much other factors really matter compared to the raw "intelligence" of the model--how good its completions are. You're not going to care very much about context window, prompting, or integrations if the output isn't good. It would be sort of like a car that has the best steering and brakes on the market, but can't go above 5 mph.


Or rather, more analogously, a self-driving car that has a range of 10,000 miles but sometimes makes mistakes while driving vs. a self-driving car with a range of 800 miles that never makes mistakes. Once you've had a taste of intelligence, it's hard to give up.

However, in many applications there is a limit on how intelligent you need the LLM to be. I have found I am able to fall back to the cheaper and faster GPT-3.5 to do the grunt work of forming text blobs into structured JSON within a chain that uses GPT-4 for the higher-level functions.
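A minimal sketch of that kind of two-tier chain. The model names and helper names here are my own assumptions for illustration, and the actual API call is left as a comment:

```python
import json

# Model names are assumptions for illustration; substitute whatever
# your provider currently offers.
CHEAP_MODEL = "gpt-3.5-turbo"
SMART_MODEL = "gpt-4"

def pick_model(step: str) -> str:
    # Route the mechanical "extract" step to the cheap model and
    # everything else (planning, reasoning) to the smart one.
    return CHEAP_MODEL if step == "extract" else SMART_MODEL

def extraction_messages(blob: str, fields: list) -> list:
    # Build a chat prompt asking for a single strict-JSON object.
    keys = ", ".join('"%s"' % k for k in fields)
    system = (f"Extract the keys {keys} from the user's text. "
              "Reply with one JSON object and nothing else.")
    return [{"role": "system", "content": system},
            {"role": "user", "content": blob}]

def parse_reply(reply: str) -> dict:
    # Models sometimes wrap the JSON in prose even when told not to;
    # defensively grab the outermost braces before parsing.
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object in reply")
    return json.loads(reply[start:end + 1])

# The network call itself (e.g. passing pick_model("extract") and
# extraction_messages(...) to your chat-completions client) is omitted.
```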


The big question on that for me is that there's a variety of "completion styles," and I'm curious how "universal" performance across them is. There are probably more than this, but a quick list that comes to mind:

* Text summary/compression

* Creative writing (fiction/lyrics/stylization)

* Text comparison

* Question-answering

* Logical reasoning/sequencing ("given these tools and this scenario, how would you perform this task")

IMO, for stuff like text comparison and question-answering, some combo of speed/cost/context size could make up for a lot, even if a model does a "worse" version of a task that's too slow, too expensive, or too context-limited in a different model.


We already see how that goes. Stackable LoRA, like in Stable Diffusion.


I don't know. While using Phind I regularly get annoyed by long prose that doesn't answer anything (yes, "concise" is always on). Claude seems to be geared directly toward solving stuff over nice writing.


I generally add this to my initial prompts to GPT-4: "From now on, please use the fewest tokens possible in all replies and provide brief and accurate answers."
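For what it's worth, a standing instruction like that can also be prepended programmatically as a system message so it applies to every turn. A sketch, where the wording follows the prompt above and the helper name is hypothetical:

```python
# Standing brevity instruction, prepended as a system message so it
# applies to the whole conversation rather than one user turn.
BREVITY = ("From now on, please use the fewest tokens possible in all "
           "replies and provide brief and accurate answers.")

def with_brevity(messages):
    # Hypothetical helper: put the instruction ahead of the real turns.
    return [{"role": "system", "content": BREVITY}] + list(messages)
```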


Strongly agree. They are ordered by how much I think they generally will lead to users choosing one model over the other.

Intelligence is the most important dimension by far, perhaps an order of magnitude or more above the second item on the list.


On that note, can anyone speak to how Anthropic (or other labs) is doing at catching up to OpenAI on pure model intelligence/quality of completions? Is anyone else approaching GPT-4? I've only used GPT-based tools, so I have no idea.


The best Claude model is closer to GPT-4 than to GPT-3.5.



