> One solution is to ensemble semantic search with keyword search. BM25 is a solid baseline when we expect at least one keyword to match. Nonetheless, it doesn’t do as well on shorter queries where there’s no keyword overlap with the relevant documents—in this case, averaged keyword embeddings may perform better. By combining the best of keyword search and semantic search, we can improve recall for various types of queries.
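The quote doesn't say how the two result lists get combined. One common, parameter-light way to do it is reciprocal rank fusion (RRF), which needs only the ranks, not comparable scores. A minimal sketch (the `k=60` constant is the conventional default, not something from the article):

```python
def rrf_merge(rankings, k=60):
    """Reciprocal Rank Fusion: combine several ranked lists of doc ids.

    Each document's fused score is the sum over lists of 1 / (k + rank),
    so items ranked highly by any retriever float to the top, and k damps
    the influence of any single list's top ranks.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Example: keyword (BM25) and semantic rankings disagree on order.
bm25_hits = ["d1", "d2", "d3"]
dense_hits = ["d3", "d1", "d4"]
print(rrf_merge([bm25_hits, dense_hits]))  # → ['d1', 'd3', 'd2', 'd4']
```

Because RRF ignores raw scores, it sidesteps the problem that BM25 scores and cosine similarities live on different scales.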
I think something akin to a mashup of Engelbart's augmentation, Nelson's Xanadu (r), and Bucky's tensegrity system would make a great accompanying knowledge-management system for branching conversations with AI; after a while, handling the generated content becomes a task in itself. Visualising the created data would be ace.
I only know tensegrity as a structural engineering concept, and although I'm not on a nickname level of familiarity with Buckminster Fuller, I'm still confused as to how it applies here.
Tensegrity structures are, afaik, nature's most stable yet diverse systems. Their strength combined with flexibility gave me the notion that a combination of data storage and vector databases would benefit from struts that can have properties, for example tension and strength. You could then feed semantic information to the struts, and the emerging structure could be mapped into fewer dimensions to be visualised in 3D, or something like that :)
The aim is to combine my 'truths' (belief systems, eg Xanadu (r); irrefutable measurable facts, think Wolfram; creative multimedia content, think TikTok meets Twitter; machine learning; sentiment and content analysis) with something like GPT, to function as an advanced mind-mapping tool wherein I can explore ideas, pulling from all these 'experts' into a coherent chain of information that can be traversed and branched in a Q&A style to extend the system again. Or something like that...
"Most (if not all) embedding-based retrieval use approximate nearest neighbours (ANN). If we use exact nearest neighbours, we would get perfect recall of 1.0 but with higher latency (think seconds). In contrast, ANN offers good-enough recall (~0.95) with millisecond latency. I’ve previously compared several open-source ANNs and most achieved ~0.95 recall at throughput of hundreds to thousands of queries per second."
Can the results of multiple very fast, approximate queries somehow be used to get the equivalent of one very slow, reliable query?
Might be useful to have an LLM generate a few versions of the query in order to account for imperfect recall — it seems like something gpt-3.5-turbo and friends would be pretty good at doing.
I've been thinking about using BM25 as a retrieval method to enhance LLMs, so I'm happy to see it mentioned here. It can complement vector search, but if it turns out to be useful on its own (maybe with query expansion and other tricks), it could serve as an alternative to vector search when running locally, or in environments with less compute.
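For the locally-running case, it helps that BM25 itself is tiny. A minimal, self-contained sketch of Okapi BM25 over whitespace-tokenized documents (no stemming or stopword handling; `k1` and `b` are the usual defaults):

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each whitespace-tokenized doc against the query with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    N = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / N
    # Document frequency: how many docs contain each term.
    df = Counter(t for d in tokenized for t in set(d))
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (N - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = ["the cat sat on the mat", "dogs chase cats", "quantum field theory"]
print(bm25_scores("cat mat", docs))  # highest score for the first doc
```

Note that without stemming, "cat" in the query does not match "cats" in the second doc, which illustrates exactly the keyword-overlap weakness the article mentions.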
Oh hey I have a demo of that here: https://findsight.ai
For it I wrote a custom search engine and KNN index implementation that ranks and merges results across three stages (labels, full-text, embedding). To speed up retrieval, the OpenAI embeddings are stored as SuperBit signatures instead. Rank merging turned out to be a really hard problem.
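For readers unfamiliar with the trick: SuperBit is a refinement of sign-random-projection LSH (the random hyperplanes are orthogonalized in batches, which lowers the variance of the estimate). The core idea, sketched below in the plain sign-random-projection form without the orthogonalization step, is that each bit records which side of a random hyperplane a vector falls on, and Hamming distance between signatures then estimates the cosine similarity:

```python
import math
import random

def make_hyperplanes(dim, n_bits, seed=0):
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]

def signature(vec, hyperplanes):
    """One bit per hyperplane: which side of it the vector falls on."""
    bits = 0
    for plane in hyperplanes:
        dot = sum(p * v for p, v in zip(plane, vec))
        bits = (bits << 1) | (dot >= 0)
    return bits

def hamming(a, b):
    return bin(a ^ b).count("1")

def estimated_cosine(sig_a, sig_b, n_bits):
    # P(bits differ) ≈ angle / pi, so cos(pi * hamming / n_bits)
    # estimates the cosine similarity of the original vectors.
    return math.cos(math.pi * hamming(sig_a, sig_b) / n_bits)
```

Comparing two compact integers with XOR and popcount is far cheaper than a full dot product over 1536-dimensional float embeddings, which is presumably the speed-up being described.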