generall's comments | Hacker News

IVF, unfortunately, is barely compatible with filtered search. It has to rely on post-filtering and retrieve more and more candidates if the result set is not big enough. If the query is in some way correlated with the filter, this approach quickly degrades to brute force.
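
A minimal sketch of what that post-filtering loop looks like (toy code with a made-up structure, not any particular engine's implementation):

    import numpy as np

    # Toy IVF with post-filtering: widen nprobe until enough candidates
    # survive the filter. Names and layout here are illustrative only.
    rng = np.random.default_rng(0)
    vectors = rng.normal(size=(10_000, 16)).astype(np.float32)
    labels = rng.integers(0, 100, size=10_000)   # payload the filter checks
    buckets = rng.integers(0, 64, size=10_000)   # coarse cluster assignment
    centroids = np.stack([vectors[buckets == b].mean(axis=0) for b in range(64)])

    def filtered_search(query, allowed_label, k, nprobe=4):
        while True:
            # Probe the nprobe nearest buckets, then post-filter the results.
            order = np.argsort(((centroids - query) ** 2).sum(axis=1))[:nprobe]
            mask = np.isin(buckets, order) & (labels == allowed_label)
            ids = np.nonzero(mask)[0]
            if len(ids) >= k or nprobe >= len(centroids):
                dists = ((vectors[ids] - query) ** 2).sum(axis=1)
                return ids[np.argsort(dists)[:k]]
            nprobe *= 2  # selective or correlated filter: approaching brute force

    query = rng.normal(size=16).astype(np.float32)
    print(filtered_search(query, allowed_label=7, k=5))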

Surprised that the article doesn't mention the filtering use case at all.


Hi, I'm the author of the article. I actually think the opposite of what you mentioned: IVF is more suitable for prefiltering than HNSW. For prefiltering in HNSW, there is a certain requirement on the filtering rate: it can't be too low, or the entire graph may become disconnected. For instance, with the commonly used parameter m=16, each node can have at most 16 neighbors. If the filtering rate is below 5%, it can directly result in no neighbors meeting the condition, causing the algorithm to fail entirely. This is why the academic community has proposed alternatives like ACORN [1]. IVF, on the other hand, doesn't have this problem at all: you can check whether a candidate meets the condition before calculating distances.

[1] https://arxiv.org/abs/2403.04871
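
For intuition, a back-of-the-envelope sketch of that disconnection effect (my own simplification, assuming the filter hits nodes independently of the graph structure):

    # With m=16 neighbors per node, the chance that *every* neighbor of a
    # node fails the filter is roughly (1 - pass_rate) ** 16.
    for pass_rate in (0.20, 0.05, 0.01):
        blocked = (1 - pass_rate) ** 16
        print(f"pass rate {pass_rate:.0%}: P(all 16 neighbors filtered out) = {blocked:.2f}")
    # At a 5% pass rate, ~44% of nodes have no usable neighbor at all.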


In IVF you can start checking conditions only within the final bucket. There is no guarantee that a bucket contains any acceptable values, and there is no procedure to find the buckets that do before scanning them.


I'm not sure I fully understand your point. This can be implemented quite easily by organizing each posting list around specific attributes so that values can be located efficiently. For example, you could build each posting list into a B-tree, or store attribute statistics similar to a column-oriented format to enable skip scanning. Could you elaborate more?
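
A sketch of the skip-scanning idea with hypothetical per-block statistics (not any engine's actual layout):

    from dataclasses import dataclass

    # Per-block min/max statistics over an attribute let whole blocks be
    # skipped before any distance computation happens.
    @dataclass
    class Block:
        ids: list[int]   # vector ids stored in this block
        attr_min: int    # attribute statistics for skip scanning
        attr_max: int

    def scan_posting_list(blocks: list[Block], lo: int, hi: int):
        for block in blocks:
            if block.attr_max < lo or block.attr_min > hi:
                continue              # no element in this block can match
            yield from block.ids      # only these reach the distance check

    blocks = [Block([1, 2, 3], 0, 9), Block([4, 5], 10, 19), Block([6], 20, 29)]
    print(list(scan_posting_list(blocks, lo=12, hi=25)))  # -> [4, 5, 6]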


In this scenario, the posting list has to share all the same attributes as each element inside the list.


https://qdrant.tech/articles/dedicated-service/ - we have some arguments on this


This article contains a lot of inaccuracies.

Based on your statements, like

> Qdrant stores both the vectors and the metadata in a sqlite database.

It looks like you have benchmarked the local mode of Qdrant, which doesn't even use vector indexes and is not designed for any kind of production usage.

For anyone reading this article: I urge you to do your own benchmarks and not rely on claims that do not come with open-source code to replicate the results.


Hi Andrey. Thanks for your feedback. We should have better emphasized that we are benchmarking Qdrant in local mode, and we have updated the post to clarify this. We plan to evaluate the server mode next.

We went with the local mode because several Python AI apps use Qdrant in that mode, based on the suggestion here: https://qdrant.tech/documentation/quick-start/.

We also believe in open-sourced benchmark code. Please find the code here: https://github.com/jiashenC/vectordb-benchmark-and-optimize/....


With a few optimization tricks. TL;DR:

- ONNX inference in Rust
- Embeddings cache & lookup
- Parallel & batch requests
- Hybrid search with full-text filtering + vector re-scoring
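
A minimal sketch of the "embeddings cache & lookup" trick (the shape is assumed, not the project's actual code):

    import hashlib

    # Key the cache by a hash of the input text so repeated inputs skip
    # model inference entirely.
    _cache: dict[str, list[float]] = {}

    def embed_with_cache(text: str, embed_fn) -> list[float]:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in _cache:
            _cache[key] = embed_fn(text)   # only cache misses hit the model
        return _cache[key]

    # Example with a stand-in embedding function:
    print(embed_with_cache("hello", lambda t: [float(len(t))]))
    print(embed_with_cache("hello", lambda t: [0.0]))  # served from cache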


I would say that text and vector search are orthogonal. Some scenarios are better served by one, others by a combination. But fitting vector search into an interface designed for text limits vector search's potential.


Interesting.

So how would you construct an interface specifically designed for a vector search?


> There are a few arguments why adding sparse search doesn't require too much extra specialization

Full-text search != sparse search; that's a naive oversimplification. Btw, sparse search is on the Qdrant roadmap, so we should be able to compare its performance on benchmarks.

> Cross Encoder inference generally doesn't happen in the database itself thus it makes sense to use modules to process the additional ranking logic

That statement invalidates your own argument that `A combined system have better end-to-end latency.`

> Such benchmarks exist as in trengrj's initial response and we are working on them as well.

link or it didn't happen.

In the current benchmarks, which you advertise everywhere, you're just throwing disproportionately powerful and expensive hardware at the problem. Even a full scan can give good results under those conditions.


1. Please expand on how you are defining full-text search as distinct from sparse search so we can continue the discussion. My understanding is that full-text search is built on inverted indexing, which is the intended meaning behind "sparse" search; whether that be BM25 sparse, SPLADE sparse, or exact keyword / phrase matching, it is the same underlying index algorithm with distinctions only in scoring for BM25 / SPLADE (see the BM25 sketch after this list).

2. The original argument references "combined system" in the sense of Hybrid Search (BM25 + Dense Vector Search). I don't think this is a fair comparison -- model inference services are extremely lightweight relative to the sparse / dense indexing systems we are primarily discussing. I also have not advocated for Cross Encoder inference in the spirit of improving latency; I was just clarifying why a module system is used for it.

3. The hybrid search results from researchers independent of either of us are linked in the original comment; here it is again: https://arxiv.org/abs/2201.10582. Your criticism of the Weaviate ANN benchmarks isn't relevant to our discussion on Hybrid Search. I have linked this to show that Weaviate has produced comparative benchmarks, which was your original claim. Although I do not agree with your premise that a full-scan search will give similar speed results to HNSW on this setup, or however arbitrarily we are defining "good results", I acknowledge that it is not included in the benchmark report and is something that should be added. I also agree that it would be interesting to run ANN recall tests on several hardware configurations.
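
The BM25 sketch referenced in point 1: textbook scoring of a single term over an inverted index (generic textbook code, not tied to either engine). SPLADE would traverse the same index, but with learned term weights in place of the tf/idf statistics:

    import math

    # Standard BM25 contribution of one term to one document's score.
    def bm25_term(tf, doc_len, avg_doc_len, df, n_docs, k1=1.2, b=0.75):
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        return idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))

    print(bm25_term(tf=3, doc_len=120, avg_doc_len=100, df=50, n_docs=10_000))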


> Please expand on how you are defining full text search distinctly from sparse search to continue the discussion

In addition to the indexing algorithm, there is the tokenizer (which depends on the language), the lemmatizer, synonyms, stop-words, and so on and so forth. In addition, the ranking function itself may be quite different and based on different rules; see how Meilisearch does it. Reducing full-text search to just an inverted index is a misconception.
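
A toy version of that analysis pipeline, the part that sits in front of the inverted index (all components here are crude placeholders; real engines plug in language-specific tokenizers, lemmatizers, and dictionaries):

    STOP_WORDS = {"the", "a", "is"}
    SYNONYMS = {"automobile": "car"}

    def analyze(text: str) -> list[str]:
        tokens = text.lower().split()                        # tokenizer
        tokens = [t for t in tokens if t not in STOP_WORDS]  # stop-word removal
        tokens = [SYNONYMS.get(t, t) for t in tokens]        # synonym mapping
        return [t.rstrip("s") for t in tokens]               # stand-in stemmer

    print(analyze("The automobile is parked"))  # -> ['car', 'parked']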

> Your criticism of the Weaviate ANN benchmarks isn't relevant to our discussion on Hybrid Search.

It is very much relevant. As I mentioned, when the two searches run in parallel:

    total_latency = max(BM25_latency, vector_search_latency) + merge_overhead

and my claim is that in specialized tools both BM25_latency and vector_search_latency will be better than what a multi-tool system can provide.
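
A runnable sketch of that latency model, with fake delays standing in for the two engines:

    import asyncio
    import time

    # The two searches run concurrently, so the total is max(0.02, 0.05)
    # plus a tiny merge overhead, not the sum of the two.
    async def bm25_search():
        await asyncio.sleep(0.02)
        return ["doc1", "doc2"]

    async def vector_search():
        await asyncio.sleep(0.05)
        return ["doc2", "doc3"]

    async def hybrid():
        start = time.perf_counter()
        bm25, dense = await asyncio.gather(bm25_search(), vector_search())
        merged = list(dict.fromkeys(bm25 + dense))  # trivial merge/dedup step
        print(f"{time.perf_counter() - start:.3f}s", merged)  # ~0.05s, not 0.07s

    asyncio.run(hybrid())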

> I have linked this to show that Weaviate has produced comparative benchmarks which was your original claim.

I don't see any comparisons in your benchmarks here - https://weaviate.io/developers/weaviate/benchmarks/ann

You just benchmarked yourself; that is neither interesting nor helpful.

> I also agree that it would be interesting to run ANN recall tests on several hardware configurations.

That is not the point. In our benchmark we run all engines on exactly the same machine to make the comparison fair. Sometimes the same configuration in different regions already gives very different performance on some cloud providers.


1. Weaviate does all these things as well, please see - https://weaviate.io/blog/pulling-back-the-curtains-on-text2v.... Although there are interesting optimizations around fitting a tokenizer, these are all relatively simple things, whereas the inverted index is where the optimizations of specialized systems come into play, e.g. WAND scoring.

2. Following on from #1, what optimizations does BM25 require that justify an entirely separate tool and the maintenance of two separate search systems? Having both searches in the same system also helps with the merge overhead.

3. Any company's report benchmarking itself against its competitors should be taken with a grain of salt; this is obviously bad practice. The purpose of these benchmarks is to compare, in this case, hyperparameters of HNSW and, in future work around the BEIR numbers I provided earlier, Hybrid Search performance.

4. Again, no company can seriously benchmark itself against its competitors due to obvious conflicts of interest. Maybe this competition could be hosted here - https://big-ann-benchmarks.com/index.html#organizers.


The benchmark is open source: https://github.com/qdrant/vector-db-benchmark. You are welcome to fork it and make your own measurements if you suspect something.

Benchmarks like https://big-ann-benchmarks.com/index.html#organizers are good for comparing algorithms, but not engines. They are focused on a single usage scenario and do not cover the variety of possible applications, for example how filtering affects performance.


The proposed architecture doesn't limit you to self-hosted transformers only; you can use OpenAI just as easily. And you don't need to install a "module" for that.


The latency of a combination of parallel systems is equal to that of its slowest component. And obviously, specialized tools will be faster than the corresponding component of a multi-tool system, because while dedicated engines can invest in optimizing specific functionality, multi-tool engines are stuck in integration hell.

A solution assembled for a specific task from highly specialized components will always be more efficient than a 'one-size-fits-all' pipeline. Meilisearch solves search-as-you-type better than anyone else, so why compromise? Not to mention that the scalability patterns of BM25 and vector search are entirely different.
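
For illustration, merging ranked lists from two such specialized engines can be as simple as Reciprocal Rank Fusion (a generic sketch, not tied to any particular product):

    # RRF: each engine's ranking contributes 1 / (k + rank) per document.
    def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
        scores: dict[str, float] = {}
        for ranking in rankings:
            for rank, doc in enumerate(ranking):
                scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank + 1)
        return sorted(scores, key=scores.get, reverse=True)

    bm25_top = ["a", "b", "c"]     # from a dedicated full-text engine
    vector_top = ["c", "a", "d"]   # from a dedicated vector engine
    print(rrf([bm25_top, vector_top]))  # -> ['a', 'c', 'b', 'd']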

This, by the way, is pretty obvious from the fact that you don't publish comparative benchmarks.


There is a more relevant benchmark of vector search engines end-to-end, not just algorithms: https://qdrant.tech/benchmarks/


PML is a great collection of implementations, but not the best framework. Also, you can use PML with Quaterion: https://github.com/qdrant/quaterion/blob/master/examples/tra...

