About us
Created to solve search.
We are a team of engineers who worked on search at other companies. Too often, we saw those companies weighed down by the amount of buy-in required to ship anything. Other times, we saw an opportunity to do things better. So we left our comfortable jobs and started TopK.
Why isn't vector-based retrieval enough?
Poor out-of-domain generalization
Off-the-shelf embedding models are trained on the public internet but then used for private data inside companies.
As a result, the embedding manifold for private data is often not smooth and fails to capture the semantic relationships between documents and queries.
Fine-tuning the embeddings is needed to get good performance.
Exact information retrieval
Some use cases require exact information to be retrieved, such as stock tickers, SKUs, or case numbers.
Vector-based retrieval fundamentally cannot solve this, so people combine it with keyword-based retrieval, typically in a hybrid setup that merges both result lists with reciprocal rank fusion (RRF).
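To make the hybrid approach concrete, here is a minimal sketch of reciprocal rank fusion. It is an illustration, not TopK's implementation: the function name, the document IDs, and the two input lists are hypothetical, and k=60 is a commonly used default rather than a value taken from this page. Each document earns 1 / (k + rank) from every list it appears in, and the summed scores decide the final order.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    rankings: list of lists, each ordered from best to worst hit.
    k: smoothing constant (assumed default; 60 is a common choice).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top hit contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a vector index and a keyword index.
vector_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

Documents that rank well in both lists rise to the top, while a document found by only one retriever (such as an exact SKU match that the vector index missed) still makes it into the fused results.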
Filtering
The distribution of vectors is often uncorrelated with the metadata values used for filtering.
This hurts vector indexes: to find enough results that pass the filter, we have to scan a large part of the index, which lowers efficiency.
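The cost of an uncorrelated filter can be sketched with a toy simulation. Everything here is an assumption made for illustration: a shuffled list of IDs stands in for the similarity order of a real index, the filter keeps a random 1% of documents, and `post_filter_scan` is a hypothetical helper that walks the ranking until it has collected k matches.

```python
import random


def post_filter_scan(ranked_ids, passes_filter, k):
    """Walk candidates in similarity order and keep the first k that pass the filter.

    Returns the matching IDs and the number of candidates that had to be examined.
    """
    results = []
    scanned = 0
    for doc_id in ranked_ids:
        scanned += 1
        if passes_filter(doc_id):
            results.append(doc_id)
            if len(results) == k:
                break
    return results, scanned


rng = random.Random(0)  # fixed seed so the run is repeatable
num_docs = 10_000

# Stand-in for "documents ordered by similarity to the query".
ranked_ids = list(range(num_docs))
rng.shuffle(ranked_ids)

# Filter that keeps a random 1% of documents, independent of the ranking.
allowed = set(rng.sample(range(num_docs), num_docs // 100))

results, scanned = post_filter_scan(ranked_ids, lambda d: d in allowed, k=10)
print(f"scanned {scanned} candidates to return {len(results)} results")
```

When the filter is independent of the ranking, the number of candidates examined grows roughly as k divided by the filter's selectivity, so a filter that keeps 1% of documents needs on the order of a thousand candidates to return ten results.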