Modern information retrieval systems increasingly combine sparse and dense representations to balance lexical precision with semantic generalization. Traditionally, hybrid retrieval pipelines fetch partial results from multiple indices (e.g., dense embeddings and keyword-based models) and merge them using rank aggregation methods such as Reciprocal Rank Fusion (RRF). While effective, these pipelines often fail to fully leverage the scoring signals of individual retrievers and introduce ranking inconsistencies.
This case study explores using TopK's true hybrid retrieval capabilities to improve result quality over rank-fusion approaches. We benchmark four retrieval configurations across several datasets from the BEIR benchmark suite and observe that TopK hybrid retrieval consistently improves nDCG@10, by up to 7.84% over traditional rank-fusion methods.
Background
Retrieval quality is a critical determinant of downstream application performance in search, recommendation, and question answering systems. Sparse retrieval, powered by models like SPLADE, excels at matching exact terms and handling structured queries, while dense retrieval models capture semantic similarity even when query and document vocabularies diverge.
However, neither paradigm is universally dominant:
- Sparse retrievers falter on paraphrased or conceptually rich queries.
- Dense retrievers often overlook rare terms or domain-specific keywords.
To address this, hybrid systems aggregate results from both models. The most common method is Reciprocal Rank Fusion (RRF), which normalizes ranks from each retriever and combines them into a unified ranking. While RRF is simple and effective, it ignores raw score magnitudes, applies uniform fusion weights, and often limits candidate lists to partial top-k results from each retriever. This can suppress relevant documents ranked moderately by both systems but overlooked by either individually.
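For reference, here is a minimal sketch of RRF as described above. It assumes each retriever returns a ranked list of document ids; the function name and the smoothing constant of 60 (a commonly used default) are illustrative, not a specific library's API:

from collections import defaultdict

def rrf_fuse(*ranked_lists, k=10, c=60):
    # Reciprocal Rank Fusion: each document receives sum(1 / (c + rank)) over
    # the lists it appears in, so raw score magnitudes are discarded entirely,
    # which is exactly the limitation discussed above.
    scores = defaultdict(float)
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (c + rank)
    # Keep only the top-k documents by fused score
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Example: fuse partial top-k lists from a dense and a sparse retriever
fused = rrf_fuse(["d3", "d1", "d7"], ["d1", "d9", "d3"], k=2)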
TopK Hybrid Search
Our approach leverages TopK's hybrid retrieval capabilities to provide direct, score-aware ranking across multiple retrieval methods. Instead of truncating to partial result sets and applying rank-only fusion, TopK:
- Scores and normalizes results directly from each retriever (dense and sparse), respecting the magnitude of relevance scores rather than solely their ranks.
- Applies a tunable custom scoring function that weights dense vs. sparse contributions dynamically (e.g., emphasizing sparse scores when exact term matches are present, and dense scores otherwise).
- Merges candidates globally rather than pre-truncating, ensuring that documents moderately ranked by both retrievers are surfaced if their combined score is competitive.
- Selects the final top-k results (e.g., top 10 or 100) directly, minimizing recall loss from early-stage truncation.
This approach effectively removes a key bottleneck in hybrid search pipelines: the disconnect between partial recall from individual retrievers and the final relevance ordering.
Here is what the query looks like in the TopK SDK:
from topk_sdk.data import f32_vector, f32_sparse_vector
from topk_sdk.query import select, field, fn

collection.query(
    select(
        # Dense vector score
        dense_score=fn.vector_distance("dense", f32_vector([...])),
        # Sparse vector score
        sparse_score=fn.vector_distance("sparse", f32_sparse_vector({...})),
    )
    .topk(
        # Merge dense and sparse scores
        0.7 * field("dense_score") + 0.3 * (field("sparse_score") / 100.0),
        # Select top-10 results
        10,
    )
)
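Note that the 0.7/0.3 weights and the division of the sparse score by 100.0 above are illustrative normalization choices; because the scoring expression is just part of the query, the relative weighting of dense and sparse contributions can be tuned per dataset, or even per query.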
Experiments
We evaluated four configurations across multiple datasets from the BEIR benchmark:
- Dense-only retrieval using ModernBERT-base.
- Sparse-only retrieval using SPLADE-v3.
- Traditional hybrid retrieval using Reciprocal Rank Fusion (RRF).
- Hybrid retrieval with TopK, employing a custom scoring function: alpha * dense_score + (1 - alpha) * sparse_score (see the parameterized sketch after this list).
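For illustration, this alpha-weighted scoring function maps directly onto the TopK query shown earlier. The helper below is a hypothetical wrapper, not part of the SDK; the alpha default and the assumption that both scores are on comparable scales are ours:

from topk_sdk.data import f32_vector, f32_sparse_vector
from topk_sdk.query import select, field, fn

def hybrid_query(collection, dense_vec, sparse_vec, alpha=0.7, k=10):
    # Hypothetical helper: alpha-weighted dense/sparse fusion in a single TopK query
    return collection.query(
        select(
            dense_score=fn.vector_distance("dense", f32_vector(dense_vec)),
            sparse_score=fn.vector_distance("sparse", f32_sparse_vector(sparse_vec)),
        )
        .topk(
            # alpha * dense_score + (1 - alpha) * sparse_score
            # (assumes both scores are on comparable scales; otherwise rescale
            # the sparse score as in the earlier example)
            alpha * field("dense_score") + (1 - alpha) * field("sparse_score"),
            k,
        )
    )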
We used nDCG@10 as the primary metric, reflecting both relevance and ranking position, with top_k = 10 results per query. By incorporating scores from both dense and sparse models inside a single query, we achieved an average improvement of 4.58% over RRF-based hybrid systems.
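As a refresher, nDCG@10 discounts the gain of each relevant document by its rank and normalizes by the best achievable ordering. The sketch below uses one common exponential-gain formulation; evaluation toolkits may differ in details, and the variable names are illustrative:

import math

def ndcg_at_k(ranked_doc_ids, relevance, k=10):
    # DCG@k with exponential gain: (2^rel - 1) / log2(rank + 1)
    dcg = sum(
        (2 ** relevance.get(doc_id, 0) - 1) / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked_doc_ids[:k], start=1)
    )
    # Ideal DCG: the same positions filled with the best possible relevance grades
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(
        (2 ** rel - 1) / math.log2(rank + 1)
        for rank, rel in enumerate(ideal, start=1)
    )
    return dcg / idcg if idcg > 0 else 0.0

# Example: graded relevance judgments for a single query
score = ndcg_at_k(["d3", "d1", "d7"], {"d1": 2, "d3": 1}, k=10)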
| Dataset | Dense-only | Sparse-only | RRF | TopK Hybrid | Improvement |
|---|---|---|---|---|---|
| FiQA | 0.40661 | 0.38023 | 0.4123 | 0.42853 | 3.94% |
| TREC-COVID | 0.81431 | 0.66741 | 0.76779 | 0.82798 | 7.84% |
| NQ | 0.52029 | 0.51405 | 0.53885 | 0.55271 | 2.57% |
| NFCorpus | 0.32458 | 0.35837 | 0.34593 | 0.36803 | 6.39% |
| FEVER | 0.85213 | 0.79154 | 0.84643 | 0.86464 | 2.15% |
| Average | 0.583584 | 0.54232 | 0.58226 | 0.608378 | 4.58% |
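The Improvement column is the relative gain of TopK Hybrid over RRF; for FiQA, for example, (0.42853 - 0.4123) / 0.4123 ≈ 3.94%.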
In practice, people often overfetch k' > k results from individual retrievers and then apply RRF. While this is a valid way to improve result quality, it often leads to slower queries and higher resource usage. For completeness, we also evaluated RRF with 100 candidates per retriever to produce the final top-10 results.
| Dataset | RRF (10 candidates) | RRF (100 candidates) | TopK Hybrid | Improvement |
|---|---|---|---|---|
| FiQA | 0.4123 | 0.41458 | 0.42853 | 3.36% |
| TREC-COVID | 0.76779 | 0.80907 | 0.82798 | 2.34% |
| NQ | 0.53885 | 0.54093 | 0.55271 | 2.18% |
| NFCorpus | 0.34593 | 0.35027 | 0.36803 | 5.07% |
| FEVER | 0.84643 | 0.84316 | 0.86464 | 2.55% |
| Average | 0.58226 | 0.591602 | 0.608378 | 3.10% |
As the table above shows, RRF with 100 candidates per retriever improves overall result quality, but TopK's hybrid retrieval still outperforms it by 3.10% on average while also being more efficient.
Summary
Our evaluation demonstrates that TopK hybrid retrieval consistently improves result relevance across multiple datasets, achieving a 4.58% average increase in nDCG@10 over reciprocal rank fusion. By directly integrating normalized scores from dense and sparse retrievers, applying tunable weightings, and selecting the final top-k results without intermediate truncation, TopK mitigates the recall loss and ranking inconsistencies inherent to partial list aggregation. These results underscore TopK's value as a more principled and efficient alternative to conventional hybrid search pipelines. If you want to learn more about TopK's hybrid search capabilities, check out our documentation.
If you are interested in building high-quality search infrastructure, shoot me an email at marek@topk.io. We’re hiring!