Benchmarks

These benchmarks show TopK's end-to-end query performance for hybrid vector search across different collection sizes and filter selectivity levels.

The metrics include median (p50), 95th percentile (p95), and 99th percentile (p99) latencies in milliseconds, as well as overall throughput in queries per second (QPS).
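As a rough illustration of how these metrics can be derived, the sketch below computes nearest-rank percentiles from a list of per-query latencies, and QPS as completed queries divided by wall-clock time. The nearest-rank method and the `summarize` helper are assumptions for illustration. This page does not say which percentile method the benchmark harness uses.

```python
import math


def percentile(samples_ms, p):
    """Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it."""
    ordered = sorted(samples_ms)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def summarize(samples_ms, wall_clock_s):
    """Hypothetical helper: p50/p95/p99 latency in ms plus throughput in QPS."""
    return {
        "p50": percentile(samples_ms, 50),
        "p95": percentile(samples_ms, 95),
        "p99": percentile(samples_ms, 99),
        "qps": len(samples_ms) / wall_clock_s,  # completed queries per second
    }


# Example: 100 queries with latencies of 1..100 ms, completed in 10 s.
print(summarize([float(ms) for ms in range(1, 101)], wall_clock_s=10.0))
# → {'p50': 50.0, 'p95': 95.0, 'p99': 99.0, 'qps': 10.0}
```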

Selectivity is the fraction of the collection that is scanned, ranging from a full scan (100%) down to just 1% of vectors. Lower selectivity generally yields better performance, but it requires effective filtering or indexing strategies.
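To make selectivity concrete, here is a minimal brute-force sketch that applies a metadata filter, reports the fraction of documents that pass it, and ranks only those candidates by dot product. The `bucket` field, the 8-dimensional vectors, and `filtered_top_k` are hypothetical, and an exhaustive scan is a simplification, not TopK's query engine.

```python
import heapq
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def filtered_top_k(docs, query, predicate, k=10):
    """Exact top-k by dot product over only the documents that pass `predicate`.

    Returns the top-k documents and the selectivity (fraction of docs scanned).
    """
    candidates = [d for d in docs if predicate(d)]
    selectivity = len(candidates) / len(docs)
    top = heapq.nlargest(k, candidates, key=lambda d: dot(d["vec"], query))
    return top, selectivity


# Hypothetical collection: 1,000 docs, each tagged with a bucket 0..99.
random.seed(0)
docs = [{"id": i, "bucket": i % 100, "vec": [random.random() for _ in range(8)]}
        for i in range(1000)]
query = [random.random() for _ in range(8)]

for name, pred in [("100%", lambda d: True),
                   ("10%", lambda d: d["bucket"] < 10),
                   ("1%", lambda d: d["bucket"] == 0)]:
    top, sel = filtered_top_k(docs, query, pred)
    print(name, sel, [d["id"] for d in top[:3]])
```

In this sketch the predicate is still evaluated on every document, so only the scoring work shrinks. Skipping non-matching documents without touching them at all is the job of the filtering or indexing strategies mentioned above.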

1M Document Collection

Dense Vector Search

1M Documents, 768 Dimensions, k=10
Reducing the scan from 100% to 1% of the collection improves median latency by ~28% (6.8 ms → 4.9 ms) and throughput by ~17% (54 → 63 QPS).
Selectivity        p50 (ms)   p95 (ms)   p99 (ms)   QPS
100% (Full Scan)   6.8        9.1        13         54
10% (100K docs)    5.25       7.3        10         61
1% (10K docs)      4.9        6.7        9          63
Chart: p50, p95, and p99 latency (ms) by selectivity (100%, 10%, 1%)
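The percentages in this benchmark's summary line are relative changes between the 100% and 1% rows of the table. A quick check of the arithmetic:

```python
def pct_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


latency_drop = -pct_change(6.8, 4.9)  # p50 latency, 100% -> 1% selectivity
qps_gain = pct_change(54, 63)         # throughput, 100% -> 1% selectivity
print(f"p50 latency: -{latency_drop:.0f}%, QPS: +{qps_gain:.0f}%")
# → p50 latency: -28%, QPS: +17%
```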

Sparse Vector Search

1M Documents, document non-zero dims=512, query non-zero dims=32, k=10
Selectivity   p50 (ms)   p95 (ms)   p99 (ms)   QPS
100%          2.78       2.96       3.20       223
10%           2.70       2.87       3.10       181
1%            2.61       2.87       3.01       182
Chart: p50, p95, and p99 latency (ms) by selectivity (100%, 10%, 1%)
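In the sparse benchmarks, "non-zero dims" is the number of dimensions that carry a weight: 512 per document and 32 per query. A common way to score such vectors is to store only the non-zero entries and sum products over the dimensions both vectors share. The dict representation and the 30,000-dimension vocabulary below are assumptions for illustration, not TopK's internal format.

```python
import random


def sparse_dot(a, b):
    """Dot product of sparse vectors stored as {dimension_index: weight} dicts."""
    # Iterate over the smaller vector and look up each index in the larger one,
    # so the cost is proportional to the smaller number of non-zero entries.
    small, large = (a, b) if len(a) <= len(b) else (b, a)
    return sum(w * large[i] for i, w in small.items() if i in large)


def random_sparse(nnz, vocab_size=30_000):
    """Hypothetical sparse vector with `nnz` non-zero dimensions."""
    return {i: random.random() for i in random.sample(range(vocab_size), nnz)}


random.seed(0)
doc = random_sparse(512)   # document: 512 non-zero dims
query = random_sparse(32)  # query: 32 non-zero dims
print(len(doc), len(query), sparse_dot(doc, query))
```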

10M Document Collection

Dense Vector Search

10M Documents, 768 Dimensions, k=10
Selectivity        p50 (ms)   p95 (ms)   p99 (ms)   QPS
100% (Full Scan)   16.5       20         25         36.5
10% (1M docs)      11         15         19         46
1% (100K docs)     9.5        13         16         49
Chart: p50, p95, and p99 latency (ms) by selectivity (100%, 10%, 1%)

Sparse Vector Search

10M Documents, document non-zero dims=512, query non-zero dims=32, k=10
Selectivity   p50 (ms)   p95 (ms)   p99 (ms)   QPS
100%          12.9       13.5       15.0       63.0
10%           11.77      12.9       14.6       68.5
1%            11.06      12.9       13.3       70.0
Chart: p50, p95, and p99 latency (ms) by selectivity (100%, 10%, 1%)

100M Document Collection

Dense Vector Search

100M Documents, 768 Dimensions, k=10
Selectivity        p50 (ms)   p95 (ms)   p99 (ms)   QPS
100% (Full Scan)   24         34         41         27
10% (10M docs)     18.5       29         34         34
1% (1M docs)       17         27         33         36
Chart: p50, p95, and p99 latency (ms) by selectivity (100%, 10%, 1%)

Sparse Vector Search

100M Documents, document non-zero dims=512, query non-zero dims=32, k=10
Selectivity   p50 (ms)   p95 (ms)   p99 (ms)   QPS
100%          13.74      14.85      15.8       56.7
10%           12.33      14.85      15.55      63
1%            11.95      14.17      15.5       67
Chart: p50, p95, and p99 latency (ms) by selectivity (100%, 10%, 1%)

Latency Scaling - Dense

Summary by Collection Size (100% Selectivity)
Shows how query performance scales with collection size, measured with a single query path replica and a single client.
Collection Size   p50 (ms)   p95 (ms)   p99 (ms)   QPS
1M                6.8        9.1        13         54
10M               16.5       20         25         36.5
100M              24         34         41         27
Chart: p50, p95, and p99 latency (ms) by collection size (1M, 10M, 100M docs)

Latency Scaling - Sparse

Summary by Collection Size (100% Selectivity)
Shows how query performance scales with collection size, measured with a single query path replica and a single client.
Collection Size   p50 (ms)   p95 (ms)   p99 (ms)   QPS
1M                2.78       2.96       3.2        223
10M               12.9       13.5       15         63
100M              13.74      14.85      15.8       56.7
Chart: p50, p95, and p99 latency (ms) by collection size (1M, 10M, 100M docs)
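One rough way to quantify the scaling in the dense and sparse summary tables is the p50 growth factor for each 10x step in collection size. The sketch below computes it from the 100%-selectivity rows; the helper name is hypothetical.

```python
# p50 latency (ms) at 100% selectivity, copied from the summary tables.
dense_p50 = {1_000_000: 6.8, 10_000_000: 16.5, 100_000_000: 24.0}
sparse_p50 = {1_000_000: 2.78, 10_000_000: 12.9, 100_000_000: 13.74}


def growth_per_10x(p50_by_size):
    """Ratio of p50 latency between consecutive collection sizes."""
    sizes = sorted(p50_by_size)
    return [round(p50_by_size[b] / p50_by_size[a], 2) for a, b in zip(sizes, sizes[1:])]


print("dense: ", growth_per_10x(dense_p50))   # → [2.43, 1.45]
print("sparse:", growth_per_10x(sparse_p50))  # → [4.64, 1.07]
```

Every factor is well below 10, so in these runs median latency grows much more slowly than the number of documents.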

QPS vs Concurrency

Queries per second (QPS) achieved with different numbers of concurrent clients against a single query path replica.

Chart: QPS vs. concurrent clients (1, 2, 4, 8, 16) at 1%, 10%, and 100% selectivity

Higher QPS can easily be achieved by provisioning more query path replicas.
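Concurrency results like these can be read against a standard queueing bound. By Little's law, N closed-loop clients that each keep one query in flight, with an unloaded latency of W, can drive at most N / W queries per second; once a replica saturates, adding clients raises latency rather than throughput. The sketch below encodes that asymptotic bound. `capacity_per_replica_qps` is a hypothetical parameter, and the model assumes queries are spread evenly across replicas, which is why adding replicas raises the ceiling in this model.

```python
import math


def qps_upper_bound(clients, unloaded_latency_ms, replicas=1,
                    capacity_per_replica_qps=math.inf):
    """Asymptotic throughput bound for a closed-loop load test.

    - Little's law: N clients with one query in flight each, at latency W,
      complete at most N / W queries per second.
    - Saturation: throughput cannot exceed the total capacity of all replicas
      (assumes queries are spread evenly across replicas).
    """
    little = clients * 1000 / unloaded_latency_ms  # W converted from ms to s
    return min(little, replicas * capacity_per_replica_qps)


# Hypothetical numbers: 10 ms unloaded latency, 500 QPS capacity per replica.
for n in (1, 2, 4, 8, 16):
    print(n,
          qps_upper_bound(n, 10, capacity_per_replica_qps=500),
          qps_upper_bound(n, 10, replicas=4, capacity_per_replica_qps=500))
```

A measured curve sits at or below this bound, since real clients also spend time outside the query itself.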