Benchmarks

These benchmarks show TopK's end-to-end query performance for nearest-neighbor search across different collection sizes and selectivity levels.

The metrics include median (p50), 95th percentile (p95), and 99th percentile (p99) latencies in milliseconds, as well as overall throughput in queries per second (QPS).
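As a rough illustration of how these metrics can be derived from raw measurements, the sketch below computes p50, p95, p99, and QPS from a list of per-query latencies. The function name `summarize` and its parameters are assumptions made for this example; it is not the harness used to produce the numbers on this page.

```python
import statistics

def summarize(latencies_ms, wall_time_s):
    """Return p50/p95/p99 latency (ms) and QPS for one benchmark run.

    latencies_ms: per-query latencies in milliseconds (hypothetical input)
    wall_time_s:  total wall-clock duration of the run in seconds
    """
    # quantiles(n=100) returns the 99 cut points p1..p99, so index i-1 is p_i.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {
        "p50": cuts[49],
        "p95": cuts[94],
        "p99": cuts[98],
        "qps": len(latencies_ms) / wall_time_s,
    }
```

Tail percentiles such as p99 need many samples to be stable, so a run with only a few hundred queries may report noisy values.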

Selectivity is the fraction of the collection that a query scans, ranging from a full scan (100%) down to 1% of the vectors.

Scanning a smaller fraction of the collection generally yields lower latency and higher throughput, but it requires effective filtering or indexing strategies.

1M Document Collection

1M Documents, 768 Dimensions, k=10
Reducing the scanned fraction from 100% to 1% improves median latency by ~28% (6.8 ms → 4.9 ms) and throughput by ~17% (54 → 63 QPS).
| Selectivity | p50 (ms) | p95 (ms) | p99 (ms) | QPS |
|---|---|---|---|---|
| 100% (Full Scan) | 6.8 | 9.1 | 13 | 54 |
| 10% (100K docs) | 5.25 | 7.3 | 10 | 61 |
| 1% (10K docs) | 4.9 | 6.7 | 9 | 63 |
[Chart: p50, p95, and p99 latency (ms) by selectivity (100%, 10%, 1%), 1M documents]
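The ~28% and ~17% figures quoted for the 1M collection follow from the p50 and QPS values at 100% and 1% selectivity. A minimal sketch of that arithmetic, with `pct_change` as a helper name chosen for this example:

```python
def pct_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0

# p50 latency falls from 6.8 ms (100% selectivity) to 4.9 ms (1% selectivity).
latency_drop = -pct_change(6.8, 4.9)   # about 27.9
# Throughput rises from 54 QPS to 63 QPS over the same range.
qps_gain = pct_change(54, 63)          # about 16.7

print(f"p50 latency reduced by {latency_drop:.1f}%")
print(f"QPS increased by {qps_gain:.1f}%")
```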

10M Document Collection

10M Documents, 768 Dimensions, k=10
| Selectivity | p50 (ms) | p95 (ms) | p99 (ms) | QPS |
|---|---|---|---|---|
| 100% (Full Scan) | 16.5 | 20 | 25 | 36.5 |
| 10% (1M docs) | 11 | 15 | 19 | 46 |
| 1% (100K docs) | 9.5 | 13 | 16 | 49 |
[Chart: p50, p95, and p99 latency (ms) by selectivity (100%, 10%, 1%), 10M documents]

100M Document Collection

100M Documents, 768 Dimensions, k=10
| Selectivity | p50 (ms) | p95 (ms) | p99 (ms) | QPS |
|---|---|---|---|---|
| 100% (Full Scan) | 24 | 34 | 41 | 27 |
| 10% (10M docs) | 18.5 | 29 | 34 | 34 |
| 1% (1M docs) | 17 | 27 | 33 | 36 |
[Chart: p50, p95, and p99 latency (ms) by selectivity (100%, 10%, 1%), 100M documents]

Latency Scaling

Summary by Collection Size (100% Selectivity)
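Shows how latency and throughput scale with collection size for full scans: growing the collection 100x (1M → 100M) raises p50 latency about 3.5x (6.8 ms → 24 ms) and halves throughput (54 → 27 QPS).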
| Collection Size | p50 (ms) | p95 (ms) | p99 (ms) | QPS |
|---|---|---|---|---|
| 1M | 6.8 | 9.1 | 13 | 54 |
| 10M | 16.5 | 20 | 25 | 36.5 |
| 100M | 24 | 34 | 41 | 27 |
[Chart: p50, p95, and p99 latency (ms) by collection size (1M, 10M, 100M docs), 100% selectivity]
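One way to describe this scaling is the slope of p50 latency against collection size on a log-log scale, where a slope of 1 would mean latency grows linearly with size. The sketch below applies that to the three full-scan p50 values from the summary. With only three data points this is a rough descriptive measure, not a performance model, and `loglog_slope` is a helper name chosen for this example.

```python
import math

# Full-scan (100% selectivity) p50 latencies from the summary table.
sizes = [1e6, 1e7, 1e8]
p50_ms = [6.8, 16.5, 24.0]

def loglog_slope(x0, y0, x1, y1):
    """Exponent b such that y1/y0 == (x1/x0) ** b."""
    return math.log(y1 / y0) / math.log(x1 / x0)

overall = loglog_slope(sizes[0], p50_ms[0], sizes[2], p50_ms[2])      # about 0.27
first_step = loglog_slope(sizes[0], p50_ms[0], sizes[1], p50_ms[1])   # about 0.38
second_step = loglog_slope(sizes[1], p50_ms[1], sizes[2], p50_ms[2])  # about 0.16

print(f"1M -> 100M: {overall:.2f}")
print(f"1M -> 10M:  {first_step:.2f}")
print(f"10M -> 100M: {second_step:.2f}")
```

All three slopes are well below 1, which matches the table: each 10x increase in collection size costs much less than a 10x increase in median latency.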

QPS vs Concurrency

Queries per second (QPS) achieved with different numbers of concurrent clients and a single query path replica.

[Chart: QPS by number of concurrent clients (1, 2, 4, 8, 16) at 1%, 10%, and 100% selectivity]
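A concurrency sweep of this kind could be sketched as follows. This is a minimal illustration, not the harness behind the chart: `measure_qps` and `fake_query` are hypothetical names, and `fake_query` simply sleeps in place of a real query call.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def measure_qps(run_query, clients, queries_per_client):
    """Issue queries from `clients` concurrent threads; return overall QPS."""
    def worker():
        for _ in range(queries_per_client):
            run_query()

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=clients) as pool:
        futures = [pool.submit(worker) for _ in range(clients)]
        for future in futures:
            future.result()  # re-raise any exception from a worker
    elapsed = time.perf_counter() - start
    return clients * queries_per_client / elapsed

def fake_query():
    # Hypothetical stand-in: replace with a real client query call.
    time.sleep(0.001)

# Sweep the same client counts as the chart.
for clients in (1, 2, 4, 8, 16):
    print(clients, round(measure_qps(fake_query, clients, 20)))
```

Threads are sufficient here because a real query spends most of its time waiting on the network; a CPU-bound client would need processes instead.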

Higher QPS can be achieved by provisioning more query path replicas.
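For capacity planning, a first estimate of the replica count can be sketched as below. It assumes throughput scales linearly with the number of replicas, which this page does not state, so treat the result as a starting point to validate with a load test. The function `replicas_needed`, its `headroom` parameter, and the example numbers are all hypothetical.

```python
import math

def replicas_needed(target_qps, qps_per_replica, headroom=0.0):
    """Estimate replicas for a target QPS, ASSUMING linear scaling.

    qps_per_replica: measured peak QPS of one replica for your workload
    headroom:        extra capacity fraction, e.g. 0.25 for 25% spare
    """
    if qps_per_replica <= 0:
        raise ValueError("qps_per_replica must be positive")
    return math.ceil(target_qps * (1 + headroom) / qps_per_replica)

# Hypothetical numbers: one replica sustains 200 QPS, the target is 500 QPS.
print(replicas_needed(500, 200))
```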