Apple Podcasts
Spotify
In this episode
Modern search is broken. There are too many pieces that are glued together.
- Vector databases for semantic search
- Text engines for keywords
- Rerankers to fix the results
- LLMs to understand queries
- Metadata filters for precision
Each piece works well alone. Together, they often become a mess.
When you glue these systems together, you create:
- Data Consistency Gaps - Your vector store knows about documents your text engine doesn't. Which is right?
- Timing Mismatches - New content appears in one system before another. Users see different results depending on which path their query takes.
- Complexity Explosion - Integration points grow quadratically with components. Three components means three connections. Five means ten.
- Performance Bottlenecks - Each hop between systems adds latency. A 200ms search becomes 800ms after passing through four components.
- Brittle Chains - When one system fails, your entire search breaks. More pieces mean more breaking points.
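The "complexity explosion" above is just pairwise integration points: n components that all need to agree with each other require n(n-1)/2 connections. A quick check of the numbers in the list:

```python
def integration_points(n: int) -> int:
    """Pairwise connections between n components: n choose 2."""
    return n * (n - 1) // 2

# Matches the figures above: 3 components -> 3 connections, 5 -> 10.
counts = [integration_points(n) for n in (3, 4, 5)]  # -> [3, 6, 10]
```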
I recently built a system with query-specific post-filters and a requirement to deliver a fixed number of results to the user. Often the query had to be re-run multiple times to fill that quota. The result: unpredictable latency, high backend load (some queries hammered the database 10+ times), and a relevance cliff where results 1-6 looked great but the later ones were poor matches.
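The re-query pattern described above can be sketched in a few lines. Everything here is hypothetical and simplified: a stand-in candidate fetch, a post-filter with a 20% pass rate, and a loop that keeps paging until enough results survive.

```python
def fetch_candidates(offset: int, limit: int) -> list[int]:
    """Stand-in for a ranked vector-store query; returns doc ids."""
    corpus = list(range(1000))  # pretend these are ids in relevance order
    return corpus[offset:offset + limit]

def passes_filter(doc_id: int) -> bool:
    """Query-specific post-filter; here only ~1 in 5 docs survive."""
    return doc_id % 5 == 0

def search(k: int) -> tuple[list[int], int]:
    """Re-query the backend until k results survive the post-filter."""
    results, offset, round_trips = [], 0, 0
    while len(results) < k:
        batch = fetch_candidates(offset, limit=k)
        round_trips += 1
        if not batch:  # corpus exhausted before quota was met
            break
        results.extend(d for d in batch if passes_filter(d))
        offset += k
    return results[:k], round_trips

hits, trips = search(k=10)
# With a 20% pass rate, the backend is hit 5 times for one user query.
```

This is exactly the failure mode in the anecdote: latency scales with the filter's selectivity, which varies per query, so tail latency becomes unpredictable.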
Today on How AI Is Built, we are talking to Marek Galovic from TopK about how they built a new search database with modern components. "How would search work if we built it today?"
Cloud storage is cheap. Compute is fast. Memory is plentiful. One system that handles vectors, text, and filters together - not three systems duct-taped into one.
One pass handles everything: Vector search + Text search + Filters → Single sorted result
Built with hand-optimized Rust kernels for both x86 and ARM, the system scales to 100M documents with 200ms P99 latency. The goal is to do search in 5 lines of code.
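To make the single-pass idea concrete, here is a toy in-memory illustration (not TopK's implementation or SDK; all names and the scoring are invented): vector similarity and keyword overlap are scored together, the metadata filter is applied inline, and one sorted list comes out, with no cross-system merge step.

```python
def single_pass_search(docs, query_vec, query_terms, filt, k):
    """Score vector + text and apply the filter in one pass over the data."""
    scored = []
    for doc in docs:
        if not filt(doc):  # metadata filter: same pass, no extra round trip
            continue
        # dot product as a stand-in for real vector similarity
        vec_score = sum(a * b for a, b in zip(query_vec, doc["vec"]))
        # keyword overlap as a stand-in for BM25
        text_score = len(query_terms & set(doc["text"].split()))
        scored.append((vec_score + text_score, doc["id"]))
    scored.sort(reverse=True)  # single sorted result from a single engine
    return [doc_id for _, doc_id in scored[:k]]

docs = [
    {"id": "a", "vec": [0.9, 0.1], "text": "rotate api keys", "lang": "en"},
    {"id": "b", "vec": [0.1, 0.9], "text": "pasta recipes", "lang": "en"},
    {"id": "c", "vec": [0.8, 0.2], "text": "rotate api keys", "lang": "de"},
]
top = single_pass_search(
    docs,
    query_vec=[1.0, 0.0],
    query_terms={"rotate", "keys"},
    filt=lambda d: d["lang"] == "en",
    k=2,
)
# top == ["a", "b"]: doc "c" scored well but was filtered in the same pass,
# so there is no post-filter gap to re-query around.
```

A real engine would use inverted indexes and ANN structures instead of a linear scan, but the shape of the result is the point: one consistent, filtered, sorted list per query.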
Why This Matters
The current approach to building search systems is fundamentally flawed. By treating each component as a separate system and gluing them together, we've created a fragile, complex, and slow architecture that fails under real-world conditions.
This podcast episode explores a different approach: building search as a unified system from the ground up. Instead of duct-taping together vector databases, text engines, rerankers, and filters, TopK handles everything in a single pass with consistent data, predictable performance, and simple integration.
Whether you're dealing with the complexity of multi-system search architectures, struggling with performance bottlenecks, or simply curious about how search could work better, this episode provides valuable insights into rethinking search infrastructure from first principles.
Learn More
- Explore our documentation to see how TopK works
- Check out our benchmarks to see performance comparisons
- Read our blog for deep technical insights
Interested in building the future of search? We're hiring! Check out our careers page for open positions.