TopK SQL: A Search Query Language

Postgres is the lingua franca of databases. Decades of tools, drivers, and workflows are built around its wire protocol: psql, psycopg2, node-postgres, tokio-postgres, ORMs, dashboards, and more. They all expect a Postgres-shaped server on the other end. Until now, TopK was available only through our Python, JavaScript, and Rust SDKs.

Now it speaks Postgres too. Any client that talks to Postgres can connect to TopK and get state-of-the-art search quality as ordinary SQL:

SELECT
    title,
    semantic_similarity(bio, 'an epic fantasy quest') AS score
FROM books
WHERE match_any(bio, 'dragon wizard')
ORDER BY boost(score, published_year > 2010, 1.5) DESC
LIMIT 10;

Semantic search, keyword filtering, and metadata-aware ranking in one query, over a standard Postgres connection.

TopK SQL Specification

TopK SQL is a search-oriented dialect: Postgres-shaped where that helps, extended where search needs more. It supports schemaless tables with vector, sparse, and multi-vector types; SELECT with semantic, keyword, vector, and hybrid scoring; standard WHERE predicates plus search filters; INSERT / UPDATE / DELETE; and a Postgres-compatible wire protocol. For the full language reference, see the TopK SQL overview.

1.1 Schema(less)

Schemaless by default. A table has no fixed schema: rows can contain undeclared fields, a column can hold values of different types, and undeclared fields remain queryable and filterable. Declare a field when you want to index it or constrain its type.

CREATE TABLE defines the table and declared columns — indexes are declared inline on each column. DROP TABLE removes the table.

CREATE TABLE books (
    title          TEXT,
    published_year INTEGER,
    bio            TEXT             INDEX semantic_index(),
    embedding      f32_vector(768)  INDEX vector_index(metric = 'cosine')
);

Declared columns are not the full document. Rows can include fields that never appeared in CREATE TABLE:

INSERT INTO books (_id, title, rating, tags)
VALUES ('earthsea', 'A Wizard of Earthsea', 4.8, ARRAY['fantasy', 'magic']);

SELECT title, rating
FROM books
WHERE contains(tags, 'magic');

rating and tags were never declared; they are still stored, returned, and filterable.

1.2 Types

Standard Postgres scalar types (BOOLEAN, INT, FLOAT, TEXT, BYTEA, and JSONB) are supported, along with typed arrays such as BOOLEAN[], INT[], FLOAT[], and TEXT[]. We extend the type system with native support for the following vector and matrix shapes:

| Shape | Type | Precisions |
| --- | --- | --- |
| Dense | `*_vector(dim)` | `f32` `f16` `f8` `u8` `i8` |
| Sparse | `*_sparse_vector` | `f32` `f16` `f8` `u8` `i8` |
| Multi-vector | `*_matrix(dim)` | `f32` `f16` `f8` `u8` `i8` |
| Binary | `binary_vector(dim)` | 1-bit |

Vector values can be constructed with a JSON-string cast ('[...]'::f32_vector) or a constructor (f32_vector(ARRAY[...])).
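From a client, the JSON-string cast is convenient because the vector travels as an ordinary string parameter. Here is a minimal Python sketch of that idea. The helper names are hypothetical, the `%s` placeholders assume a psycopg2-style driver, and the `DESC` ordering simply mirrors the hybrid query later in this post rather than a documented rule.

```python
import json
import math
from typing import Sequence, Tuple


def f32_vector_param(values: Sequence[float], dim: int) -> str:
    """Serialize a dense vector as the JSON-array string used in '[...]'::f32_vector."""
    if len(values) != dim:
        raise ValueError(f"expected {dim} dimensions, got {len(values)}")
    if not all(math.isfinite(v) for v in values):
        raise ValueError("vector components must be finite numbers")
    # Compact separators keep the literal short; JSON parsers accept either form.
    return json.dumps([float(v) for v in values], separators=(",", ":"))


def nearest_books_query(values: Sequence[float], dim: int, limit: int = 10) -> Tuple[str, tuple]:
    """Return (sql, params) for a vector search; hypothetical helper, not a TopK API."""
    sql = (
        "SELECT _id, title, "
        "vector_distance(embedding, %s::f32_vector) AS vector_score "
        "FROM books "
        "ORDER BY vector_score DESC "  # assumption: direction copied from the hybrid example
        "LIMIT %s"
    )
    return sql, (f32_vector_param(values, dim), limit)
```

The returned pair can be handed to `cursor.execute(sql, params)`; checking the dimension client-side catches a mismatched embedding model before the query leaves the application.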

1.3 `SELECT` queries

Search queries in TopK SQL are SELECT statements built around scores. Semantic search, vector search, BM25, and multi-vector retrieval each produce scores that can be selected, aliased, combined, boosted, and sorted.

SELECT
    _id, title,
    semantic_similarity(bio, 'tales of magic and adventure') AS score
FROM books
ORDER BY score DESC
LIMIT 10;

The basic shape is: compute a relevance score, sort by it, and return the top results. From there, search and ranking can be tuned through composition: filter in WHERE, expose multiple scores in SELECT, and combine them in ORDER BY:

SELECT _id, title,
    bm25_score() AS keyword_score,
    semantic_similarity(bio, 'an epic fantasy quest') AS semantic_score,
    vector_distance(embedding, '[...]'::f32_vector) AS vector_score
FROM books
WHERE match_any(bio, 'dragon wizard')
  AND published_year > 1950
ORDER BY
    0.2 * keyword_score +
    0.5 * semantic_score +
    0.3 * vector_score DESC
LIMIT 10;

This is hybrid search without multiple queries, client-side fusion, or reciprocal-rank fusion. Keyword matching contributes lexical relevance, semantic similarity contributes meaning, the vector score contributes similarity against your own embeddings, and the final ranking is a single SQL expression.
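To make the ranking expression concrete, here is a small Python sketch of the arithmetic the `ORDER BY` clause describes. It is only an illustration of the weighted sum: the `Scored` class, the function names, and the score values are invented toy data, and TopK evaluates the real expression server-side.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Scored:
    doc_id: str
    keyword_score: float
    semantic_score: float
    vector_score: float


def hybrid_score(row: Scored) -> float:
    # Same weights as the ORDER BY expression in the query above.
    return 0.2 * row.keyword_score + 0.5 * row.semantic_score + 0.3 * row.vector_score


def top_k(rows: List[Scored], k: int) -> List[Scored]:
    # ORDER BY ... DESC LIMIT k
    return sorted(rows, key=hybrid_score, reverse=True)[:k]


# Toy values, chosen only to show how the weights interact.
rows = [
    Scored("a", keyword_score=8.0, semantic_score=0.2, vector_score=0.1),
    Scored("b", keyword_score=1.0, semantic_score=0.9, vector_score=0.8),
    Scored("c", keyword_score=0.5, semantic_score=0.5, vector_score=0.5),
]
ranked = [row.doc_id for row in top_k(rows, 2)]
```

In this toy data, document "a" ranks first on keyword score alone, because its keyword score sits on a larger scale than the other two signals. Weights in a sum like this have to account for the range of each score, not just its importance.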

1.3.1 Search functions

TopK SQL exposes retrieval modes as scoring functions. Each function targets an index and returns a score. See the TopK SQL overview for the complete list of search functions and index types.

| Function | Index | Use it for |
| --- | --- | --- |
| `semantic_similarity(field, query)` | `semantic_index` | query embedding, candidate generation, and reranking with Iso-ModernColBERT |
| `vector_distance(field, vector)` | `vector_index` | dense or sparse ANN against client-supplied vectors |
| `multi_vector_distance(field, matrix)` | `multi_vector_index` | late-interaction MaxSim retrieval |
| `bm25_score()` | `keyword_index` | keyword relevance from `match_any(...)` / `match_all(...)` predicates |

1.3.2 Filtering

Filters narrow the candidate set before ranking. TopK SQL supports standard predicates — comparisons, membership, text predicates, and regex — plus search-specific predicates.

WHERE published_year > 2000
  AND in_print = true
  AND genre IN ('fantasy', 'fiction')
  AND match_any(bio, 'dragon wizard')

Here the first three conditions are ordinary metadata filters and the last, match_any(), is a keyword predicate. Both kinds combine in a single WHERE clause.

1.3.3 Scoring

Scores are ordinary values. Alias them in SELECT, then combine them with arithmetic or ranking functions in ORDER BY.

The hybrid query in section 1.3 combines keyword, semantic, and vector scores as a weighted sum. You can also fold metadata into the ranking expression:

ORDER BY boost(semantic_score, published_year > 2010, 1.5) DESC

Ranking stays inside the query: retrieval scores and metadata signals combine in one ordered expression instead of being merged in application code.

1.4 `INSERT` / `UPDATE` / `DELETE`

TopK SQL supports the familiar Postgres write statements, with a few differences noted below. INSERT writes a full document, including undeclared fields, and has upsert semantics: inserting an existing _id replaces the document.

INSERT INTO books (_id, title, published_year, tags)
VALUES ('hobbit', 'The Hobbit', 1937, ARRAY['fantasy', 'adventure']);

UPDATE changes fields on documents identified by _id — either _id = '...' or _id IN (...):

UPDATE books SET in_print = true WHERE _id = 'hobbit';

DELETE removes the documents matched by a filter:

DELETE FROM books WHERE published_year < 1900;

Unlike UPDATE, DELETE accepts the same filter expressions as SELECT, so you can delete by ID or by any predicate.
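The three statements address documents differently, which is easy to miss when coming from Postgres. As a rough mental model only, the semantics described above can be sketched with a toy in-memory dictionary in Python. This is not how TopK stores data: the class and method names are hypothetical, and the handling of a missing _id in UPDATE and of a missing field in a DELETE filter are assumptions made for the sketch.

```python
from typing import Any, Callable, Dict, Iterable

Document = Dict[str, Any]


class ToyTable:
    """Toy model of the write semantics; not TopK's storage engine."""

    def __init__(self) -> None:
        self.docs: Dict[str, Document] = {}

    def insert(self, doc: Document) -> None:
        # INSERT is an upsert: an existing _id is replaced wholesale.
        self.docs[doc["_id"]] = dict(doc)

    def update(self, ids: Iterable[str], changes: Document) -> None:
        # UPDATE addresses documents by _id only and changes the named fields.
        # Assumption: ids that do not exist are skipped.
        for doc_id in ids:
            if doc_id in self.docs:
                self.docs[doc_id].update(changes)

    def delete(self, predicate: Callable[[Document], bool]) -> None:
        # DELETE takes an arbitrary filter, like a SELECT's WHERE clause.
        for doc_id in [i for i, d in self.docs.items() if predicate(d)]:
            del self.docs[doc_id]


table = ToyTable()
table.insert({"_id": "hobbit", "title": "The Hobbit", "published_year": 1937})
table.insert({"_id": "hobbit", "title": "The Hobbit"})  # replaces: published_year is gone
table.insert({"_id": "old", "title": "An Old Book", "published_year": 1850})
table.update(["hobbit"], {"in_print": True})
# Assumption: documents without the field do not match the filter.
table.delete(lambda d: "published_year" in d and d["published_year"] < 1900)
```

The second insert is the case worth remembering: because INSERT replaces the document, fields left out of the new row are dropped rather than preserved. Use UPDATE when only some fields should change.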

1.5 Protocol

The SQL layer speaks the Postgres wire protocol, so standard clients can connect without custom adapters. psql, application drivers, prepared statements, and dashboard tools can all use the same endpoint.

TopK implements both simple and extended query modes. Connecting to TopK SQL requires only an API key in the connection password field — see the TopK SQL overview for connection setup.

psql "host=elastica.sql.topk.io password=<api-key>"

The host must be set to your desired region, in the format <region>.sql.topk.io. See the full list of supported regions at docs.topk.io/regions.
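The same connection can be opened from application code. Below is a minimal Python sketch using psycopg2, assuming the host format and password-only authentication described above. The function names are hypothetical, the `books` table is the one from the earlier examples, and autocommit is enabled only because this post does not say how the server treats transaction statements.

```python
from typing import Dict, List


def topk_connection_params(region: str, api_key: str) -> Dict[str, str]:
    """Build connection keyword arguments for <region>.sql.topk.io."""
    if not region or not api_key:
        raise ValueError("region and api_key are both required")
    return {"host": f"{region}.sql.topk.io", "password": api_key}


def fetch_titles(region: str, api_key: str, limit: int = 5) -> List[str]:
    """Run a small query over the Postgres wire protocol (needs network access)."""
    # Imported lazily so the helper above works without the driver installed.
    import psycopg2

    conn = psycopg2.connect(**topk_connection_params(region, api_key))
    try:
        # Assumption: avoid the implicit BEGIN that psycopg2 sends by default.
        conn.autocommit = True
        with conn.cursor() as cur:
            cur.execute("SELECT title FROM books LIMIT %s", (limit,))
            return [row[0] for row in cur.fetchall()]
    finally:
        conn.close()
```

Passing the key as a keyword argument rather than splicing it into a connection string avoids quoting problems if the key ever contains spaces or quotes.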

1.5.1 Type resolution

One part of the Postgres protocol matters more for a schemaless database: column types are sent before rows. In Postgres this is natural because every selected column has a known type. In TopK, an undeclared field may not have one.

topk-sql tries to infer the type of each SELECT column in order:

SELECT column
   ├─ explicit cast?        ──►  Postgres OID (::int4, ::float8, ::text)
   ├─ declared column?      ──►  type inferred from schema
   └─ unknown/mixed type?   ──►  JSON

When a column resolves to JSON, most Postgres drivers deserialize the value into native maps and lists. Use an explicit :: cast when you want a concrete wire type.
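The resolution order above is a short fallback chain, and it can be sketched in a few lines of Python. This is an illustration of the rule as described in this post, not the topk-sql source: the function name, the argument shapes, and the use of plain type-name strings instead of Postgres OIDs are all assumptions.

```python
from typing import Dict, Optional


def resolve_wire_type(
    column: str,
    explicit_cast: Optional[str],
    declared_types: Dict[str, str],
) -> str:
    """Pick the type reported for one SELECT column, checking each case in order."""
    # 1. An explicit cast such as ::float8 always wins.
    if explicit_cast is not None:
        return explicit_cast
    # 2. Otherwise use the type declared in CREATE TABLE, if there is one.
    if column in declared_types:
        return declared_types[column]
    # 3. Undeclared or mixed-type fields fall back to JSON.
    return "json"


schema = {"title": "text", "published_year": "int4"}
title_type = resolve_wire_type("title", None, schema)
rating_type = resolve_wire_type("rating", None, schema)
rating_cast_type = resolve_wire_type("rating", "float8", schema)
```

In this sketch the undeclared `rating` field comes back as JSON unless the query casts it, which matches the advice to add a cast whenever a dashboard or ORM needs a concrete numeric column.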

1.5.2 Table catalog

Existing tables and their schemas can be inspected through the information_schema.tables and information_schema.columns virtual tables. EXPLAIN returns the TopK query produced by the SQL parser, so you can see what will run before executing it.

1.6 Implementation

The parser, topk-sql, is open source at github.com/topk-io/topk, alongside topk-py, topk-js, and topk-rs. Like the SDKs, it is a thin mapping over the engine: it parses Postgres-flavored SQL into a TopK query rather than implementing a separate query planner, so it provides the same semantics and benefits from every optimization we make to our planning and execution pipeline.

Pick your interface

SQL is a thin wrapper around TopK, not a second implementation of it. It's one more way in, next to the SDKs, mapping onto the same engine. The query you'd write in Python and the same query in SQL resolve to the same plan.

Reach for whichever fits where you're working: a notebook, a service, a dashboard, a psql prompt, or any JDBC-compatible tool. Because the endpoint speaks the Postgres wire protocol, TopK can plug into the existing SQL ecosystem, from BI dashboards to federated query engines and warehouses.

Start today at console.topk.io, read the TopK SQL overview, or browse the parser source at github.com/topk-io/topk.