Hybrid Search

Process retrieval published

Also known as: Hybrid Retrieval, Sparse-Dense Retrieval

Definition

A retrieval approach that combines multiple search strategies, typically keyword-based (sparse, e.g. BM25) and embedding-based (dense vector) search, to get the benefits of both. Keyword search excels at exact matches and rare terms; vector search excels at semantic similarity. Hybrid search merges the two result lists, often using reciprocal rank fusion (RRF) or a learned combination of scores.
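
As an illustration of the fusion step, here is a minimal sketch of reciprocal rank fusion. It assumes each retriever returns an ordered list of document IDs; the function name, the sample IDs, and the choice of k=60 (a commonly used default) are assumptions for illustration, not part of any particular product's API.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Documents missing from a list add nothing.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a keyword retriever and a vector retriever.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)
```

Because RRF uses only ranks, it does not require the BM25 and vector scores to be on comparable scales, which is one reason it is a popular default.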

What this is NOT

  • Not just vector search (hybrid specifically combines multiple strategies)
  • Not an ensemble of multiple vector indexes (hybrid combines sparse and dense retrieval)
  • Not reranking (reranking is a second stage that reorders already-retrieved candidates; hybrid search runs its retrievers in parallel)

Alternative Interpretations

Different communities use this term differently:

llm-practitioners

Running both BM25/keyword search and vector similarity search, then combining the results with a fusion algorithm. Many vector databases, including Weaviate, Pinecone, and Qdrant, support hybrid search natively.

Sources: Weaviate hybrid search documentation, Pinecone hybrid search, Qdrant hybrid retrieval

information-retrieval

Combining sparse retrieval (inverted index, TF-IDF, BM25) with dense retrieval (learned embeddings) to improve recall and precision across diverse query types.

Sources: Dense Passage Retrieval paper, Hybrid retrieval benchmarks
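
To make the sparse/dense contrast concrete, the following toy sketch scores three documents with a small BM25 implementation and with cosine similarity. The documents, the query, and especially the 3-dimensional "embeddings" are hand-written assumptions for illustration; a real system would use an inverted index and a trained embedding model.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with BM25 (Lucene-style IDF)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # terms absent from the document contribute nothing
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


docs = [
    "error code E1234 in the payment service",
    "how to fix a failed transaction",
    "team lunch menu for friday",
]
# Hypothetical embeddings: the first two documents are placed close together
# to mimic semantic relatedness; the third is placed far away.
doc_vectors = [[0.9, 0.1, 0.0], [0.8, 0.3, 0.1], [0.0, 0.1, 0.9]]
query_vector = [0.85, 0.2, 0.05]  # hypothetical embedding of the query

sparse = bm25_scores("E1234", docs)
dense = [cosine(query_vector, v) for v in doc_vectors]
print(sparse, dense)
```

In this toy setup the keyword scorer gives a nonzero score only to the document containing the rare token, while the dense scorer also rates the related second document highly. Combining both signals is what lets hybrid retrieval serve both kinds of query.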

Examples

  • Weaviate query with alpha=0.5 balancing BM25 and vector scores
  • Elasticsearch combining kNN search with full-text search
  • Pinecone with sparse-dense vectors in a single query
  • RRF applied to separately retrieved Elasticsearch and Qdrant results
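
The alpha-style weighting in the first example can be sketched as a convex combination of normalized scores. This is a minimal sketch and not any engine's actual fusion implementation: the function names, the min-max normalization, and the sample scores are assumptions, and the convention that alpha=1.0 means dense only and alpha=0.0 means sparse only is assumed here.

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: score} map to the [0, 1] range."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every document as a full match.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def weighted_fusion(sparse_scores, dense_scores, alpha=0.5):
    """Blend normalized scores: (1 - alpha) * sparse + alpha * dense."""
    sparse = min_max_normalize(sparse_scores)
    dense = min_max_normalize(dense_scores)
    combined = {
        doc_id: (1 - alpha) * sparse.get(doc_id, 0.0)
        + alpha * dense.get(doc_id, 0.0)
        for doc_id in set(sparse) | set(dense)
    }
    # Highest combined score first; ties broken by document ID.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical raw scores from each retriever (different scales on purpose).
bm25_scores = {"doc_a": 12.0, "doc_b": 7.0, "doc_c": 2.0}
cosine_scores = {"doc_a": 0.50, "doc_c": 0.75, "doc_d": 0.25}

ranked = weighted_fusion(bm25_scores, cosine_scores, alpha=0.5)
print(ranked)
```

Unlike rank-based fusion, this approach depends on how the raw scores are normalized, so the same alpha can behave differently across retrievers and datasets.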

Counterexamples

Things that might seem like Hybrid Search but are not:

  • Pure vector similarity search
  • Pure BM25/keyword search
  • Querying multiple vector indexes (that is an ensemble, not hybrid search)

Relations

Implementations

Tools and frameworks that implement this concept: