Home Tech & ScienceArtificial Intelligence (AI)Implementing Hybrid Semantic-Lexical Search in RAG

Implementing Hybrid Semantic-Lexical Search in RAG

by Delarno
0 comments
Implementing Hybrid Semantic-Lexical Search in RAG


In this article, you will learn how to implement a hybrid search strategy for RAG systems by combining BM25 lexical search with semantic search, fused together using Reciprocal Rank Fusion.

Topics we will cover include:

  • Why hybrid search outperforms either lexical or semantic search alone in retrieval-augmented generation systems.
  • How to implement BM25 lexical search and dense vector semantic search as independent retrieval engines in Python.
  • How to merge both rankings using Reciprocal Rank Fusion (RRF) to produce a final, balanced retrieval result.

Let’s get straight to it.

Implementing Hybrid Semantic-Lexical Search in RAG

Implementing Hybrid Semantic-Lexical Search in RAG

Introduction

Implementing hybrid search strategies is a critical step in building modern RAG (Retrieval-Augmented Generation) systems, especially when shifting from prototype to production-ready solutions.

There is little argument against semantic search — fueled by dense vectors or embeddings, which are numerical representations of text — being incredibly useful at understanding semantics, synonyms, and context. However, lexical, keyword-based search with approaches like BM25 covers a small blind spot neglected by semantic search. Combining the best of both worlds is therefore the perfect recipe to take your RAG system’s retrieval mechanism the extra mile.

Let’s explore how to implement such a hybrid search strategy through a gentle coding example, guiding you through every step of the process!

Note: If you are unfamiliar with RAG systems, you may find the “Understanding RAG” article series remarkably insightful for getting the most out of this read. In particular, I recommend acquiring an understanding of vector databases first through this article.

Step-by-Step Implementation

The first step is to ensure all the necessary external Python libraries are installed, in particular these three:

  • rank_bm25: an implementation of the BM25 lexical search algorithm for information retrieval (BM stands for “Best Matching”).
  • sentence-transformers: provides pre-trained language models for generating text embeddings. In a real setting, you may already have your own vector database containing many document embeddings and not need this, but we will use it here to simulate the construction of a toy vector database and illustrate hybrid search on it.
  • requests: used to fetch the raw dataset package from a public GitHub datasets repository prepared for this example.

With these ingredients at hand, we start by loading the dataset and storing the raw texts in a list (we do so because it is a small dataset).

The hybrid search process is divided into three stages: two of them take place in parallel, or independently from each other. The third is where the fusion of both approaches happens, using a merging method called Reciprocal Rank Fusion (RRF).

Let’s cover lexical search with BM25 first:

The lexical search process has been encapsulated in a function called search_bm25(). This function takes two input arguments: a string containing the user’s query to the RAG system, and the number of top results to retrieve. The rank_bm25 library provides a get_scores() method that computes, for each document — treated as a collection of tokens — a lexical relevance score. We then rank documents by decreasing score, select the top-k, and return them.

Meanwhile, the semantic search engine first uses a sentence transformer model to obtain embedding vectors for the texts and the user query, then applies a vector similarity metric like cosine similarity to rank texts by semantic relevance and retrieve the most relevant k:

Time to put it all together. The two scores calculated for each document cannot simply be added, because they operate on very different numeric scales. Instead, we perform the fusion based on ranks rather than raw similarity or relevance scores. For this, RRF is the gold industry standard for fusing ranking information: it calculates an overall ranking for each document by rewarding those that appear in high positions across both lists. The underlying logic is somewhat similar to that of the harmonic mean operator in statistics.

The overarching hybrid search process is implemented as follows:

Now it’s time to try it all out. Let’s formulate a user query and see what results we get.

The results are not excellent compared to a production RAG system, but bear in mind we tested this on a tiny, nine-document dataset. With that context, the outcome is quite reasonable.

Try modifying the query and replacing it with others related to temples, beaches, mountains, or anything else that comes to mind when thinking about eastern destinations. Can you find a scenario in which both the semantic results and the BM25 results are highly consistent with each other?

Wrapping Up

This article guided you through implementing a hybrid search mechanism for the retrieval stage of RAG systems. Choosing not to rely solely on semantic search is an important consideration when scaling RAG solutions to production environments.



Source link

You may also like

Leave a Comment