Prompt Chain

Build RAG with Elasticsearch and OpenAI

Name: Build RAG with Elasticsearch and OpenAI
Availability: OnlineOnly
Author: OpenAI Cookbook

A prompt workflow that indexes OpenAI Wikipedia embeddings into Elasticsearch, performs semantic search with kNN queries, and sends the retrieved documents to a GPT model to generate an answer.

Works with openai, elasticsearch, pandas

Version 1.0.0

Models

GPT-4o, Gemini 2.0

Why it matters

Implement Retrieval Augmented Generation (RAG) by indexing data into Elasticsearch, performing semantic search with OpenAI embeddings, and generating responses using chat completions.

Outcomes

What it gets done

Index OpenAI Wikipedia dataset into Elasticsearch with dense vector mappings.

Embed user questions using OpenAI's embedding models.

Perform kNN semantic search on Elasticsearch for relevant documents.

Generate context-aware answers using OpenAI's Chat Completions API.

Install

Add it to your toolbox

Run in your project directory:

curl -fsSL https://spark.entire.vc/get/oai-elasticsearch-retrieval-augmented-generation | bash

Steps

Steps in the chain

Install packages and import modules

Install necessary packages and import required modules for the RAG workflow.

Connect to Elasticsearch

Create a client instance with the Cloud ID and password for your Elasticsearch deployment. Find the Cloud ID by going to https://cloud.elastic.co/deployments and selecting your deployment.

Download the dataset

Download the OpenAI Wikipedia embeddings dataset and extract the zip file.

Read CSV file into a Pandas DataFrame

Use the Pandas library to read the unzipped CSV file into a DataFrame. This makes it easier to index the data into Elasticsearch in bulk.

Create index with mapping

Create an Elasticsearch index with necessary mappings using the `dense_vector` field type for the `title_vector` and `content_vector` fields. This enables storing dense vectors in Elasticsearch for kNN search.

Index data into Elasticsearch

Generate bulk actions and index multiple documents efficiently using Elasticsearch's Bulk API. Index data in batches of 100 using the Python client's helpers for the bulk API.

Encode a question with OpenAI embedding model

Encode queries with the same embedding model used to encode documents at index time. Use the `text-embedding-3-small` model with your OpenAI API key to generate embeddings.

Run semantic search queries

Run kNN queries against the Elasticsearch index using the encoded question. Use the Elasticsearch kNN query option to perform semantic search and retrieve top results.

Use Chat Completions API for retrieval augmented generation

Send the question and retrieved text to OpenAI's Chat Completions API. Use the top kNN hit as context for the model to generate a response. Use the `gpt-3.5-turbo` model with system and user messages to shape the prompt.

Overview

Retrieval augmented generation using Elasticsearch and OpenAI

What it does

This is a Jupyter notebook that demonstrates a complete RAG implementation combining Elasticsearch's semantic search capabilities with OpenAI's embedding and chat completion APIs, using the OpenAI Wikipedia vector dataset as example data.

How it connects

Use this workflow when you want to see a working example of RAG that leverages Elasticsearch for document retrieval and OpenAI models for embeddings and chat completions, particularly if you're working with the OpenAI Wikipedia embeddings dataset or need a blueprint for similar implementations.

Source README

Retrieval augmented generation using Elasticsearch and OpenAI

This notebook demonstrates how to:

Index the OpenAI Wikipedia vector dataset into Elasticsearch
Embed a question with the OpenAI embeddings endpoint
Perform semantic search on the Elasticsearch index using the encoded question
Send the top search results to the OpenAI Chat Completions API endpoint for retrieval augmented generation (RAG)

ℹ️ If you've already worked through our semantic search notebook, you can skip ahead to the final step!

Install packages and import modules

# install packages

!python3 -m pip install -qU openai pandas wget elasticsearch

# import modules

from getpass import getpass
from elasticsearch import Elasticsearch, helpers
import wget
import zipfile
import pandas as pd
import json
import openai

Connect to Elasticsearch

ℹ️ We're using an Elastic Cloud deployment of Elasticsearch for this notebook.
If you don't already have an Elastic deployment, you can sign up for a free Elastic Cloud trial.

To connect to Elasticsearch, you need to create a client instance with the Cloud ID and password for your deployment.

Find the Cloud ID for your deployment by going to https://cloud.elastic.co/deployments and selecting your deployment.

CLOUD_ID = getpass("Elastic deployment Cloud ID: ")
CLOUD_PASSWORD = getpass("Elastic deployment Password: ")
client = Elasticsearch(
    cloud_id=CLOUD_ID,
    basic_auth=("elastic", CLOUD_PASSWORD)  # Alternatively, use `api_key` instead of `basic_auth`
)

# Test connection to Elasticsearch
print(client.info())

Download the dataset

In this step, we download the OpenAI Wikipedia embeddings dataset and extract the zip file.

embeddings_url = 'https://cdn.openai.com/API/examples/data/vector_database_wikipedia_articles_embedded.zip'
wget.download(embeddings_url)

with zipfile.ZipFile("vector_database_wikipedia_articles_embedded.zip", "r") as zip_ref:
    zip_ref.extractall("data")

Read CSV file into a Pandas DataFrame

Next we use the Pandas library to read the unzipped CSV file into a DataFrame. This step makes it easier to index the data into Elasticsearch in bulk.

wikipedia_dataframe = pd.read_csv("data/vector_database_wikipedia_articles_embedded.csv")
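The vector columns in this CSV hold JSON-encoded strings rather than numeric arrays, which is why the indexing step later calls json.loads on them. The sketch below isolates that decoding on a tiny, made-up CSV: the column names follow the fields this notebook indexes, but the three-dimensional vectors, the example.org URL, and the text are placeholders, and read_rows is a hypothetical helper. It uses only the standard library, so it runs without downloading the real dataset.

```python
import csv
import io
import json

# Made-up stand-in for the real CSV: same column names, 3-dim vectors instead of 1536
SAMPLE_CSV = (
    "id,url,title,text,title_vector,content_vector,vector_id\n"
    '1,https://example.org/april,April,April is a month.,"[0.1, 0.2, 0.3]","[0.4, 0.5, 0.6]",0\n'
)


def read_rows(csv_text):
    """Yield CSV rows as dicts, decoding the JSON-encoded vector columns into lists of floats."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        row["title_vector"] = json.loads(row["title_vector"])
        row["content_vector"] = json.loads(row["content_vector"])
        yield row


rows = list(read_rows(SAMPLE_CSV))
print(rows[0]["title"], rows[0]["content_vector"])  # April [0.4, 0.5, 0.6]
```

In the notebook itself, pandas parses the file and the json.loads calls happen inside dataframe_to_bulk_actions; note that with the csv module every non-vector field stays a string.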

Create index with mapping

Now we need to create an Elasticsearch index with the necessary mappings. This will enable us to index the data into Elasticsearch.

We use the dense_vector field type for the title_vector and content_vector fields. This is a special field type that allows us to store dense vectors in Elasticsearch.

Later, we'll need to target the dense_vector field for kNN search.

index_mapping = {
    "properties": {
        "title_vector": {
            "type": "dense_vector",
            "dims": 1536,
            "index": True,
            "similarity": "cosine"
        },
        "content_vector": {
            "type": "dense_vector",
            "dims": 1536,
            "index": True,
            "similarity": "cosine"
        },
        "text": {"type": "text"},
        "title": {"type": "text"},
        "url": {"type": "keyword"},
        "vector_id": {"type": "long"}
    }
}

client.indices.create(index="wikipedia_vector_index", mappings=index_mapping)
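Elasticsearch rejects a document whose dense_vector value does not have exactly the number of dimensions declared in the mapping, and the query vector used for kNN search must match as well. A pre-flight check along the lines below can catch a mismatch before a bulk request fails partway through. This is a sketch: dense_vector_dims and check_document are hypothetical helpers (not part of the Elasticsearch client), and sample_mapping is a trimmed copy of the mapping above, renamed so it does not overwrite index_mapping.

```python
# Trimmed, hypothetical copy of the mapping above (renamed so the real one is not overwritten)
sample_mapping = {
    "properties": {
        "title_vector": {"type": "dense_vector", "dims": 1536, "index": True, "similarity": "cosine"},
        "content_vector": {"type": "dense_vector", "dims": 1536, "index": True, "similarity": "cosine"},
        "text": {"type": "text"},
    }
}


def dense_vector_dims(mapping):
    """Return {field_name: dims} for every dense_vector field in the mapping."""
    return {
        name: spec["dims"]
        for name, spec in mapping["properties"].items()
        if spec.get("type") == "dense_vector"
    }


def check_document(doc, mapping):
    """Return a list of dimension mismatches; an empty list means the document is consistent."""
    problems = []
    for field, dims in dense_vector_dims(mapping).items():
        vector = doc.get(field)
        if vector is None:
            continue  # field not present in this document, nothing to check
        if len(vector) != dims:
            problems.append(f"{field}: expected {dims} dims, got {len(vector)}")
    return problems


good_doc = {"title_vector": [0.0] * 1536, "content_vector": [0.0] * 1536}
bad_doc = {"title_vector": [0.0] * 1536, "content_vector": [0.0] * 3}
print(check_document(good_doc, sample_mapping))  # []
print(check_document(bad_doc, sample_mapping))   # ['content_vector: expected 1536 dims, got 3']
```

In the notebook, a check like this could run on the decoded _source of each action before it is handed to helpers.bulk.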

Index data into Elasticsearch

The following function generates the required bulk actions that can be passed to Elasticsearch's Bulk API, so we can index multiple documents efficiently in a single request.

For each row in the DataFrame, the function yields a dictionary representing a single document to be indexed.

def dataframe_to_bulk_actions(df):
    # Yield one bulk action per DataFrame row. The vector columns are stored
    # as JSON strings in the CSV, so decode them into lists of floats.
    for _, row in df.iterrows():
        yield {
            "_index": 'wikipedia_vector_index',
            "_id": row['id'],
            "_source": {
                'url': row["url"],
                'title': row["title"],
                'text': row["text"],
                'title_vector': json.loads(row["title_vector"]),
                'content_vector': json.loads(row["content_vector"]),
                'vector_id': row["vector_id"]
            }
        }

Because the DataFrame is large, we index the data in batches of 100 rows, using the Python client's bulk helper (helpers.bulk) to send each batch.

start = 0
end = len(wikipedia_dataframe)
batch_size = 100
for batch_start in range(start, end, batch_size):
    batch_end = min(batch_start + batch_size, end)
    batch_dataframe = wikipedia_dataframe.iloc[batch_start:batch_end]
    actions = dataframe_to_bulk_actions(batch_dataframe)
    helpers.bulk(client, actions)
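To see exactly which rows each request covers, the slice arithmetic from the loop above can be pulled into a small helper. batch_bounds is a hypothetical name, not part of pandas or the Elasticsearch client; the sketch just shows that the final batch is shorter and that min keeps the end bound from overshooting the row count.

```python
def batch_bounds(total_rows, batch_size):
    """Return the (start, end) slice bounds produced by the batching loop above."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [
        (batch_start, min(batch_start + batch_size, total_rows))
        for batch_start in range(0, total_rows, batch_size)
    ]


bounds = batch_bounds(250, 100)
print(bounds)  # [(0, 100), (100, 200), (200, 250)]

# Every row is covered exactly once, in order
covered = [i for start, end in bounds for i in range(start, end)]
assert covered == list(range(250))
```

As an alternative design, helpers.bulk already splits the actions it receives into chunks (its chunk_size parameter defaults to 500), so passing the whole generator with chunk_size=100 should yield requests of the same number of documents without the outer loop.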

Let's test the index with a simple match query.

print(client.search(
    index="wikipedia_vector_index",
    source_excludes=["title_vector", "content_vector"],
    query={
        "match": {
            "text": {
                "query": "Hummingbird"
            }
        }
    }
))

Encode a question with OpenAI embedding model

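To perform kNN search, we need to encode queries with the same embedding model used to encode the documents at index time.
In this example, we use the text-embedding-3-small model. Both text-embedding-3-small and the older text-embedding-ada-002 produce 1536-dimensional vectors, so querying with a different model than the one that produced the stored vectors would not raise an error; it would silently return less relevant results.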

You'll need your OpenAI API key to generate the embeddings.

# Get OpenAI API key
OPENAI_API_KEY = getpass("Enter OpenAI API key: ")

# Set API key (the module-level client in openai>=1.0 reads it from here)
openai.api_key = OPENAI_API_KEY

# Define model
EMBEDDING_MODEL = "text-embedding-3-small"

# Define question
question = 'Is the Atlantic the biggest ocean in the world?'

# Create embedding (openai>=1.0 interface; openai.Embedding.create was removed in 1.0)
question_embedding = openai.embeddings.create(input=question, model=EMBEDDING_MODEL)
question_vector = question_embedding.data[0].embedding

Run semantic search queries

Now we're ready to run queries against our Elasticsearch index using our encoded question. We'll be doing a k-nearest neighbors (kNN) search, using the Elasticsearch kNN query option.

First, we define a small function to pretty print the results.

# Function to pretty print Elasticsearch results

def pretty_response(response):
    for hit in response['hits']['hits']:
        doc_id = hit['_id']
        score = hit['_score']
        title = hit['_source']['title']
        text = hit['_source']['text']
        pretty_output = f"\nID: {doc_id}\nTitle: {title}\nSummary: {text}\nScore: {score}"
        print(pretty_output)

Now let's run our kNN query.

response = client.search(
    index="wikipedia_vector_index",
    knn={
        "field": "content_vector",
        "query_vector": question_vector,
        "k": 10,
        "num_candidates": 100
    }
)
pretty_response(response)

top_hit_summary = response['hits']['hits'][0]['_source']['text'] # Store content of top hit for final step

Success! We've used kNN to perform semantic search over our dataset and found the top results.
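For intuition about what the knn search computed, here is a brute-force version on made-up three-dimensional vectors: score every document by cosine similarity to the query and keep the best k. Elasticsearch's kNN search is approximate (it walks an HNSW graph, gathering num_candidates candidates per shard before returning the top k), so on real data this exact version is what it approximates, not what it runs. For cosine similarity on a dense_vector field, Elasticsearch reports _score as (1 + cosine) / 2, and the sketch applies the same transform. cosine and exact_knn are illustrative helpers, not client APIs, and the document vectors are invented.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def exact_knn(query, docs, k):
    """Score every document and return the k best as (doc_id, score) pairs.

    Scores use the (1 + cosine) / 2 transform, so they fall in [0, 1].
    """
    scored = [(doc_id, (1 + cosine(query, vec)) / 2) for doc_id, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings" (invented values for illustration)
docs = {
    "pacific": [0.9, 0.1, 0.0],
    "atlantic": [0.7, 0.3, 0.1],
    "hummingbird": [0.0, 0.2, 0.9],
}
top = exact_knn([1.0, 0.2, 0.0], docs, k=2)
print([doc_id for doc_id, _ in top])  # ['pacific', 'atlantic']
```

Scanning every vector like this is fine for a handful of documents but grows linearly with the index, which is why the notebook relies on the approximate knn option instead.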

Now we can use the Chat Completions API to work some generative AI magic using the top search result as additional context.

Use Chat Completions API for retrieval augmented generation

Now we can send the question and the text to OpenAI's Chat Completions API.

Using an LLM together with a retrieval model is known as retrieval augmented generation (RAG). We're using Elasticsearch to do what it does best: retrieve relevant documents. Then we use the LLM to do what it does best, tasks like generating summaries and answering questions, using the retrieved documents as context.

The model will generate a response to the question, using the top kNN hit as context. Use the messages list to shape your prompt to the model. In this example, we're using the gpt-3.5-turbo model.

# openai>=1.0 interface (openai.ChatCompletion.create was removed in 1.0)
summary = openai.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Answer the following question: "
         + question
         + "\nby using the following text:\n"
         + top_hit_summary},
    ]
)

choices = summary.choices

for choice in choices:
    print("------------------------------------------------------------")
    print(choice.message.content)
    print("------------------------------------------------------------")

Code explanation

Here's what that code does:

Calls the gpt-3.5-turbo model through the Chat Completions API
Sends a conversation containing a system message and a user message to the model
The system message sets the assistant's role as a "helpful assistant"
The user message contains the original question (the same one used for the kNN query) and the text of the top search hit
The response is stored in summary; its candidate answers are in summary.choices, which the loop prints

Next steps

That was just one example of how to combine Elasticsearch with OpenAI's models to enable retrieval augmented generation. RAG lets you avoid the costly and complex process of training or fine-tuning models by using out-of-the-box models enhanced with additional context.

Use this as a blueprint for your own experiments.

To adapt the conversation for different use cases, customize the system message to define the assistant's behavior or persona. Adjust the user message to specify the task, such as summarization or question answering, along with the desired format of the response.
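One way to apply this is to assemble the messages list in a small function, so the system prompt, task instructions, and retrieved context can be varied independently. The sketch below is hypothetical (build_messages and its parameters are illustrative names, and the passages are placeholder text); unlike the example above, it can pass several hits as numbered passages and puts explicit separators between the parts of the prompt. The result could be passed as messages= to the Chat Completions call used earlier.

```python
def build_messages(question, passages,
                   system_prompt="You are a helpful assistant.",
                   instructions="Answer the question using the context below."):
    """Build a Chat Completions `messages` list from a question and retrieved passages.

    `passages` is a list of strings, for example the `text` field of the top kNN hits:
    [hit["_source"]["text"] for hit in response["hits"]["hits"][:3]]
    """
    context = "\n\n".join(f"[{i}] {passage}" for i, passage in enumerate(passages, start=1))
    user_content = f"{instructions}\n\nQuestion: {question}\n\nContext:\n{context}"
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_content},
    ]


messages = build_messages(
    "Is the Atlantic the biggest ocean in the world?",
    ["First retrieved passage.", "Second retrieved passage."],  # placeholder text
    system_prompt="You are a concise geography tutor.",
    instructions="Answer in one sentence and cite the passage numbers you used.",
)
print(messages[1]["content"])
```

Numbering the passages gives the model a simple way to cite its sources; changing only system_prompt or instructions switches the persona or the task (for example, summarization) without touching the retrieval code.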
