Prompt Chain

Index and Search Data with Qdrant Embeddings

Name: Index and Search Data with Qdrant Embeddings
Availability: OnlineOnly
Author: OpenAI Cookbook

Index and search embeddings using Qdrant. This guide covers setting up Qdrant, indexing data with titles and content, and performing searches.

Works with qdrant, docker

Updated 3 months ago

Version 1.0.0

Models

gpt 4o

Why it matters

Securely store and search your own data using embeddings with Qdrant, enabling production use cases like chatbots and topic modeling.

Outcomes

What it gets done

Load and embed data using OpenAI embeddings.

Set up and index data into a Qdrant vector database.

Perform semantic searches on indexed titles and content.

Understand the basics of vector database integration for AI applications.

Install

Add it to your toolbox

Run in your project directory:

curl -fsSL https://spark.entire.vc/get/oai-usingqdrantforembeddingssearch | bash

Steps

Steps in the chain

Setup

Import the required libraries and set the embedding model that we'd like to use.

Load data

In this section we'll load embedded data that we prepared before this session.

Qdrant Setup

For the local deployment, we'll use Docker, following the Qdrant documentation: https://qdrant.tech/documentation/quick_start/. Qdrant requires just a single container, and an example docker-compose.yaml file is available at `./qdrant/docker-compose.yaml` in this repo. You can start a Qdrant instance locally by navigating to that directory and running `docker-compose up -d`. You might need to increase the memory limit for Docker to 8GB or more.

Index data

Qdrant stores data in collections, where each object is described by at least one vector and may carry additional metadata called a payload. Create a collection called Articles in which each object is described by both title and content vectors. Use the official qdrant-client package. Define a payload configuration to store the id, title, and url of each article alongside its vectors.

Search Data

Once the data is in Qdrant, start querying the collection for the closest vectors. Provide the additional parameter `vector_name` to switch from title-based to content-based search. Be sure to use the text-embedding-ada-002 model, because the embeddings in the file were created with it.

Overview

Using Qdrant for Embeddings Search

What it does

This prompt chain demonstrates how to index and search embedding vectors using Qdrant. It covers the process from loading and embedding data to setting up Qdrant, indexing content and titles, and performing searches.

How it connects

This is useful for understanding the basics of embedding text data, storing it in a vector database, and using it for semantic search. It addresses a common requirement for customers who want to store and search embeddings with their own data for use cases like chatbots and topic modeling.

Source README

Using Qdrant for Embeddings Search

This notebook takes you through a simple flow to download some data, embed it, and then index and search it using a selection of vector databases. This is a common requirement for customers who want to store and search our embeddings with their own data in a secure environment to support production use cases such as chatbots, topic modelling and more.

What is a Vector Database

A vector database is a database made to store, manage and search embedding vectors. The use of embeddings to encode unstructured data (text, audio, video and more) as vectors for consumption by machine-learning models has exploded in recent years, due to the increasing effectiveness of AI in solving use cases involving natural language, image recognition and other unstructured forms of data. Vector databases have emerged as an effective solution for enterprises to deliver and scale these use cases.
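
To make "searching embedding vectors" concrete, here is a minimal sketch of what a vector database does conceptually: rank stored vectors by their similarity to a query vector. The toy three-dimensional vectors and the function names are invented for illustration, and the exhaustive scan shown here is only a stand-in for the approximate indexes that a real vector database uses to stay fast at scale.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, vectors, top_k=2):
    """Rank stored vectors by similarity to the query (exhaustive scan)."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional "embeddings"; real ones have far more dimensions.
toy_vectors = {
    "cats": [0.9, 0.1, 0.0],
    "dogs": [0.8, 0.2, 0.1],
    "tax law": [0.0, 0.1, 0.9],
}

# The query points along the first axis, so "cats" and "dogs" rank above "tax law".
results = nearest([1.0, 0.0, 0.0], toy_vectors)
```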

Why use a Vector Database

Vector databases enable enterprises to take many of the embeddings use cases we've shared in this repo (question answering, chatbots and recommendation services, for example) and run them in a secure, scalable environment. Many of our customers use embeddings to solve their problems at small scale, but performance and security hold them back from going into production. We see vector databases as a key component in solving that, and in this guide we'll walk through the basics of embedding text data, storing it in a vector database and using it for semantic search.

Demo Flow

The demo flow is:

Setup: Import packages and set any required variables
Load data: Load a dataset and embed it using OpenAI embeddings
Qdrant
- Setup: Here we'll set up the Python client for Qdrant.
- Index Data: We'll create a collection with vectors for titles and content
- Search Data: We'll run a few searches to confirm it works

Once you've run through this notebook you should have a basic understanding of how to set up and use vector databases, and can move on to more complex use cases that make use of our embeddings.

Setup

Import the required libraries and set the embedding model that we'd like to use.

Load data

In this section we'll load embedded data that we prepared before this session.
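
As a rough sketch of this step, the snippet below assumes the prepared data is a CSV file in which each vector is stored as a stringified list. The column names (`title_vector`, `content_vector`) and the two inline rows are assumptions made for illustration, not the actual dataset.

```python
import csv
import io
from ast import literal_eval

# Two-row stand-in for the prepared file; the column names are assumed.
SAMPLE_CSV = '''id,url,title,title_vector,content_vector
1,https://example.com/a,First article,"[0.1, 0.2, 0.3]","[0.3, 0.2, 0.1]"
2,https://example.com/b,Second article,"[0.0, 0.5, 0.5]","[0.5, 0.5, 0.0]"
'''


def load_articles(handle):
    """Read rows and turn the stringified vector columns back into float lists."""
    rows = []
    for row in csv.DictReader(handle):
        row["id"] = int(row["id"])
        row["title_vector"] = literal_eval(row["title_vector"])
        row["content_vector"] = literal_eval(row["content_vector"])
        rows.append(row)
    return rows


articles = load_articles(io.StringIO(SAMPLE_CSV))
```

With a real file, the same function could be given an open file handle in place of the `io.StringIO` wrapper.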

Qdrant

Qdrant is a high-performance vector search database written in Rust. It offers both on-premise and cloud versions, but for the purposes of this example we're going to use the local deployment mode.

Setting everything up will require:

Spinning up a local instance of Qdrant
Configuring the collection and storing the data in it
Trying it out with some queries

Setup

You can start a Qdrant instance locally by navigating to this directory and running `docker-compose up -d`.

You might need to increase the memory limit for Docker to 8GB or more; otherwise Qdrant might fail with an error message like `7 Killed`.
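
Before indexing, it can help to confirm that the container is actually answering. The helper below is a sketch that assumes the compose file exposes Qdrant's REST API on its default port, 6333, on localhost. The function name is hypothetical, and it uses only the standard library.

```python
import json
import urllib.error
import urllib.request


def qdrant_is_up(base_url="http://localhost:6333", timeout=2.0):
    """Return True if a Qdrant REST API answers at base_url (assumed default port)."""
    try:
        # Listing collections is a cheap request that works on an empty instance.
        with urllib.request.urlopen(f"{base_url}/collections", timeout=timeout) as resp:
            body = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON response.
        return False
    return isinstance(body, dict) and body.get("status") == "ok"
```

If this returns False right after `docker-compose up -d`, the container may still be starting, or it may have been killed by the memory limit mentioned above.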

Index data

Qdrant stores data in collections, where each object is described by at least one vector and may carry additional metadata called a payload. Our collection will be called Articles, and each object will be described by both title and content vectors.

We'll be using the official qdrant-client package, which has the utility methods we need built in.

In addition to the vector configuration defined under vector, we can also define the payload configuration. The payload is an optional field that lets you store additional metadata alongside the vectors. In our case, we'll store the id, title, and url of the articles. Because the search results return the title of the nearest articles from the payload, we can also give the user the URL of each article, which is part of the same metadata.
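
The following sketch shows one way the collection and its points could be put together with qdrant-client. The row field names (`title_vector`, `content_vector`), the helper names, and the choice of cosine distance are assumptions made for illustration. The vector size of 1536 matches what text-embedding-ada-002 produces.

```python
def build_points(articles):
    """Shape article rows into points with two named vectors and a payload."""
    return [
        {
            "id": article["id"],
            "vector": {
                "title": article["title_vector"],
                "content": article["content_vector"],
            },
            "payload": {
                "id": article["id"],
                "title": article["title"],
                "url": article["url"],
            },
        }
        for article in articles
    ]


def index_articles(client, articles, collection_name="Articles", vector_size=1536):
    """Create the collection and upload every article (needs qdrant-client)."""
    # Imported here so build_points stays usable without qdrant-client installed.
    from qdrant_client import models

    # Cosine distance is an assumed choice for this sketch.
    vector_params = models.VectorParams(size=vector_size, distance=models.Distance.COSINE)
    client.create_collection(
        collection_name=collection_name,
        vectors_config={"title": vector_params, "content": vector_params},
    )
    client.upsert(
        collection_name=collection_name,
        points=[models.PointStruct(**point) for point in build_points(articles)],
    )


# Tiny hypothetical row, just to show the resulting point structure.
sample_articles = [
    {
        "id": 1,
        "title": "First article",
        "url": "https://example.com/a",
        "title_vector": [0.1, 0.2, 0.3],
        "content_vector": [0.3, 0.2, 0.1],
    }
]
points = build_points(sample_articles)
```

To run it against the local instance, `index_articles` would be called with a `QdrantClient` connected to that instance and with rows whose vectors have the configured size.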

Search Data

Once the data is in Qdrant, we will start querying the collection for the closest vectors. We may provide the additional parameter vector_name to switch from title-based to content-based search. Be sure to use the text-embedding-ada-002 model, because the embeddings in the file were created with it.
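
A query helper along these lines might look as follows. This is a sketch under some assumptions: it uses `query_points` with its `using` argument, which is available in recent qdrant-client releases, and it embeds the query through the OpenAI Python client (version 1 or later), which reads the API key from the `OPENAI_API_KEY` environment variable. The helper names and the injectable `embed` argument are illustrative choices.

```python
EMBEDDING_MODEL = "text-embedding-ada-002"  # must match the model behind the stored vectors


def embed_with_openai(text):
    """Embed text with the OpenAI API (needs the openai package and an API key)."""
    # Imported lazily so query_qdrant has no hard dependency on openai.
    from openai import OpenAI

    response = OpenAI().embeddings.create(input=text, model=EMBEDDING_MODEL)
    return response.data[0].embedding


def query_qdrant(client, query, collection_name="Articles", vector_name="title",
                 top_k=5, embed=embed_with_openai):
    """Return (title, score) pairs for the points closest to the query text."""
    response = client.query_points(
        collection_name=collection_name,
        query=embed(query),
        using=vector_name,  # "title" or "content" selects which named vector to search
        limit=top_k,
    )
    return [(point.payload["title"], point.score) for point in response.points]
```

Calling `query_qdrant(client, "modern art in Europe")` would search the title vectors, and passing `vector_name="content"` would switch the same query to the content vectors.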
