Using PolarDB-PG as a vector database for OpenAI embeddings

This notebook guides you step by step on using PolarDB-PG as a vector database for OpenAI embeddings.

This notebook presents an end-to-end process of:

  1. Using precomputed embeddings created with the OpenAI API.
  2. Storing the embeddings in a cloud instance of PolarDB-PG.
  3. Converting a raw text query to an embedding with the OpenAI API.
  4. Using PolarDB-PG to perform a nearest neighbour search in the created relation.

What is PolarDB-PG

PolarDB-PG is a high-performance vector database that adopts a read-write separation architecture. It is a cloud-native database managed by Alibaba Cloud, 100% compatible with PostgreSQL and highly compatible with Oracle syntax. It supports storing and querying massive volumes of vector data, and it improves the efficiency of vector calculations by optimizing the underlying execution algorithms, giving users a fast, elastic, secure, and reliable vector database service with large storage capacity. PolarDB-PG also supports multi-dimensional, multi-modal spatiotemporal and geographic information engines. In addition, it offers complete OLAP functionality and service level agreements, and it is used by many customers.

Deployment options

Prerequisites

For the purposes of this exercise we need to prepare a couple of things:

  1. PolarDB-PG cloud server instance.
  2. The 'psycopg2' library to interact with the vector database. Any other PostgreSQL client library also works.
  3. An OpenAI API key.

We might validate if the server was launched successfully by running a simple curl command:

Install requirements

This notebook requires the openai and psycopg2 packages, plus a few additional libraries. The following command installs them all:
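After installing (for instance with something like `pip install openai psycopg2`, which is an assumption about your setup), a quick check can confirm that the packages named above are importable. The sketch below is a minimal illustration; the helper name `missing_packages` is hypothetical, and only the two packages named in the text are checked.

```python
import importlib.util

# Modules named in the text above; extend the list with any extras you install.
REQUIRED_MODULES = ["openai", "psycopg2"]


def missing_packages(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```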

Prepare your OpenAI API key

The OpenAI API key is used for vectorization of the documents and queries.

If you don't have an OpenAI API key, you can get one from https://beta.openai.com/account/api-keys.

Once you get your key, please add it to your environment variables as OPENAI_API_KEY.

If you have any doubts about setting the API key through environment variables, please refer to Best Practices for API Key Safety.
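To fail early when the key is missing, a small check along the following lines may help. This is a sketch, not part of the original notebook: the helper name `get_openai_api_key` is hypothetical, and the only name taken from the text is the `OPENAI_API_KEY` environment variable.

```python
import os


def get_openai_api_key(env=None):
    """Return the API key from the environment, or raise if it is not set."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before running the notebook."
        )
    return key
```

Passing a plain dictionary as `env` makes the check easy to exercise without touching the real environment.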

Connect to PolarDB

First add your PolarDB connection parameters to your environment variables, or simply change the "psycopg2.connect" parameters below.

Connecting to a running PolarDB instance is straightforward with the psycopg2 library:
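One way to wire this up is sketched below. It assumes the connection details live in the standard libpq-style variables `PGHOST`, `PGPORT`, `PGDATABASE`, `PGUSER`, and `PGPASSWORD`; those names, the default values, and the helper names are assumptions for illustration, so adjust them to your instance.

```python
import os


def polardb_connection_params(env=None):
    """Collect connection parameters from environment variables (assumed names)."""
    env = os.environ if env is None else env
    return {
        "host": env.get("PGHOST", "localhost"),
        "port": int(env.get("PGPORT", "5432")),
        "dbname": env.get("PGDATABASE", "postgres"),
        "user": env.get("PGUSER", "postgres"),
        "password": env.get("PGPASSWORD", ""),
    }


def connect_to_polardb(env=None):
    """Open a psycopg2 connection using the parameters gathered above."""
    # Imported lazily so the parameter helper works even without the driver.
    import psycopg2

    return psycopg2.connect(**polardb_connection_params(env))
```

Calling `connect_to_polardb()` returns a regular psycopg2 connection, from which a cursor can be obtained with `connection.cursor()`.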

We can test the connection by running a simple query:
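A minimal sketch of such a check follows. It runs `SELECT 1` through a cursor and works with any DB-API connection, including one from psycopg2; the function name `connection_is_alive` is hypothetical.

```python
def connection_is_alive(connection):
    """Run a trivial query and report whether the server answered as expected."""
    cursor = connection.cursor()
    try:
        cursor.execute("SELECT 1")
        row = cursor.fetchone()
    finally:
        # Always release the cursor, even if the query fails.
        cursor.close()
    return row is not None and row[0] == 1
```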

The downloaded file then has to be extracted:
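Assuming the download is a zip archive, the extraction step could look like the sketch below. The archive path and output directory are placeholders you supply, and the helper name `extract_archive` is hypothetical.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, output_dir):
    """Extract every member of the archive and return the member names, sorted."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path, "r") as archive:
        archive.extractall(output_dir)
        return sorted(archive.namelist())
```

The returned list makes it easy to confirm that the expected data file is present before loading it.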

Index data

PolarDB stores data in relations (tables), where each object is described by at least one vector. Our relation will be called articles, and each object will be described by both a title vector and a content vector.

We will start by creating the relation and a vector index on both the title and content vectors, and then we will fill it with our precomputed embeddings.
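The sketch below shows one possible shape for these statements. It assumes a pgvector-style `vector` column type and `ivfflat` index method are available on the instance, that embeddings have 1536 dimensions (the default output size of text-embedding-3-small), and a minimal column set. The column names other than the two vector columns, the `lists` value, and the helper names are illustrative assumptions.

```python
EMBEDDING_DIM = 1536  # default output size of text-embedding-3-small


def build_schema_sql(table="articles", dim=EMBEDDING_DIM, lists=1000):
    """Return the CREATE TABLE statement followed by one index statement per vector."""
    create_table = (
        f"CREATE TABLE IF NOT EXISTS {table} ("
        "id INTEGER PRIMARY KEY, "
        "title TEXT, "
        "content TEXT, "
        f"title_vector vector({dim}), "
        f"content_vector vector({dim})"
        ");"
    )
    create_indexes = [
        # Assumes the pgvector-style ivfflat access method; `lists` is a tuning knob.
        f"CREATE INDEX IF NOT EXISTS {table}_{column}_idx "
        f"ON {table} USING ivfflat ({column}) WITH (lists = {lists});"
        for column in ("title_vector", "content_vector")
    ]
    return [create_table] + create_indexes


def create_schema(connection, **kwargs):
    """Execute the schema statements on an open connection and commit."""
    cursor = connection.cursor()
    try:
        for statement in build_schema_sql(**kwargs):
            cursor.execute(statement)
        connection.commit()
    finally:
        cursor.close()
```

Keeping the SQL generation separate from execution lets you print and review the statements before running them against the instance.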

Load data

In this section we are going to load the data prepared ahead of this session, so you don't have to recompute the embeddings of the Wikipedia articles with your own credits.
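A bulk load can be sketched with PostgreSQL's `COPY ... FROM STDIN` through psycopg2's `cursor.copy_expert`. The column list below is an assumption and must match the column order of the CSV file you extracted; the helper names and the CSV path are hypothetical.

```python
DEFAULT_COLUMNS = ("id", "title", "content", "title_vector", "content_vector")


def build_copy_sql(table="articles", columns=DEFAULT_COLUMNS):
    """Build a COPY statement that reads CSV rows (with a header line) from STDIN."""
    column_list = ", ".join(columns)
    return f"COPY {table} ({column_list}) FROM STDIN WITH (FORMAT csv, HEADER true)"


def load_csv(connection, csv_path, **kwargs):
    """Stream a CSV file into the table using psycopg2's copy_expert and commit."""
    cursor = connection.cursor()
    try:
        with open(csv_path, "r", encoding="utf-8") as csv_file:
            cursor.copy_expert(build_copy_sql(**kwargs), csv_file)
        connection.commit()
    finally:
        cursor.close()
```

Streaming through `COPY` avoids issuing one `INSERT` per row, which tends to matter when each row carries two large vectors.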

Search data

Once the data is loaded into PolarDB, we can start querying the relation for the closest vectors. We may provide an additional parameter, vector_name, to switch from title-based to content-based search. Since the precomputed embeddings were created with the text-embedding-3-small OpenAI model, we also have to use it during search.
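A search helper might look like the sketch below. It assumes the pgvector-style `<->` (Euclidean distance) operator, the `articles` relation with `title_vector` and `content_vector` columns, and version 1.x of the openai Python package. All function names are hypothetical.

```python
ALLOWED_VECTOR_COLUMNS = {"title_vector", "content_vector"}


def to_vector_literal(embedding):
    """Format a sequence of numbers as a vector text literal such as '[0.1,0.2]'."""
    return "[" + ",".join(str(float(x)) for x in embedding) + "]"


def build_search_sql(vector_name="title_vector", table="articles"):
    """Build a parameterized nearest-neighbour query for the chosen vector column."""
    if vector_name not in ALLOWED_VECTOR_COLUMNS:
        # Column names cannot be passed as query parameters, so whitelist them.
        raise ValueError(f"Unknown vector column: {vector_name!r}")
    return (
        f"SELECT id, title, {vector_name} <-> %s::vector AS distance "
        f"FROM {table} "
        f"ORDER BY {vector_name} <-> %s::vector "
        "LIMIT %s"
    )


def embed_query(query, model="text-embedding-3-small"):
    """Embed the query text with the same model used for the stored vectors."""
    from openai import OpenAI  # lazy import; the client reads OPENAI_API_KEY

    client = OpenAI()
    response = client.embeddings.create(input=query, model=model)
    return response.data[0].embedding


def search_articles(connection, query, vector_name="title_vector", top_k=5):
    """Return the top_k rows closest to the query in the chosen vector column."""
    vector = to_vector_literal(embed_query(query))
    cursor = connection.cursor()
    try:
        cursor.execute(build_search_sql(vector_name), (vector, vector, top_k))
        return cursor.fetchall()
    finally:
        cursor.close()
```

For example, `search_articles(connection, "modern art in Europe", vector_name="content_vector")` would rank articles by the distance between their content embeddings and the query embedding, with smaller distances meaning closer matches.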