Prompt Chain

Vector Search with OpenAI Embeddings in AnalyticDB

Name: Vector Search with OpenAI Embeddings in AnalyticDB
Availability: OnlineOnly
Author: OpenAI Cookbook

Step-by-step workflow for storing OpenAI embeddings in AnalyticDB vector database and performing nearest neighbor search.

Works with openai, analyticdb, postgres

Updated 3 months ago

Version 1.0.0

Why it matters

Leverage AnalyticDB as a high-performance vector database for your OpenAI embeddings. This asset enables efficient nearest neighbor searches on your data, powered by cloud-native vector compute.

Outcomes

What it gets done

Store OpenAI embeddings in AnalyticDB.

Convert text queries to embeddings using OpenAI API.

Perform nearest neighbor searches within AnalyticDB.

Utilize AnalyticDB's PostgreSQL compatibility for data management.

Install

Add it to your toolbox

Run in your project directory:

curl -fsSL https://spark.entire.vc/get/oai-gettingstartedwithanalyticdbandopenai | bash

Steps

Steps in the chain

Using precomputed embeddings created by OpenAI API

Using precomputed embeddings created by OpenAI API.

Storing embeddings in AnalyticDB cloud instance

Storing the embeddings in a cloud instance of AnalyticDB.

Converting raw text query to embedding with OpenAI API

Converting raw text query to an embedding with OpenAI API.

Performing nearest neighbour search in AnalyticDB

Using AnalyticDB to perform the nearest neighbour search in the created collection.

Install requirements

Install the openai and psycopg2 packages, along with other additional libraries required for the notebook.

Prepare your OpenAI API key

Get an OpenAI API key from https://beta.openai.com/account/api-keys and add it to your environment variables as OPENAI_API_KEY.

Connect to AnalyticDB

Connect to a running instance of AnalyticDB server using the official Python library psycopg2. Add connection parameters to environment variables or modify the psycopg2.connect parameters directly.

Index data

Create a relation called articles with vector indexes on both title and content fields. Fill it with precomputed embeddings.

Load data

Load the data prepared previously so you don't have to recompute the embeddings of Wikipedia articles with your own credits.

Search data

Query the collection for the closest vectors. Provide the vector_name parameter to switch between title-based and content-based search. Use the text-embedding-3-small OpenAI model for search, since the precomputed embeddings were created with it.

Overview

Using AnalyticDB as a vector database for OpenAI embeddings

What it does

An end-to-end tutorial notebook that demonstrates using AnalyticDB, a PostgreSQL-compatible cloud vector database from Alibaba Cloud, to store and search OpenAI embeddings. Covers the complete workflow from storing precomputed embeddings in an articles relation with vector indexes to performing nearest neighbor searches using the text-embedding-3-small model.

How it connects

Use this when you need to implement semantic search functionality using OpenAI embeddings with AnalyticDB as your vector storage backend, particularly if you're working within the Alibaba Cloud ecosystem or need PostgreSQL compatibility for your vector database operations.

Source README

Using AnalyticDB as a vector database for OpenAI embeddings

This notebook guides you step by step on using AnalyticDB as a vector database for OpenAI embeddings.

This notebook presents an end-to-end process of:

Using precomputed embeddings created by OpenAI API.
Storing the embeddings in a cloud instance of AnalyticDB.
Converting raw text query to an embedding with OpenAI API.
Using AnalyticDB to perform the nearest neighbour search in the created collection.

What is AnalyticDB

AnalyticDB is a high-performance distributed vector database. It is fully compatible with PostgreSQL syntax, so existing PostgreSQL tools and clients work with it. AnalyticDB is a cloud-native database managed by Alibaba Cloud with a high-performance vector compute engine. It works out of the box and scales to billions of vectors, with features including indexing algorithms, structured and unstructured data support, real-time updates, distance metrics, scalar filtering, and time travel searches. It also provides full OLAP database functionality and an SLA commitment for production use.

Deployment options

Use AnalyticDB Cloud Vector Database. Click here to deploy it quickly.

Prerequisites

For the purposes of this exercise we need to prepare a couple of things:

AnalyticDB cloud server instance.
The psycopg2 library to interact with the vector database. Any other PostgreSQL client library also works.
An OpenAI API key.

We might validate if the server was launched successfully by running a simple curl command:

Install requirements

This notebook requires the openai and psycopg2 packages, plus a few additional libraries. The following command installs them all:

Prepare your OpenAI API key

The OpenAI API key is used for vectorization of the documents and queries.

If you don't have an OpenAI API key, you can get one from https://beta.openai.com/account/api-keys.

Once you get your key, please add it to your environment variables as OPENAI_API_KEY.
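Before calling the API, it can help to confirm that the key is actually visible to the Python process. The following is a minimal sketch, not part of the original notebook; the helper name get_openai_api_key is hypothetical, and only the OPENAI_API_KEY variable name comes from the text above.

```python
import os


def get_openai_api_key(env=os.environ):
    """Return the key stored in OPENAI_API_KEY, or raise if it is missing.

    `env` defaults to the process environment; passing a plain dict makes
    the check easy to exercise without touching real variables.
    """
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return key


if __name__ == "__main__":
    try:
        get_openai_api_key()
        print("OPENAI_API_KEY is ready")
    except RuntimeError as err:
        print(err)
```

Failing early here gives a clearer message than an authentication error raised later by the first embedding request.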

Connect to AnalyticDB

First add the connection parameters to your environment variables, or change the psycopg2.connect parameters directly.

Connecting to a running instance of AnalyticDB server is easy with the official Python library:

We can test the connection by running any available method:
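The connection and a simple smoke test might look like the sketch below. The environment variable names (PGHOST, PGPORT, PGDATABASE, PGUSER, PGPASSWORD), their defaults, and both helper names are assumptions for illustration; psycopg2.connect and the cursor calls are standard psycopg2.

```python
import os


def connection_params(env=os.environ):
    """Collect psycopg2.connect keyword arguments from environment variables.

    The variable names and fallback values are assumptions; adjust them to
    match the AnalyticDB instance being used.
    """
    return {
        "host": env.get("PGHOST", "localhost"),
        "port": int(env.get("PGPORT", "5432")),
        "dbname": env.get("PGDATABASE", "postgres"),
        "user": env.get("PGUSER", "user"),
        "password": env.get("PGPASSWORD", "password"),
    }


def connect_and_ping(env=os.environ):
    """Open a connection and run SELECT 1 to confirm the server responds."""
    import psycopg2  # imported lazily so connection_params works without it

    connection = psycopg2.connect(**connection_params(env))
    with connection.cursor() as cursor:
        cursor.execute("SELECT 1;")
        if cursor.fetchone() != (1,):
            connection.close()
            raise RuntimeError("Unexpected response from the server")
    return connection
```

Keeping the parameter lookup separate from the connect call makes it possible to inspect or log the settings (minus the password) before any network traffic happens.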

The downloaded file then has to be extracted:

Index data

AnalyticDB stores data in relations, where each object is described by at least one vector. Our relation will be called articles, and each object will be described by both a title vector and a content vector.

We will start by creating the relation and a vector index on both the title and content vectors, and then fill it with our precomputed embeddings.
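The relation and its two vector indexes could be set up roughly as follows. This is a sketch under stated assumptions: the column names (title_vector, content_vector and the rest), the ann index type, and its options (distancemeasure, dim, hnsw_m) are not given in the text above and should be checked against the AnalyticDB documentation. The dimension 1536 is the default output size of text-embedding-3-small.

```python
# Column layout is an assumption; the text only states that each article
# carries a title vector and a content vector.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS public.articles (
    id INTEGER PRIMARY KEY,
    url TEXT,
    title TEXT,
    content TEXT,
    title_vector REAL[],
    content_vector REAL[],
    vector_id INTEGER
);
"""

VECTOR_COLUMNS = ("title_vector", "content_vector")


def create_index_sql(column, dim=1536):
    """Build a CREATE INDEX statement for one vector column.

    The `ann` access method and its options are assumptions about
    AnalyticDB's vector index syntax; verify them before use.
    """
    if column not in VECTOR_COLUMNS:
        raise ValueError(f"Unknown vector column: {column}")
    return (
        f"CREATE INDEX ON public.articles USING ann ({column}) "
        f"WITH (distancemeasure = l2, dim = '{int(dim)}', hnsw_m = '100');"
    )


def create_schema(connection):
    """Create the articles relation and one vector index per vector column."""
    with connection.cursor() as cursor:
        cursor.execute(CREATE_TABLE_SQL)
        for column in VECTOR_COLUMNS:
            cursor.execute(create_index_sql(column))
    connection.commit()
```

create_schema expects an open psycopg2 connection. Note that rerunning it would create additional unnamed indexes, so in practice it is meant to be run once per fresh database.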

Load data

In this section we load the data prepared before this session, so you don't have to recompute the embeddings of Wikipedia articles with your own credits.
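One way to bulk-load the precomputed embeddings is PostgreSQL's COPY, exposed in psycopg2 as cursor.copy_expert. The sketch below assumes a CSV file with a header row whose vector columns are stored as JSON-style lists such as [0.1, 0.2]; the file layout, column order, and helper names are assumptions, not details from the text above.

```python
import io

# Column order is an assumption and must match the CSV file being loaded.
COPY_SQL = (
    "COPY public.articles "
    "(id, url, title, content, title_vector, content_vector, vector_id) "
    "FROM STDIN WITH (FORMAT CSV, HEADER true);"
)


def to_pg_arrays(line):
    """Rewrite JSON-style lists ([...]) as PostgreSQL array literals ({...}).

    Caveat: this replaces every square bracket on the line, including any
    that appear in article text, which is acceptable only for a quick demo.
    """
    return line.replace("[", "{").replace("]", "}")


def load_csv(connection, csv_path):
    """Stream a converted copy of the CSV into the articles relation."""
    with open(csv_path, encoding="utf-8") as source:
        # The whole file is buffered in memory; for very large files a
        # chunked or file-backed approach would be more appropriate.
        buffer = io.StringIO("".join(to_pg_arrays(line) for line in source))
    with connection.cursor() as cursor:
        cursor.copy_expert(COPY_SQL, buffer)
    connection.commit()
```

COPY is generally much faster than row-by-row INSERT statements for a dataset of this size, which is why it is used here.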

Search data

Once the data is loaded into AnalyticDB, we can start querying the collection for the closest vectors. An additional parameter, vector_name, switches between title-based and content-based search. Since the precomputed embeddings were created with the text-embedding-3-small OpenAI model, we also have to use it during search.
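A search helper along these lines embeds the query text and orders rows by distance to it. It is a sketch, not the notebook's own code: the l2_distance function, the <-> operator, the table and column names, and all helper names are assumptions to verify against AnalyticDB. The embedding call uses the OpenAI Python client (version 1 or later), which reads OPENAI_API_KEY from the environment.

```python
VECTOR_COLUMNS = {"title_vector", "content_vector"}


def embed_query(text, model="text-embedding-3-small"):
    """Convert raw text to an embedding with the OpenAI API."""
    from openai import OpenAI  # imported lazily; requires openai>=1.0

    client = OpenAI()  # picks up OPENAI_API_KEY from the environment
    response = client.embeddings.create(model=model, input=text)
    return response.data[0].embedding


def build_search_sql(embedding, vector_name="title_vector", top_k=20):
    """Return (sql, params) for a nearest neighbor query.

    vector_name is checked against a whitelist because column names cannot
    be passed as query parameters.
    """
    if vector_name not in VECTOR_COLUMNS:
        raise ValueError(f"Unknown vector column: {vector_name}")
    vector_literal = "{" + ",".join(str(float(x)) for x in embedding) + "}"
    sql = (
        "SELECT id, url, title, "
        f"l2_distance({vector_name}, %s::real[]) AS distance "
        "FROM public.articles "
        f"ORDER BY {vector_name} <-> %s::real[] "
        "LIMIT %s;"
    )
    return sql, (vector_literal, vector_literal, int(top_k))


def search(connection, text, vector_name="title_vector", top_k=20):
    """Embed the text and fetch the closest rows from AnalyticDB."""
    sql, params = build_search_sql(embed_query(text), vector_name, top_k)
    with connection.cursor() as cursor:
        cursor.execute(sql, params)
        return cursor.fetchall()
```

For example, search(connection, "modern art in Europe") would search by title, while passing vector_name="content_vector" would search by article content instead.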
