Vector Search with OpenAI Embeddings in AnalyticDB
Step-by-step workflow for storing OpenAI embeddings in the AnalyticDB vector database and performing nearest neighbor search.
Why it matters
Leverage AnalyticDB as a high-performance vector database for your OpenAI embeddings. This asset enables efficient nearest neighbor searches on your data, powered by cloud-native vector compute.
Outcomes
What it gets done
Store OpenAI embeddings in AnalyticDB.
Convert text queries to embeddings using OpenAI API.
Perform nearest neighbor searches within AnalyticDB.
Utilize AnalyticDB's PostgreSQL compatibility for data management.
Install
Add it to your toolbox
Run in your project directory:
curl -fsSL https://spark.entire.vc/get/oai-gettingstartedwithanalyticdbandopenai | bash
Steps
Steps in the chain
Using precomputed embeddings created by OpenAI API.
Storing the embeddings in a cloud instance of AnalyticDB.
Converting raw text query to an embedding with OpenAI API.
Using AnalyticDB to perform the nearest neighbour search in the created collection.
Install the openai and psycopg2 packages, along with other additional libraries required for the notebook.
Get an OpenAI API key from https://beta.openai.com/account/api-keys and add it to your environment variables as OPENAI_API_KEY.
Connect to a running instance of AnalyticDB server using the official Python library psycopg2. Add connection parameters to environment variables or modify the psycopg2.connect parameters directly.
Create a relation called articles with vector indexes on both title and content fields. Fill it with precomputed embeddings.
Load the data prepared previously so you don't have to recompute the embeddings of Wikipedia articles with your own credits.
Query the collection for the closest vectors. Provide vector_name parameter to switch between title and content based search. Use text-embedding-3-small OpenAI model for search.
Overview
Using AnalyticDB as a vector database for OpenAI embeddings
What it does
An end-to-end tutorial notebook that demonstrates using AnalyticDB, a PostgreSQL-compatible cloud vector database from Alibaba Cloud, to store and search OpenAI embeddings. Covers the complete workflow from storing precomputed embeddings in an articles relation with vector indexes to performing nearest neighbor searches using the text-embedding-3-small model.
How it connects
Use this when you need to implement semantic search functionality using OpenAI embeddings with AnalyticDB as your vector storage backend, particularly if you're working within the Alibaba Cloud ecosystem or need PostgreSQL compatibility for your vector database operations.
Source README
Using AnalyticDB as a vector database for OpenAI embeddings
This notebook guides you step by step on using AnalyticDB as a vector database for OpenAI embeddings.
This notebook presents an end-to-end process of:
- Using precomputed embeddings created by OpenAI API.
- Storing the embeddings in a cloud instance of AnalyticDB.
- Converting raw text query to an embedding with OpenAI API.
- Using AnalyticDB to perform the nearest neighbour search in the created collection.
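To make the last step concrete, here is a minimal brute-force sketch of what a nearest neighbour search computes. It assumes Euclidean (L2) distance and uses made-up two-dimensional vectors; the function names are illustrative, and AnalyticDB itself would answer such queries through a vector index rather than a full scan.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbours(query, vectors, top_k=3):
    """Return the top_k (key, distance) pairs closest to query, nearest first."""
    scored = [(key, l2_distance(query, vec)) for key, vec in vectors.items()]
    return sorted(scored, key=lambda pair: pair[1])[:top_k]


# Toy 2-dimensional "embeddings"; real OpenAI embeddings have far more dimensions.
toy_vectors = {
    "cats": [1.0, 0.0],
    "dogs": [0.9, 0.1],
    "cars": [0.0, 1.0],
}
```

Calling nearest_neighbours([1.0, 0.0], toy_vectors, top_k=2) ranks "cats" first and "dogs" second, because their vectors lie closest to the query.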
What is AnalyticDB
AnalyticDB is a high-performance distributed vector database. It is fully compatible with PostgreSQL syntax, so it is easy to adopt. AnalyticDB is a cloud-native database managed by Alibaba Cloud, with a high-performance vector compute engine. It works out of the box and scales to processing billions of vectors, with features including indexing algorithms, structured and unstructured data support, real-time updates, distance metrics, scalar filtering, and time travel searches. It also provides full OLAP database functionality and an SLA commitment for production use.
Deployment options
- Using AnalyticDB Cloud Vector Database, a managed service that can be deployed quickly.
Prerequisites
For the purposes of this exercise we need to prepare a couple of things:
- AnalyticDB cloud server instance.
- The 'psycopg2' library to interact with the vector database. Any other PostgreSQL client library also works.
- An OpenAI API key.
We might validate if the server was launched successfully by running a simple curl command:
Install requirements
This notebook requires the openai and psycopg2 packages, plus some additional libraries. The following command installs them all:
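A minimal Python sketch of the install step, limited to the two packages named above (the additional libraries are not listed in this text, so they are left out). The helper names are illustrative; the sketch simply invokes pip through the current interpreter.

```python
import subprocess
import sys

# Only the two packages named in the text; extend this list as needed.
REQUIRED_PACKAGES = ["openai", "psycopg2"]


def pip_install_command(packages):
    """Build the argument list that installs packages with this interpreter's pip."""
    return [sys.executable, "-m", "pip", "install", *packages]


def install(packages=REQUIRED_PACKAGES):
    """Run pip in a subprocess; raises CalledProcessError if the install fails."""
    subprocess.check_call(pip_install_command(packages))
```

Calling install() performs the installation. Building the command separately makes it easy to print and inspect before running it.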
Prepare your OpenAI API key
The OpenAI API key is used for vectorization of the documents and queries.
If you don't have an OpenAI API key, you can get one from https://beta.openai.com/account/api-keys.
Once you get your key, please add it to your environment variables as OPENAI_API_KEY.
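Before calling the API, it can help to confirm that the key is actually visible to the Python process. The helper below is a small sketch with an illustrative name; it only reads the OPENAI_API_KEY variable mentioned above and does not contact OpenAI.

```python
import os


def get_openai_api_key(env=os.environ):
    """Return the OpenAI API key from the environment, or raise a clear error."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before running the notebook."
        )
    return key
```

Passing a plain dictionary as env makes the check easy to exercise without touching the real environment.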
Connect to AnalyticDB
First add the connection parameters to your environment variables, or change the "psycopg2.connect" parameters directly.
Connecting to a running instance of AnalyticDB server is easy with the official Python library:
We can test the connection by running any available method:
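A sketch of the connection and smoke test, assuming the parameters are supplied through the standard libpq variables PGHOST, PGPORT, PGDATABASE, PGUSER and PGPASSWORD. The fallback values are placeholders rather than AnalyticDB defaults, and the function names are illustrative.

```python
import os


def connection_params(env=os.environ):
    """Collect psycopg2.connect keyword arguments from PG* environment variables.

    The fallback values are placeholders, not real AnalyticDB defaults.
    """
    return {
        "host": env.get("PGHOST", "localhost"),
        "port": int(env.get("PGPORT", "5432")),
        "dbname": env.get("PGDATABASE", "postgres"),
        "user": env.get("PGUSER", "user"),
        "password": env.get("PGPASSWORD", "password"),
    }


def connect_and_check(env=os.environ):
    """Open a connection and run SELECT 1 as a smoke test."""
    import psycopg2  # imported lazily so connection_params works without the driver

    connection = psycopg2.connect(**connection_params(env))
    with connection.cursor() as cursor:
        cursor.execute("SELECT 1;")
        if cursor.fetchone() != (1,):
            raise RuntimeError("unexpected response from the server")
    return connection
```

connect_and_check() returns an open connection once the server has answered the test query, so later steps can reuse it.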
The downloaded file then has to be extracted:
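A minimal extraction sketch, assuming the download is a zip archive. The archive path and target directory are parameters because the actual file name is not given here; the function name is illustrative.

```python
import zipfile
from pathlib import Path


def extract_archive(archive_path, target_dir):
    """Extract a zip archive into target_dir and return the member names."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path, "r") as archive:
        archive.extractall(target)
        return archive.namelist()
```

The returned list of names is a quick way to confirm that the expected data file is present after extraction.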
Index data
AnalyticDB stores data in relations, where each object is described by at least one vector. Our relation will be called articles, and each object will be described by both title and content vectors.
We will start by creating the relation and a vector index on both title and content, and then fill it with our precomputed embeddings.
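One way the relation and its two vector indexes might be defined is sketched below. Several details are assumptions rather than something stated in this text: the column names, the REAL[] vector type, the index names, and in particular the `USING ann ... WITH (distancemeasure = l2, dim = ...)` index options, which should be checked against the AnalyticDB documentation. The dimension of 1536 matches the default output size of text-embedding-3-small.

```python
def build_schema_sql(dim=1536):
    """Return (create_table_sql, [create_index_sql, ...]) for the articles relation.

    Column names and the ann index options are assumptions for illustration.
    """
    if not isinstance(dim, int) or dim <= 0:
        raise ValueError("dim must be a positive integer")
    create_table = """
CREATE TABLE IF NOT EXISTS public.articles (
    id INTEGER PRIMARY KEY,
    url TEXT,
    title TEXT,
    content TEXT,
    title_vector REAL[],
    content_vector REAL[]
);
""".strip()
    # One vector index per embedding column.
    create_indexes = [
        f"CREATE INDEX articles_{column}_idx ON public.articles "
        f"USING ann ({column}) WITH (distancemeasure = l2, dim = {dim});"
        for column in ("title_vector", "content_vector")
    ]
    return create_table, create_indexes


def create_schema(cursor, dim=1536):
    """Execute the table and index statements on an open psycopg2 cursor."""
    create_table, create_indexes = build_schema_sql(dim)
    cursor.execute(create_table)
    for statement in create_indexes:
        cursor.execute(statement)
```

After create_schema(cursor) runs, call commit() on the connection so the table and indexes persist.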
Load data
In this section we are going to load the data prepared before this session, so you don't have to recompute the embeddings of Wikipedia articles with your own credits.
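A loading sketch under stated assumptions: the precomputed data is a CSV file whose vector columns hold JSON-style lists such as "[0.1, 0.2]", and the column names id, title, content, title_vector and content_vector are hypothetical. The vectors are rewritten as PostgreSQL array literals before insertion.

```python
import csv
import io
import json

INSERT_SQL = (
    "INSERT INTO public.articles (id, title, content, title_vector, content_vector) "
    "VALUES (%s, %s, %s, %s, %s);"
)


def to_pg_array(vector_text):
    """Convert a JSON-style list string like "[0.1, 0.2]" to a PostgreSQL array literal."""
    values = json.loads(vector_text)
    return "{" + ",".join(repr(float(v)) for v in values) + "}"


def rows_for_insert(csv_text):
    """Yield parameter tuples for INSERT_SQL from CSV text (hypothetical column names)."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        yield (
            int(row["id"]),
            row["title"],
            row["content"],
            to_pg_array(row["title_vector"]),
            to_pg_array(row["content_vector"]),
        )


def load_rows(cursor, csv_text):
    """Insert every CSV row using an open psycopg2 cursor."""
    cursor.executemany(INSERT_SQL, list(rows_for_insert(csv_text)))
```

For a large dataset, a bulk mechanism such as PostgreSQL's COPY is likely to be much faster than row-by-row inserts; the version here favours readability.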
Search data
Once the data is in AnalyticDB, we can start querying the collection for the closest vectors. An additional vector_name parameter switches between title-based and content-based search. Since the precomputed embeddings were created with the text-embedding-3-small OpenAI model, we also have to use it during search.
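A search sketch, assuming the articles relation has title_vector and content_vector columns of type REAL[] and that AnalyticDB orders by vector distance with the `<->` operator (an assumption to verify against its documentation). The embedding call uses the current OpenAI Python client; the function names are illustrative.

```python
ALLOWED_VECTOR_COLUMNS = {"title_vector", "content_vector"}


def build_search_sql(vector_name="title_vector"):
    """Return a parameterised nearest-neighbour query for the chosen vector column."""
    # Column names cannot be passed as query parameters, so validate against an allowlist.
    if vector_name not in ALLOWED_VECTOR_COLUMNS:
        raise ValueError(f"unknown vector column: {vector_name!r}")
    return (
        "SELECT id, title FROM public.articles "
        f"ORDER BY {vector_name} <-> %s::real[] LIMIT %s;"
    )


def embed_query(text, model="text-embedding-3-small"):
    """Embed text with the OpenAI API; the client reads OPENAI_API_KEY from the environment."""
    from openai import OpenAI  # imported lazily so the SQL helper works without the package

    response = OpenAI().embeddings.create(model=model, input=text)
    return response.data[0].embedding


def search(cursor, text, vector_name="title_vector", top_k=5, embed=embed_query):
    """Embed the query text and return the top_k closest rows."""
    vector_literal = "{" + ",".join(str(x) for x in embed(text)) + "}"
    cursor.execute(build_search_sql(vector_name), (vector_literal, top_k))
    return cursor.fetchall()
```

For example, search(cursor, "modern art in Europe", vector_name="content_vector") would rank articles by their content embeddings instead of their titles. The embed parameter lets a stub replace the API call during testing.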