Perform Vector Similarity Search with Neon Postgres
Guide to using Neon Serverless Postgres as a vector database for OpenAI embeddings with the pgvector extension.
Why it matters
Leverage OpenAI embeddings and Neon Serverless Postgres with the pgvector extension to efficiently store and search vector data for similarity.
Outcomes
What it gets done
Store OpenAI embeddings in a Neon Postgres database.
Convert text queries to embeddings using the OpenAI API.
Perform vector similarity searches using pgvector.
Index and query vector data for nearest neighbors.
Install
Add it to your toolbox
Run in your project directory:
curl -fsSL https://spark.entire.vc/get/oai-neon-postgres-vector-search-pgvector | bash
Steps
Steps in the chain
Use embeddings created by the OpenAI API.
Store embeddings in a Neon Serverless Postgres database.
Convert a raw text query to an embedding with the OpenAI API.
Use Neon with the `pgvector` extension to perform vector similarity search.
Install the `openai`, `psycopg2`, `pandas`, `wget`, and `python-dotenv` packages using pip.
Obtain an OpenAI API key from https://platform.openai.com/account/api-keys and add it as an environment variable named `OPENAI_API_KEY` or provide it when prompted.
Test your OpenAI API key to ensure it is working correctly.
Provide your Neon database connection string or define it in an `.env` file using a `DATABASE_URL` variable, then test the connection.
Import the pre-computed Wikipedia article embeddings zip file from the OpenAI Cookbook examples directory.
Extract the downloaded zip file containing the pre-computed embeddings.
Create a vector table called `articles` with `title` and `content` vector columns, and define indexes on both vector columns.
Load the pre-computed vector data into your `articles` table from the `.csv` file. There are 25000 records, so expect the operation to take several minutes.
Check the number of records to ensure the data has been loaded. There should be 25000 records.
Define the `query_neon` function, which creates an embedding for the user's query with the `text-embedding-3-small` model, prepares the SQL query, and runs it with that embedding.
Run a similarity search based on `title_vector` embeddings to find nearest neighbors.
Overview
Vector similarity search using Neon Postgres
What it does
This notebook demonstrates how to use Neon Serverless Postgres as a vector database for OpenAI embeddings. It covers creating embeddings with the OpenAI API, storing them in Neon, and performing vector similarity searches using the `pgvector` extension.
How it connects
Use this notebook to learn how to implement vector similarity search with Neon Serverless Postgres and OpenAI embeddings. It is suitable for users who want to store and query vector data in Neon Postgres.
Source README
Vector similarity search using Neon Postgres
This notebook guides you through using Neon Serverless Postgres as a vector database for OpenAI embeddings. It demonstrates how to:
- Use embeddings created by the OpenAI API.
- Store embeddings in a Neon Serverless Postgres database.
- Convert a raw text query to an embedding with the OpenAI API.
- Use Neon with the `pgvector` extension to perform vector similarity search.
Prerequisites
Before you begin, ensure that you have the following:
- A Neon Postgres database. You can create an account and set up a project with a ready-to-use `neondb` database in a few simple steps. For instructions, see Sign up and Create your first project.
- A connection string for your Neon database. You can copy it from the Connection Details widget on the Neon Dashboard. See Connect from any application.
- The `pgvector` extension. Install the extension in Neon by running `CREATE EXTENSION vector;`. For instructions, see Enable the pgvector extension.
- Your OpenAI API key.
- Python and `pip`.
Install required modules
This notebook requires the openai, psycopg2, pandas, wget, and python-dotenv packages. You can install them with pip:
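After installing, it can help to confirm that every package is importable before running the rest of the notebook. The following is a minimal sketch using only the standard library; the `REQUIRED` mapping and the `missing_modules` helper are hypothetical names introduced here, not part of the original notebook. Note that `python-dotenv` is imported as `dotenv`.

```python
import importlib.util

# Import name -> pip package name (python-dotenv is imported as "dotenv").
REQUIRED = {
    "openai": "openai",
    "psycopg2": "psycopg2",
    "pandas": "pandas",
    "wget": "wget",
    "dotenv": "python-dotenv",
}


def missing_modules(import_names):
    """Return the import names that cannot be found in this environment."""
    return [name for name in import_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        packages = " ".join(REQUIRED[name] for name in missing)
        print(f"Missing packages. Install them with: pip install {packages}")
    else:
        print("All required packages are importable.")
```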
Prepare your OpenAI API key
An OpenAI API key is required to generate vectors for documents and queries.
If you do not have an OpenAI API key, obtain one from https://platform.openai.com/account/api-keys.
Add the OpenAI API key as an operating system environment variable or provide it for the session when prompted. If you define an environment variable, name the variable OPENAI_API_KEY.
For information about configuring your OpenAI API key as an environment variable, refer to Best Practices for API Key Safety.
Test your OpenAI API key
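A minimal sketch of such a check, assuming the key is supplied through the `OPENAI_API_KEY` environment variable. The `openai_key_status` helper is a hypothetical name used for illustration. It only confirms that the variable is set; it does not contact OpenAI to validate the key.

```python
import os


def openai_key_status(env=None):
    """Report whether OPENAI_API_KEY is defined in the given environment mapping."""
    env = os.environ if env is None else env
    if env.get("OPENAI_API_KEY"):
        return "Your OPENAI_API_KEY is ready"
    return "Your OPENAI_API_KEY is not defined"


if __name__ == "__main__":
    print(openai_key_status())
```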
Connect to your Neon database
Provide your Neon database connection string below or define it in an .env file using a DATABASE_URL variable. For information about obtaining a Neon connection string, see Connect from any application.
Test the connection to your database:
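One way to run this check is sketched below, assuming the connection string is available in `DATABASE_URL` (for example, after loading an `.env` file with `python-dotenv`). The helper names `get_connection_string` and `check_connection` are hypothetical. `psycopg2` is imported inside the function so that the configuration helper can be used on its own.

```python
import os


def get_connection_string(env=None):
    """Read the Neon connection string from DATABASE_URL, or fail with a clear message."""
    env = os.environ if env is None else env
    value = env.get("DATABASE_URL")
    if not value:
        raise RuntimeError("DATABASE_URL is not set")
    return value


def check_connection(connection_string):
    """Open a connection, run SELECT 1, and return True if the database answers."""
    import psycopg2  # imported lazily; requires the psycopg2 package

    connection = psycopg2.connect(connection_string)
    try:
        with connection.cursor() as cursor:
            cursor.execute("SELECT 1;")
            return cursor.fetchone() == (1,)
    finally:
        connection.close()


if __name__ == "__main__":
    if check_connection(get_connection_string()):
        print("Your database connection was successful!")
    else:
        print("Your connection was not successful.")
```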
This guide uses pre-computed Wikipedia article embeddings available in the OpenAI Cookbook examples directory so that you do not have to compute embeddings with your own OpenAI credits.
Import the pre-computed embeddings zip file:
Extract the downloaded zip file:
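The download and extraction steps could look like the sketch below. It swaps the `wget` package for the standard library's `urllib.request` so that it has no third-party dependencies. The URL and the output directory are assumptions based on the OpenAI Cookbook's usual location for this dataset; verify them before relying on this.

```python
import urllib.request
import zipfile
from pathlib import Path

# Assumed location of the pre-computed embeddings; confirm against the Cookbook.
EMBEDDINGS_URL = (
    "https://cdn.openai.com/API/examples/data/"
    "vector_database_wikipedia_articles_embedded.zip"
)


def download(url, destination):
    """Download url to destination and return the destination path."""
    destination = Path(destination)
    urllib.request.urlretrieve(url, str(destination))
    return destination


def extract(zip_path, output_dir):
    """Extract every member of zip_path into output_dir and return the member names."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path, "r") as archive:
        archive.extractall(output_dir)
        return sorted(archive.namelist())


if __name__ == "__main__":
    # The file is large, so the download may take a while.
    archive_path = download(EMBEDDINGS_URL, "vector_database_wikipedia_articles_embedded.zip")
    print(extract(archive_path, "data"))
```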
Create a table and add indexes for your vector embeddings
The vector table created in your database is called articles. Each row has title and content vectors.
An index is defined on both the title and content vector columns.
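A sketch of the table and index definitions follows. The column names, the 1536-dimension vector type (the default output size of `text-embedding-3-small`), the `ivfflat` index type, and the `lists` value are all assumptions chosen for illustration; the original statements are not reproduced here. `create_articles_table` is a hypothetical helper that expects an open `psycopg2` connection.

```python
# Assumed schema: column names and dimensions are illustrative.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS public.articles (
    id INTEGER PRIMARY KEY,
    url TEXT,
    title TEXT,
    content TEXT,
    title_vector vector(1536),
    content_vector vector(1536),
    vector_id INTEGER
);
"""

# One approximate-nearest-neighbor index per vector column, using L2 distance.
# The lists value is a tuning knob; 100 is only a placeholder.
CREATE_INDEXES_SQL = """
CREATE INDEX IF NOT EXISTS articles_title_vector_idx
    ON public.articles USING ivfflat (title_vector vector_l2_ops) WITH (lists = 100);
CREATE INDEX IF NOT EXISTS articles_content_vector_idx
    ON public.articles USING ivfflat (content_vector vector_l2_ops) WITH (lists = 100);
"""


def create_articles_table(connection):
    """Create the articles table and its vector indexes, then commit."""
    with connection.cursor() as cursor:
        cursor.execute(CREATE_TABLE_SQL)
        cursor.execute(CREATE_INDEXES_SQL)
    connection.commit()
```

The pgvector documentation suggests building `ivfflat` indexes after the table already contains data, so on larger datasets it may be worth creating the indexes after the load step instead.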
Load the data
Load the pre-computed vector data into your articles table from the .csv file. There are 25000 records, so expect the operation to take several minutes.
Check the number of records to ensure the data has been loaded. There should be 25000 records.
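The load and the record count can be sketched with `psycopg2`'s `copy_expert`, which streams a file into a `COPY ... FROM STDIN` statement. The column list assumes that the CSV columns appear in this order and that the file has a header row; both are assumptions to check against the extracted file. `load_articles` and `count_articles` are hypothetical helper names.

```python
# Assumed column order; it must match the order of the columns in the CSV file.
COPY_SQL = """
COPY public.articles (id, url, title, content, title_vector, content_vector, vector_id)
FROM STDIN WITH (FORMAT CSV, HEADER true, DELIMITER ',');
"""

COUNT_SQL = "SELECT COUNT(*) FROM public.articles;"


def load_articles(connection, csv_path):
    """Stream the CSV file into the articles table and commit."""
    with open(csv_path, "r", encoding="utf-8") as csv_file:
        with connection.cursor() as cursor:
            cursor.copy_expert(COPY_SQL, csv_file)
    connection.commit()


def count_articles(connection):
    """Return the number of rows currently in the articles table."""
    with connection.cursor() as cursor:
        cursor.execute(COUNT_SQL)
        return cursor.fetchone()[0]
```

After a successful load, `count_articles(connection)` would be expected to return 25000.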
Search your data
After the data is stored in your Neon database, you can query the data for nearest neighbors.
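To make "nearest neighbors" concrete, this toy sketch ranks three made-up 3-dimensional vectors by Euclidean (L2) distance, which is the metric computed by pgvector's `<->` operator. The titles and vectors are invented for illustration. Real embeddings from `text-embedding-3-small` have 1536 dimensions by default, and the ranking happens inside Postgres rather than in Python.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbors(query_vector, rows, top_k=2):
    """Return the titles of the top_k rows closest to query_vector."""
    ranked = sorted(rows, key=lambda row: l2_distance(query_vector, row[1]))
    return [title for title, _ in ranked[:top_k]]


# Hypothetical (title, vector) rows standing in for the articles table.
toy_rows = [
    ("toy-a", [1.0, 0.0, 0.0]),
    ("toy-b", [0.9, 0.1, 0.0]),
    ("toy-c", [0.0, 1.0, 0.0]),
]

print(nearest_neighbors([1.0, 0.0, 0.0], toy_rows))  # ['toy-a', 'toy-b']
```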
Start by defining the query_neon function, which is executed when you run the vector similarity search. The function creates an embedding based on the user's query, prepares the SQL query, and runs the SQL query with the embedding. The pre-computed embeddings that you loaded into your database were created with the OpenAI text-embedding-3-small model, so you must use the same model to create the embedding for the similarity search.
A vector_name parameter is provided that allows you to search based on "title" or "content".
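A sketch of what such a function might look like is shown below. It is not the notebook's original code: it assumes the `openai` v1 client (`OpenAI().embeddings.create`), a `psycopg2` connection passed in by the caller, the column names `title_vector` and `content_vector`, and a `public.articles` table. The helpers `to_pgvector_literal` and `build_search_sql` are hypothetical. The column name is checked against an allow-list because identifiers cannot be passed as query parameters.

```python
ALLOWED_VECTOR_COLUMNS = {"title_vector", "content_vector"}  # assumed column names


def to_pgvector_literal(embedding):
    """Format a sequence of numbers as a pgvector text literal, e.g. [0.1,0.2]."""
    return "[" + ",".join(str(value) for value in embedding) + "]"


def build_search_sql(vector_name="title_vector"):
    """Build a parameterized nearest-neighbor query ordered by L2 distance (<->)."""
    if vector_name not in ALLOWED_VECTOR_COLUMNS:
        raise ValueError(f"Unknown vector column: {vector_name!r}")
    return (
        f"SELECT id, url, title, {vector_name} <-> %s::vector AS distance "
        f"FROM public.articles "
        f"ORDER BY {vector_name} <-> %s::vector "
        f"LIMIT %s;"
    )


def query_neon(connection, query, vector_name="title_vector", top_k=20):
    """Embed the query text and return the top_k nearest rows from Postgres."""
    from openai import OpenAI  # imported lazily; requires the openai package

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.embeddings.create(input=query, model="text-embedding-3-small")
    literal = to_pgvector_literal(response.data[0].embedding)
    with connection.cursor() as cursor:
        # The literal is passed twice: once for the SELECT list, once for ORDER BY.
        cursor.execute(build_search_sql(vector_name), (literal, literal, top_k))
        return cursor.fetchall()
```

Each returned row is a tuple of `(id, url, title, distance)`, where a smaller distance means a closer match.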
Run a similarity search based on title_vector embeddings:
Run a similarity search based on content_vector embeddings:
Step 1: Use embeddings created by OpenAI API
Use embeddings created by the OpenAI API.
Step 2: Store embeddings in Neon Serverless Postgres
Store embeddings in a Neon Serverless Postgres database.
Step 3: Convert query to embedding with OpenAI API
Convert a raw text query to an embedding with the OpenAI API.
Step 4: Perform vector similarity search with pgvector
Use Neon with the `pgvector` extension to perform vector similarity search.
Step 5: Install required modules
Install the `openai`, `psycopg2`, `pandas`, `wget`, and `python-dotenv` packages using pip.
Step 6: Prepare your OpenAI API key
Obtain an OpenAI API key from https://platform.openai.com/account/api-keys and add it as an environment variable named `OPENAI_API_KEY` or provide it when prompted.
Step 7: Test your OpenAI API key
Test your OpenAI API key to ensure it is working correctly.
Step 8: Connect to your Neon database
Provide your Neon database connection string or define it in an `.env` file using a `DATABASE_URL` variable, then test the connection.
Step 9: Import pre-computed embeddings
Import the pre-computed Wikipedia article embeddings zip file from the OpenAI Cookbook examples directory.
Step 10: Extract the downloaded zip file
Extract the downloaded zip file containing the pre-computed embeddings.
Step 11: Create table and add indexes for vector embeddings
Create a vector table called `articles` with `title` and `content` vector columns, and define indexes on both vector columns.
Step 12: Load the data
Load the pre-computed vector data into your `articles` table from the `.csv` file. There are 25000 records, so expect the operation to take several minutes.
Step 13: Verify data loaded successfully
Check the number of records to ensure the data has been loaded. There should be 25000 records.
Step 14: Define the query_neon function
Define the `query_neon` function, which creates an embedding for the user's query with the `text-embedding-3-small` model, prepares the SQL query, and runs it with that embedding.
Step 15: Run similarity search on title vectors
Run a similarity search based on `title_vector` embeddings to find nearest neighbors.