Prompt Chain

Perform Vector Similarity Search with Neon Postgres

Name: Perform Vector Similarity Search with Neon Postgres
Availability: OnlineOnly
Author: OpenAI Cookbook

Guide to using Neon Serverless Postgres as a vector database for OpenAI embeddings with the pgvector extension.

Works with: openai, postgres, neon

Updated 3 months ago

Version 1.0.0

Why it matters

Store OpenAI embeddings in Neon Serverless Postgres and run similarity searches over them with the pgvector extension.

Outcomes

What it gets done

Store OpenAI embeddings in a Neon Postgres database.

Convert text queries to embeddings using the OpenAI API.

Perform vector similarity searches using pgvector.

Index and query vector data for nearest neighbors.

Install

Add it to your toolbox

Run in your project directory:

curl -fsSL https://spark.entire.vc/get/oai-neon-postgres-vector-search-pgvector | bash

Steps

Steps in the chain

Use embeddings created by the OpenAI API

Use embeddings created by the OpenAI API.

Store embeddings in Neon Serverless Postgres

Store embeddings in a Neon Serverless Postgres database.

Convert query to embedding with the OpenAI API

Convert a raw text query to an embedding with the OpenAI API.

Perform vector similarity search with pgvector

Use Neon with the `pgvector` extension to perform vector similarity search.

Install required modules

Install the `openai`, `psycopg2`, `pandas`, `wget`, and `python-dotenv` packages using pip.

Prepare your OpenAI API key

Obtain an OpenAI API key from https://platform.openai.com/account/api-keys and add it as an environment variable named `OPENAI_API_KEY` or provide it when prompted.

Test your OpenAI API key

Test your OpenAI API key to ensure it is working correctly.

Connect to your Neon database

Provide your Neon database connection string or define it in an `.env` file using a `DATABASE_URL` variable, then test the connection.

Import pre-computed embeddings

Import the pre-computed Wikipedia article embeddings zip file from the OpenAI Cookbook examples directory.

Extract the downloaded zip file

Extract the downloaded zip file containing the pre-computed embeddings.

Create table and add indexes for vector embeddings

Create a vector table called `articles` with `title` and `content` vector columns, and define indexes on both vector columns.

Load the data

Load the pre-computed vector data into your `articles` table from the `.csv` file. There are 25000 records, so expect the operation to take several minutes.

Verify data loaded successfully

Check the number of records to ensure the data has been loaded. There should be 25000 records.

Define the query_neon function

Define the `query_neon` function, which creates an embedding from the user's query with the `text-embedding-3-small` model, prepares the SQL query, and runs it with that embedding.

Run similarity search on title vectors

Run a similarity search based on `title_vector` embeddings to find nearest neighbors.

Overview

Vector similarity search using Neon Postgres

What it does

This notebook demonstrates how to use Neon Serverless Postgres as a vector database for OpenAI embeddings. It covers creating embeddings with the OpenAI API, storing them in Neon, and performing vector similarity searches using the `pgvector` extension.

How it connects

Use this notebook to learn how to implement vector similarity search with Neon Serverless Postgres and OpenAI embeddings. It is suitable for users who want to store and query vector data in Neon Postgres.

Source README

Vector similarity search using Neon Postgres

This notebook guides you through using Neon Serverless Postgres as a vector database for OpenAI embeddings. It demonstrates how to:

Use embeddings created by the OpenAI API.
Store embeddings in a Neon Serverless Postgres database.
Convert a raw text query to an embedding with the OpenAI API.
Use Neon with the pgvector extension to perform vector similarity search.

Prerequisites

Before you begin, ensure that you have the following:

A Neon Postgres database. You can create an account and set up a project with a ready-to-use neondb database in a few simple steps. For instructions, see Sign up and Create your first project.
A connection string for your Neon database. You can copy it from the Connection Details widget on the Neon Dashboard. See Connect from any application.
The pgvector extension. Install the extension in Neon by running `CREATE EXTENSION vector;`. For instructions, see Enable the pgvector extension.
Your OpenAI API key.
Python and pip.

Install required modules

This notebook requires the openai, psycopg2, pandas, wget, and python-dotenv packages. You can install them with pip:
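A typical invocation is `pip install openai psycopg2 pandas wget python-dotenv`. As an assumption beyond the source, `psycopg2-binary` is a common substitute on machines without PostgreSQL build tools. The sketch below is one way to confirm afterwards that the modules can be imported. The helper name `missing_modules` is hypothetical, and note that `python-dotenv` is imported as `dotenv`.

```python
import importlib.util

# Import names for the packages listed above.
# python-dotenv is imported as "dotenv"; the others share their package name.
REQUIRED_MODULES = ["openai", "psycopg2", "pandas", "wget", "dotenv"]


def missing_modules(names):
    """Return the top-level module names that cannot be found (hypothetical helper)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

An empty result means every listed module is importable in the current environment.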

Prepare your OpenAI API key

An OpenAI API key is required to generate vectors for documents and queries.

If you do not have an OpenAI API key, obtain one from https://platform.openai.com/account/api-keys.

Add the OpenAI API key as an operating system environment variable or provide it for the session when prompted. If you define an environment variable, name the variable OPENAI_API_KEY.

For information about configuring your OpenAI API key as an environment variable, refer to Best Practices for API Key Safety.

Test your OpenAI API key
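A minimal sketch of this check, assuming the key lives in `OPENAI_API_KEY` as described above. The helper name `load_openai_api_key` and its `env` and `prompt` parameters are hypothetical, added so the logic can be exercised without a real key. It only confirms that a key is present; it does not call OpenAI to validate it.

```python
import os
from getpass import getpass


def load_openai_api_key(env=None, prompt=getpass):
    """Return the key from OPENAI_API_KEY, or ask for it once (hypothetical helper)."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY")
    if key:
        print("Your OPENAI_API_KEY is ready")
        return key
    # Not set: fall back to an interactive prompt that hides the typed value.
    key = prompt("Enter your OPENAI_API_KEY: ").strip()
    if not key:
        raise ValueError("No OpenAI API key provided")
    return key
```

Calling `load_openai_api_key()` with no arguments reads the process environment and prompts only when the variable is missing.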

Connect to your Neon database

Provide your Neon database connection string below or define it in an .env file using a DATABASE_URL variable. For information about obtaining a Neon connection string, see Connect from any application.

Test the connection to your database:
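One way this connection test could look, assuming `psycopg2` as the driver and `python-dotenv` for reading the `.env` file. The helper names `get_database_url` and `check_connection` are hypothetical. Third-party imports are deferred into the functions so the definitions load even before the packages are installed.

```python
import os


def get_database_url(env=None):
    """Return DATABASE_URL, loading a local .env file first when no mapping is given."""
    if env is None:
        try:
            from dotenv import load_dotenv  # provided by python-dotenv
            load_dotenv()
        except ImportError:
            pass  # fall back to the plain process environment
        env = os.environ
    url = env.get("DATABASE_URL")
    if not url:
        raise ValueError("DATABASE_URL is not set")
    return url


def check_connection(connection_string):
    """Open a connection, run SELECT 1, and return True if the round trip works."""
    import psycopg2  # imported lazily so this module loads without the driver

    conn = psycopg2.connect(connection_string)
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT 1;")
            return cur.fetchone() == (1,)
    finally:
        conn.close()
```

With a valid Neon connection string in place, `check_connection(get_database_url())` should return `True`; a bad host or password surfaces as a `psycopg2` exception instead.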

This guide uses pre-computed Wikipedia article embeddings available in the OpenAI Cookbook examples directory so that you do not have to compute embeddings with your own OpenAI credits.

Import the pre-computed embeddings zip file:
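A sketch of the download step. The URL below is an assumption based on where the Cookbook has hosted this dataset and should be verified before use. The sketch swaps the `wget` package for the standard library's `urllib.request` so that it has no third-party dependency. The archive is likely several hundred megabytes, so the hypothetical `download_embeddings` helper skips the transfer when the file already exists.

```python
import os
import urllib.request

# Assumed location of the Cookbook's pre-computed embeddings; verify before use.
EMBEDDINGS_URL = (
    "https://cdn.openai.com/API/examples/data/"
    "vector_database_wikipedia_articles_embedded.zip"
)
DEFAULT_ZIP_NAME = "vector_database_wikipedia_articles_embedded.zip"


def download_embeddings(url=EMBEDDINGS_URL, dest=DEFAULT_ZIP_NAME):
    """Download the archive unless it is already present, and return its path."""
    if not os.path.exists(dest):
        urllib.request.urlretrieve(url, dest)
    return dest
```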

Extract the downloaded zip file:
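Extraction needs only the standard library. In this sketch the function name `extract_archive` and both path arguments are placeholders; point them at wherever the archive was saved and where the `.csv` file should land.

```python
import os
import zipfile


def extract_archive(zip_path, output_dir):
    """Extract every member of zip_path into output_dir and return the member names."""
    os.makedirs(output_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path, "r") as archive:
        archive.extractall(output_dir)
        return archive.namelist()
```

The returned names make it easy to locate the extracted `.csv` file for the load step.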

Create a table and add indexes for your vector embeddings

The vector table created in your database is called articles. Each row stores a title vector and a content vector.

An index is defined on both the title and content vector columns.
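A possible shape for the table and its indexes, under several assumptions that are not stated in the text above: the column list (`id`, `url`, `title`, `content`, `title_vector`, `content_vector`, `vector_id`), a vector size of 1536 (the default output dimension of `text-embedding-3-small`), IVFFlat indexes with L2 distance, and `lists = 100` as an illustrative value to tune. The function `create_articles_table` is hypothetical and expects an open `psycopg2` connection.

```python
# Assumption: 1536 dimensions, the default output size of text-embedding-3-small.
EMBEDDING_DIM = 1536

# Assumed column layout; adjust it to match the header of the .csv file.
CREATE_TABLE_SQL = f"""
CREATE TABLE IF NOT EXISTS public.articles (
    id INTEGER PRIMARY KEY,
    url TEXT,
    title TEXT,
    content TEXT,
    title_vector vector({EMBEDDING_DIM}),
    content_vector vector({EMBEDDING_DIM}),
    vector_id INTEGER
);
"""

# One IVFFlat index per vector column; lists = 100 is an illustrative value.
CREATE_INDEXES_SQL = """
CREATE INDEX IF NOT EXISTS articles_title_vector_idx
    ON public.articles USING ivfflat (title_vector vector_l2_ops) WITH (lists = 100);
CREATE INDEX IF NOT EXISTS articles_content_vector_idx
    ON public.articles USING ivfflat (content_vector vector_l2_ops) WITH (lists = 100);
"""


def create_articles_table(conn):
    """Create the table and both indexes on an open connection, then commit."""
    cur = conn.cursor()
    try:
        cur.execute(CREATE_TABLE_SQL)
        cur.execute(CREATE_INDEXES_SQL)
        conn.commit()
    finally:
        cur.close()
```

The `vector_l2_ops` operator class pairs with the `<->` distance operator, so queries that order by `<->` on either column are the ones these indexes can serve.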

Load the data

Load the pre-computed vector data into your articles table from the .csv file. There are 25000 records, so expect the operation to take several minutes.
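A sketch of the load using PostgreSQL's `COPY ... FROM STDIN`, streamed through `psycopg2`'s `cursor.copy_expert`. The column list is an assumption about the `.csv` header, and `load_articles` is a hypothetical helper that takes an open connection and the path to the extracted file.

```python
# Assumed column order; it must match the header row of the .csv file.
COPY_SQL = """
COPY public.articles (id, url, title, content, title_vector, content_vector, vector_id)
FROM STDIN WITH (FORMAT CSV, HEADER true, DELIMITER ',');
"""


def load_articles(conn, csv_path):
    """Stream the CSV into public.articles with COPY, then commit."""
    cur = conn.cursor()
    try:
        with open(csv_path, "r", encoding="utf-8") as csv_file:
            cur.copy_expert(COPY_SQL, csv_file)
        conn.commit()
    finally:
        cur.close()
```

`COPY` sends the file in one stream rather than issuing one `INSERT` per row, which is usually the faster choice for a bulk load of this size.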

Check the number of records to ensure the data has been loaded. There should be 25000 records.
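The count check can be sketched as follows. Both helper names are hypothetical, and the connection is assumed to be an open `psycopg2` connection to the database that holds `public.articles`.

```python
EXPECTED_ROWS = 25000  # record count stated for the pre-computed dataset


def count_articles(conn):
    """Return the number of rows currently in public.articles."""
    cur = conn.cursor()
    try:
        cur.execute("SELECT COUNT(*) FROM public.articles;")
        return cur.fetchone()[0]
    finally:
        cur.close()


def verify_row_count(conn, expected=EXPECTED_ROWS):
    """Raise if the table does not hold the expected number of rows."""
    count = count_articles(conn)
    if count != expected:
        raise RuntimeError(f"Expected {expected} rows, found {count}")
    return count
```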

Search your data

After the data is stored in your Neon database, you can query the data for nearest neighbors.

Start by defining the query_neon function, which is executed when you run the vector similarity search. The function creates an embedding based on the user's query, prepares the SQL query, and runs the SQL query with the embedding. The pre-computed embeddings that you loaded into your database were created with the text-embedding-3-small OpenAI model, so you must use the same model to create an embedding for the similarity search.

A vector_name parameter is provided that allows you to search based on "title" or "content".
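A hedged sketch of what such a function might look like; it is not the notebook's exact code. The signature (connection first, an injectable `embed` callable), the `title_vector` and `content_vector` column names, and the returned row shape are assumptions. It uses the `openai` v1 client's `embeddings.create` call and pgvector's `<->` L2 distance operator, and it restricts `vector_name` to a whitelist because a column name cannot be passed as a bound parameter.

```python
ALLOWED_VECTOR_COLUMNS = {"title_vector", "content_vector"}  # assumed column names
EMBEDDING_MODEL = "text-embedding-3-small"


def to_pgvector_literal(embedding):
    """Format a sequence of numbers as a pgvector text literal such as [0.1,0.2]."""
    return "[" + ",".join(str(x) for x in embedding) + "]"


def embed_query(text):
    """Create an embedding with the OpenAI API (reads OPENAI_API_KEY from the environment)."""
    from openai import OpenAI  # lazy import so the module loads without the package

    client = OpenAI()
    response = client.embeddings.create(input=text, model=EMBEDDING_MODEL)
    return response.data[0].embedding


def query_neon(conn, query, vector_name="title_vector", top_k=20, embed=embed_query):
    """Return the top_k rows nearest to the query as (id, url, title, distance)."""
    if vector_name not in ALLOWED_VECTOR_COLUMNS:
        raise ValueError(f"vector_name must be one of {sorted(ALLOWED_VECTOR_COLUMNS)}")
    vector_literal = to_pgvector_literal(embed(query))
    # The column name is interpolated only after the whitelist check above;
    # the vector literal and the limit are passed as bound parameters.
    sql = (
        f"SELECT id, url, title, {vector_name} <-> %s::vector AS distance "
        f"FROM public.articles "
        f"ORDER BY {vector_name} <-> %s::vector "
        f"LIMIT %s;"
    )
    cur = conn.cursor()
    try:
        cur.execute(sql, (vector_literal, vector_literal, int(top_k)))
        return cur.fetchall()
    finally:
        cur.close()
```

Smaller distances mean closer matches, so the first row returned is the nearest neighbor.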

Run a similarity search based on title_vector embeddings:

Run a similarity search based on content_vector embeddings:
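To display the results of either search, a small formatter can help. This sketch assumes each row returned by `query_neon` has the shape `(id, url, title, distance)`, which is an assumption about the function's output rather than something the text specifies; `format_results` is a hypothetical helper.

```python
def format_results(rows):
    """Return one numbered line per row of (id, url, title, distance)."""
    return [
        f"{rank}. {title} (distance: {distance:.3f})"
        for rank, (_id, _url, title, distance) in enumerate(rows, start=1)
    ]
```

Passing the rows from a title search, and then from a content search (by switching the `vector_name` argument), gives two ranked lists that can be compared side by side.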
