Prompt Chain

Perform Vector Similarity Search with Neon Postgres

Guide to using Neon Serverless Postgres as a vector database for OpenAI embeddings with the pgvector extension.

Works with openai, postgres, neon

Spark score: 91 out of 100
Updated 3 months ago
Version 1.0.0

Why it matters

Store OpenAI embeddings in Neon Serverless Postgres and use the pgvector extension to run similarity searches over them.

Outcomes

What it gets done

01

Store OpenAI embeddings in a Neon Postgres database.

02

Convert text queries to embeddings using the OpenAI API.

03

Perform vector similarity searches using pgvector.

04

Index and query vector data for nearest neighbors.

Install

Add it to your toolbox

Run in your project directory:

curl -fsSL https://spark.entire.vc/get/oai-neon-postgres-vector-search-pgvector | bash

Steps

Steps in the chain

01
Use embeddings created by OpenAI API

Use embeddings created by OpenAI API.

02
Store embeddings in Neon Serverless Postgres

Store embeddings in a Neon Serverless Postgres database.

03
Convert query to embedding with OpenAI API

Convert a raw text query to an embedding with OpenAI API.

04
Perform vector similarity search with pgvector

Use Neon with the `pgvector` extension to perform vector similarity search.

05
Install required modules

Install the `openai`, `psycopg2`, `pandas`, `wget`, and `python-dotenv` packages using pip.

06
Prepare your OpenAI API key

Obtain an OpenAI API key from https://platform.openai.com/account/api-keys and add it as an environment variable named `OPENAI_API_KEY` or provide it when prompted.

07
Test your OpenAI API key

Test your OpenAI API key to ensure it is working correctly.

08
Connect to your Neon database

Provide your Neon database connection string or define it in an `.env` file using a `DATABASE_URL` variable, then test the connection.

09
Import pre-computed embeddings

Import the pre-computed Wikipedia article embeddings zip file from the OpenAI Cookbook examples directory.

10
Extract the downloaded zip file

Extract the downloaded zip file containing the pre-computed embeddings.

11
Create table and add indexes for vector embeddings

Create a vector table called `articles` with `title` and `content` vector columns, and define indexes on both vector columns.

12
Load the data

Load the pre-computed vector data into your `articles` table from the `.csv` file. There are 25000 records, so expect the operation to take several minutes.

13
Verify data loaded successfully

Check the number of records to ensure the data has been loaded. There should be 25000 records.

14
Define the query_neon function

Define the `query_neon` function, which creates an embedding of the user's query with the `text-embedding-3-small` model, prepares the SQL query, and runs it with that embedding.

15
Run similarity search on title vectors

Run a similarity search based on `title_vector` embeddings to find nearest neighbors.

Overview

Vector similarity search using Neon Postgres

What it does

This notebook demonstrates how to use Neon Serverless Postgres as a vector database for OpenAI embeddings. It covers loading pre-computed OpenAI embeddings into Neon, converting a text query to an embedding with the OpenAI API, and performing vector similarity searches using the `pgvector` extension.

How it connects

Use this notebook to learn how to implement vector similarity search with Neon Serverless Postgres and OpenAI embeddings. It is suitable for users who want to store and query vector data in Neon Postgres.

Source README

Vector similarity search using Neon Postgres

This notebook guides you through using Neon Serverless Postgres as a vector database for OpenAI embeddings. It demonstrates how to:

  1. Use embeddings created by the OpenAI API.
  2. Store embeddings in a Neon Serverless Postgres database.
  3. Convert a raw text query to an embedding with the OpenAI API.
  4. Use Neon with the pgvector extension to perform vector similarity search.

Prerequisites

Before you begin, ensure that you have the following:

  1. A Neon Postgres database. You can create an account and set up a project with a ready-to-use neondb database in a few simple steps. For instructions, see Sign up and Create your first project.
  2. A connection string for your Neon database. You can copy it from the Connection Details widget on the Neon Dashboard. See Connect from any application.
  3. The pgvector extension. Install the extension in Neon by running CREATE EXTENSION vector;. For instructions, see Enable the pgvector extension.
  4. Your OpenAI API key.
  5. Python and pip.

Install required modules

This notebook requires the openai, psycopg2, pandas, wget, and python-dotenv packages. You can install them with pip:
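The install command itself is typically `pip install openai psycopg2 pandas wget python-dotenv`. As a minimal sketch, the following Python snippet checks which of the listed packages are importable and prints the pip command for any that are missing. The helper names `missing_packages` and `pip_command` are illustrative and not part of the original notebook.

```python
import importlib.util

# Maps each pip package name to the module name it is imported as.
# Note that the python-dotenv distribution is imported as "dotenv".
REQUIRED = {
    "openai": "openai",
    "psycopg2": "psycopg2",
    "pandas": "pandas",
    "wget": "wget",
    "python-dotenv": "dotenv",
}


def missing_packages(required=REQUIRED):
    """Return the pip package names whose modules cannot be imported."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


def pip_command(packages):
    """Build the pip command line for the given packages ('' if none)."""
    return "pip install " + " ".join(packages) if packages else ""


print(pip_command(missing_packages()) or "All required packages are installed.")
```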

Prepare your OpenAI API key

An OpenAI API key is required to generate vectors for documents and queries.

If you do not have an OpenAI API key, obtain one from https://platform.openai.com/account/api-keys.

Add the OpenAI API key as an operating system environment variable or provide it for the session when prompted. If you define an environment variable, name the variable OPENAI_API_KEY.

For information about configuring your OpenAI API key as an environment variable, refer to Best Practices for API Key Safety.

Test your OpenAI API key
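A minimal sketch of this check, assuming the key lives in `OPENAI_API_KEY` as described above. It only confirms that a key is present (or prompts for one); it does not call the OpenAI API to validate it. The function name `resolve_api_key` is hypothetical.

```python
import os
from getpass import getpass


def resolve_api_key(env=os.environ, prompt=getpass):
    """Return the OpenAI API key from the environment, or prompt for it."""
    key = env.get("OPENAI_API_KEY")
    if key:
        print("OPENAI_API_KEY is set in the environment.")
        return key
    # Fall back to an interactive prompt; getpass does not echo the input.
    return prompt("Enter your OPENAI_API_KEY: ")


# Usage in a notebook session (prompts only if the variable is unset):
# os.environ["OPENAI_API_KEY"] = resolve_api_key()
```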

Connect to your Neon database

Provide your Neon database connection string below or define it in an .env file using a DATABASE_URL variable. For information about obtaining a Neon connection string, see Connect from any application.

Test the connection to your database:
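One way to sketch the connection test, assuming the connection string is available in `DATABASE_URL`. The helper names are illustrative. `psycopg2` is imported inside the function so that the snippet can be defined before the package is installed.

```python
import os


def get_connection_string(env=os.environ):
    """Read the Neon connection string from the DATABASE_URL variable."""
    conn_str = env.get("DATABASE_URL")
    if not conn_str:
        raise RuntimeError("DATABASE_URL is not set")
    return conn_str


def check_connection(conn_str):
    """Open a connection, run SELECT 1, and return True on success."""
    import psycopg2  # deferred import; requires the psycopg2 package

    conn = psycopg2.connect(conn_str)
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT 1;")
            return cur.fetchone() == (1,)
    finally:
        conn.close()


# Usage, assuming a .env file in the working directory:
# from dotenv import load_dotenv
# load_dotenv()
# print(check_connection(get_connection_string()))
```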

This guide uses pre-computed Wikipedia article embeddings available in the OpenAI Cookbook examples directory so that you do not have to compute embeddings with your own OpenAI credits.

Import the pre-computed embeddings zip file:
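A sketch of the download step. The guide lists the `wget` package for this; the version below swaps in the standard library's `urllib` so that it has no third-party dependency. The URL shown in the usage comment is a placeholder, not the real location of the embeddings file; substitute the zip URL given in the OpenAI Cookbook.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve


def download_file(url, dest_dir="."):
    """Download url into dest_dir and return the local path.

    The download is skipped if a file with the same name already exists.
    """
    filename = Path(urlparse(url).path).name
    dest = Path(dest_dir) / filename
    if not dest.exists():
        urlretrieve(url, str(dest))
    return dest


# Usage (placeholder URL, replace with the one from the Cookbook):
# zip_path = download_file("https://example.com/path/to/embeddings.zip")
```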

Extract the downloaded zip file:
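Extraction can be sketched with the standard library's `zipfile` module. The archive name in the usage comment is hypothetical; use the name of the file you downloaded.

```python
import zipfile
from pathlib import Path


def extract_zip(zip_path, output_dir="."):
    """Extract every member of zip_path into output_dir; return the member names."""
    with zipfile.ZipFile(zip_path, "r") as zf:
        zf.extractall(output_dir)
        return zf.namelist()


# Usage (hypothetical file name):
# names = extract_zip("embeddings.zip", "data")
# print([n for n in names if n.endswith(".csv")])
```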

Create a table and add indexes for your vector embeddings

The vector table created in your database is called articles. Each row has a title vector and a content vector.

An index is defined on both the title and content vector columns.
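The table and index definitions might look like the sketch below. Several details are assumptions rather than facts from this page: the non-vector columns (`id`, `url`, `title`, `content`, `vector_id`), the index names, the IVFFlat index type with `lists = 100`, and the 1536 dimensions (the default output size of `text-embedding-3-small`). `conn` is expected to be an open `psycopg2` connection.

```python
# Assumed schema: only title_vector and content_vector are confirmed by the guide.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS public.articles (
    id INTEGER PRIMARY KEY,
    url TEXT,
    title TEXT,
    content TEXT,
    title_vector vector(1536),
    content_vector vector(1536),
    vector_id INTEGER
);
"""

# Assumed index type and parameters; vector_l2_ops matches the L2 distance operator.
CREATE_INDEXES_SQL = """
CREATE INDEX IF NOT EXISTS articles_title_vector_idx
    ON public.articles USING ivfflat (title_vector vector_l2_ops) WITH (lists = 100);
CREATE INDEX IF NOT EXISTS articles_content_vector_idx
    ON public.articles USING ivfflat (content_vector vector_l2_ops) WITH (lists = 100);
"""


def create_articles_table(conn):
    """Create the articles table and its two vector indexes, then commit."""
    with conn.cursor() as cur:
        cur.execute(CREATE_TABLE_SQL)
        cur.execute(CREATE_INDEXES_SQL)
    conn.commit()
```

Both statements require the pgvector extension to be enabled first, as listed in the prerequisites.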

Load the data

Load the pre-computed vector data into your articles table from the .csv file. There are 25000 records, so expect the operation to take several minutes.
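A sketch of the load step using PostgreSQL's `COPY ... FROM STDIN` through `psycopg2`'s `cursor.copy_expert`, which streams the file rather than inserting row by row. The column list and its order are assumptions about the CSV layout, and the CSV is assumed to have a header row.

```python
# Assumed column order; adjust it to match the header of your CSV file.
COPY_SQL = """
COPY public.articles (id, url, title, content, title_vector, content_vector, vector_id)
FROM STDIN WITH (FORMAT CSV, HEADER true, DELIMITER ',');
"""


def load_articles(conn, csv_path):
    """Stream the CSV file into the articles table with COPY, then commit."""
    with open(csv_path, "r", encoding="utf-8") as f, conn.cursor() as cur:
        cur.copy_expert(COPY_SQL, f)
    conn.commit()
```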

Check the number of records to ensure the data has been loaded. There should be 25000 records.
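The count check can be sketched as follows, assuming `conn` is an open `psycopg2` connection and the table lives in the `public` schema. The function names are illustrative.

```python
EXPECTED_ROWS = 25000  # record count stated in the guide


def count_articles(conn):
    """Return the number of rows in the articles table."""
    with conn.cursor() as cur:
        cur.execute("SELECT COUNT(*) FROM public.articles;")
        (count,) = cur.fetchone()
    return count


def data_loaded(conn, expected=EXPECTED_ROWS):
    """Return True if the table holds exactly the expected number of rows."""
    return count_articles(conn) == expected
```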

Search your data

After the data is stored in your Neon database, you can query the data for nearest neighbors.
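To make "nearest neighbors" concrete, here is a small in-memory sketch of what a distance-ordered query computes. It uses Euclidean (L2) distance, which is what pgvector's `<->` operator measures. The toy two-dimensional vectors are made up for illustration. Real embeddings have far more dimensions, and an indexed search in the database is approximate rather than exhaustive.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbors(query, rows, top_k=2):
    """Return the top_k (title, distance) pairs, closest first.

    rows is a list of (title, vector) pairs.
    """
    scored = [(title, l2_distance(query, vec)) for title, vec in rows]
    return sorted(scored, key=lambda pair: pair[1])[:top_k]


rows = [("a", [0.0, 0.0]), ("b", [3.0, 4.0]), ("c", [1.0, 0.0])]
print(nearest_neighbors([0.0, 0.0], rows))  # [('a', 0.0), ('c', 1.0)]
```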

Start by defining the query_neon function, which is executed when you run the vector similarity search. The function creates an embedding based on the user's query, prepares the SQL query, and runs the SQL query with the embedding. The pre-computed embeddings that you loaded into your database were created with the OpenAI text-embedding-3-small model, so you must use the same model to create the embedding for the similarity search.

A vector_name parameter is provided that allows you to search based on "title" or "content".
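A sketch of what such a function could look like. This is not the notebook's original code. It assumes the v1 interface of the `openai` Python package, an open `psycopg2` connection passed in as `conn`, a `public.articles` table with `id`, `url`, and `title` columns, and the vector column names `title_vector` and `content_vector`. The `embed` parameter is an addition that lets you swap in a different embedding function, for example in tests.

```python
ALLOWED_VECTOR_COLUMNS = {"title_vector", "content_vector"}


def embed_with_openai(text, model="text-embedding-3-small"):
    """Create an embedding with the OpenAI Python client (v1 interface)."""
    from openai import OpenAI  # deferred import; requires the openai package

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    return client.embeddings.create(model=model, input=text).data[0].embedding


def to_pgvector_literal(embedding):
    """Format a list of floats as a pgvector text literal, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(str(x) for x in embedding) + "]"


def query_neon(conn, query, vector_name="title_vector", top_k=20,
               embed=embed_with_openai):
    """Return the top_k rows nearest to the query by L2 distance."""
    if vector_name not in ALLOWED_VECTOR_COLUMNS:
        raise ValueError(
            f"vector_name must be one of {sorted(ALLOWED_VECTOR_COLUMNS)}"
        )
    literal = to_pgvector_literal(embed(query))
    # The column name is interpolated only after the whitelist check above;
    # the embedding and the limit are passed as query parameters.
    sql = f"""
        SELECT id, url, title, {vector_name} <-> %s::vector AS distance
        FROM public.articles
        ORDER BY {vector_name} <-> %s::vector
        LIMIT %s;
    """
    with conn.cursor() as cur:
        cur.execute(sql, (literal, literal, top_k))
        return cur.fetchall()
```

Restricting `vector_name` to a fixed set keeps the one interpolated identifier safe, since column names cannot be passed as ordinary query parameters.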

Run a similarity search based on title_vector embeddings:

Run a similarity search based on content_vector embeddings:
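The two searches can be run side by side with a small helper like the one below. Here `search` is a hypothetical callable with the signature `search(query, vector_name, top_k)` that stands in for the query_neon function described above, and it is assumed to return `(title, distance)` rows. Adapt both to your actual function.

```python
def run_both_searches(search, query, top_k=5):
    """Run the same query against both vector columns.

    Returns a dict keyed by column name.
    """
    return {
        column: search(query, vector_name=column, top_k=top_k)
        for column in ("title_vector", "content_vector")
    }


def format_results(rows):
    """Render (title, distance) rows as numbered lines, in the order given."""
    return [
        f"{i}. {title} (distance: {dist:.3f})"
        for i, (title, dist) in enumerate(rows, start=1)
    ]
```

Comparing the two result lists for one query shows how matching on titles differs from matching on article content.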
