Prompt Chain

Index and Search Data with Weaviate Embeddings

Name: Index and Search Data with Weaviate Embeddings
Availability: OnlineOnly
Author: OpenAI Cookbook

Explore indexing and searching embeddings with Weaviate, a vector database.


Works with openai, weaviate, docker


Updated 3 months ago

Version 1.0.0

Models

GPT-4o


Why it matters

Use Weaviate, a vector database, to store your data's embeddings securely and search them semantically for production AI use cases such as chatbots and topic modeling.

Outcomes

What it gets done

Set up a local Weaviate instance using Docker.

Index text data and its embeddings into Weaviate.

Perform semantic similarity searches on indexed data.

Optionally, let Weaviate handle vectorization with OpenAI.

Install

Add it to your toolbox

Run in your project directory:

curl -fsSL https://spark.entire.vc/get/oai-usingweaviateforembeddingssearch | bash

Steps

Steps in the chain

Setup

Import the required libraries and set the embedding model that we'd like to use.

Load data

In this section we'll load embedded data that we prepared prior to this session.

Weaviate Setup

Set up a local deployment of Weaviate. To run Weaviate locally, you'll need Docker: navigate to the examples/vector_databases/weaviate/ directory and run docker-compose up -d. Alternatively, use Weaviate Cloud Service (WCS) to create a free cluster: create an account, create a Weaviate cluster on the Sandbox Free tier, and note the Cluster Id.

Index data

In Weaviate you create schemas to capture each of the entities you will be searching. Create a schema called Article with the title vector included for searching.

Search data

Run queries against the new index and get back results ranked by how close their vectors are to the query vector.

Overview

Using Weaviate for Embeddings Search

What it does

This prompt chain provides a guide on using Weaviate, a vector database, for indexing and searching embeddings. It covers the process of embedding data with OpenAI and then storing and querying these embeddings within Weaviate for semantic search applications.

How it connects

Use this when you need to store and search embeddings in a secure, scalable environment for production use cases such as chatbots, topic modeling, question answering, and recommendation services. It is particularly useful when your data is already vectorized or when you want Weaviate to handle vectorization with OpenAI automatically. Skip it if you don't need secure, scalable, production-grade embedding search, or if you aren't using OpenAI embeddings.

Source README

Using Weaviate for Embeddings Search

This notebook takes you through a simple flow to download some data, embed it, and then index and search it using Weaviate, a vector database. This is a common requirement for customers who want to store and search embeddings of their own data in a secure environment to support production use cases such as chatbots, topic modeling and more.

What is a Vector Database

A vector database is a database made to store, manage and search embedding vectors. The use of embeddings to encode unstructured data (text, audio, video and more) as vectors for consumption by machine-learning models has exploded in recent years, due to the increasing effectiveness of AI in solving use cases involving natural language, image recognition and other unstructured forms of data. Vector databases have emerged as an effective solution for enterprises to deliver and scale these use cases.
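To make the definition concrete, here is a toy, in-memory illustration of what a vector database does at its core. It stores vectors with their metadata and returns the entries closest to a query vector. This is a sketch only. The class and method names are hypothetical, and it scores every vector by brute-force cosine similarity. It has none of the persistence, filtering, or approximate indexing that a system like Weaviate provides.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class TinyVectorStore:
    """Hypothetical toy store: keeps (id, vector, payload) rows and searches by brute force."""

    def __init__(self) -> None:
        self._rows: list[tuple[str, list[float], dict]] = []

    def add(self, item_id: str, vector: list[float], payload: dict) -> None:
        self._rows.append((item_id, vector, payload))

    def search(self, query: list[float], k: int = 3) -> list[tuple[str, float]]:
        # Score every stored vector against the query. A real vector database
        # replaces this linear scan with an approximate index (Weaviate uses HNSW).
        scored = [(item_id, cosine_similarity(query, vec)) for item_id, vec, _ in self._rows]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


store = TinyVectorStore()
store.add("a", [1.0, 0.0], {"title": "Cats"})
store.add("b", [0.0, 1.0], {"title": "Stocks"})
store.add("c", [0.9, 0.1], {"title": "Kittens"})
print(store.search([1.0, 0.05], k=2))
```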

Why use a Vector Database

Vector databases enable enterprises to take many of the embeddings use cases we've shared in this repo (question answering, chatbots and recommendation services, for example) and run them in a secure, scalable environment. Many of our customers use embeddings to solve their problems at small scale, but performance and security concerns hold them back from going into production. We see vector databases as a key component in solving that, and in this guide we'll walk through the basics of embedding text data, storing it in a vector database and using it for semantic search.

Demo Flow

The demo flow is:

Setup: Import packages and set any required variables
Load data: Load a dataset and embed it using OpenAI embeddings
Weaviate
- Setup: Here we'll set up the Python client for Weaviate. For more details, see the Weaviate Python client documentation.
- Index Data: We'll create an index with title search vectors in it
- Search Data: We'll run a few searches to confirm it works

Once you've run through this notebook, you should have a basic understanding of how to set up and use vector databases, and can move on to more complex use cases that make use of our embeddings.

Setup

Import the required libraries and set the embedding model that we'd like to use.
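Here is a minimal sketch of the setup step. It assumes the OpenAI key is supplied through the OPENAI_API_KEY environment variable, which is the variable the openai Python package reads by default. The model name text-embedding-3-small is also an assumption. Use whichever model produced the vectors you plan to load, because query and stored embeddings must come from the same model. The helper name require_openai_key is hypothetical.

```python
import os
from typing import Mapping, Optional

# Assumption: replace with the model that generated your stored vectors.
EMBEDDING_MODEL = "text-embedding-3-small"


def require_openai_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return OPENAI_API_KEY from the environment, failing fast if it is missing."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("Set OPENAI_API_KEY before running the notebook.")
    return key
```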

Load data

In this section we'll load embedded data that we prepared prior to this session.
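Pre-embedded datasets are often saved as CSV, with each vector stored as the text of a Python list. This dependency-free sketch assumes that layout and uses hypothetical column names (title_vector, content_vector). It turns those strings back into lists of floats with ast.literal_eval, which parses literals without executing code.

```python
import ast
import csv
import io
from typing import Iterable

# Hypothetical column names for the stringified embedding columns.
VECTOR_COLUMNS = ("title_vector", "content_vector")


def load_embedded_rows(lines: Iterable[str]) -> list[dict]:
    """Parse CSV rows and turn stringified vector columns back into lists of floats."""
    rows = []
    for row in csv.DictReader(lines):
        for col in VECTOR_COLUMNS:
            # literal_eval safely parses "[0.1, 0.2]" without running arbitrary code.
            row[col] = [float(x) for x in ast.literal_eval(row[col])]
        rows.append(row)
    return rows


sample = io.StringIO(
    "id,title,text,title_vector,content_vector\n"
    '1,April,"April is the fourth month","[0.1, 0.2]","[0.3, 0.4]"\n'
)
articles = load_embedded_rows(sample)
print(articles[0]["title"], articles[0]["title_vector"])
```

For a real file, open it with newline="" and pass the file object to load_embedded_rows.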

Weaviate

Weaviate offers both a managed SaaS option and a self-hosted open-source option. Here we'll try the self-hosted option.

For this we will:

Set up a local deployment of Weaviate
Create indices in Weaviate
Store our data there
Fire some similarity search queries
Try a real use case

Bring your own vectors approach

In this cookbook, we provide the data with already generated vectors. This is a good approach for scenarios where your data is already vectorized.

Automated vectorization with OpenAI module

For scenarios where your data is not vectorized yet, you can delegate vectorization with OpenAI to Weaviate.
Weaviate offers a built-in module, text2vec-openai, which takes care of vectorization for you:

during import
for any CRUD operations
for semantic search
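The difference between the two approaches shows up in the class definition. This sketch assumes Weaviate's JSON class format. There, "vectorizer": "none" means you supply vectors yourself, and "vectorizer": "text2vec-openai" hands embedding to the OpenAI module. The property names and the article_class helper are hypothetical. The text2vec-openai module must also be enabled in your Weaviate deployment for the second variant to work.

```python
import json


def article_class(vectorizer: str = "none") -> dict:
    """Return a hypothetical Weaviate class definition for Article.

    vectorizer="none"            -> bring your own vectors at import time
    vectorizer="text2vec-openai" -> Weaviate calls OpenAI to embed text for you
    """
    if vectorizer not in ("none", "text2vec-openai"):
        raise ValueError(f"unsupported vectorizer: {vectorizer}")
    return {
        "class": "Article",
        "vectorizer": vectorizer,
        # Hypothetical property set; adjust to the fields in your data.
        "properties": [
            {"name": "title", "dataType": ["text"]},
            {"name": "content", "dataType": ["text"]},
            {"name": "url", "dataType": ["text"]},
        ],
    }


print(json.dumps(article_class("text2vec-openai"), indent=2))
```

You could send the resulting dict to Weaviate's schema endpoint (POST /v1/schema). Assuming the v3 Python client, you could instead pass it to client.schema.create_class.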

Check out the Getting Started with Weaviate and OpenAI module cookbook to learn step by step how to import and vectorize data in one step.

Setup

To run Weaviate locally, you'll need Docker. Following the instructions in the Weaviate documentation, we created an example docker-compose.yml file in this repo, saved at ./weaviate/docker-compose.yml.

After starting Docker, you can start Weaviate locally by navigating to the examples/vector_databases/weaviate/ directory and running docker-compose up -d.
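docker-compose up -d can return before Weaviate is ready to accept requests, so it may help to poll before connecting. This sketch assumes the default port 8080 and Weaviate's readiness endpoint, /v1/.well-known/ready. The helper names are hypothetical. The probe is injectable, so the polling loop can be tested without a running server.

```python
import time
import urllib.request
from typing import Callable


def http_ready(base_url: str) -> bool:
    """Probe the readiness endpoint; True once it answers HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/v1/.well-known/ready", timeout=2) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError (e.g. 503 while starting) and timeouts are all OSError subclasses.
        return False


def wait_until_ready(
    base_url: str = "http://localhost:8080",
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
    probe: Callable[[str], bool] = http_ready,
) -> bool:
    """Poll until the instance reports ready or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if probe(base_url):
            return True
        time.sleep(interval_s)
    return False
```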

SaaS

Alternatively you can use Weaviate Cloud Service (WCS) to create a free Weaviate cluster.

create a free account and/or log in to WCS
create a Weaviate Cluster with the following settings:
- Sandbox: Sandbox Free
- Weaviate Version: Use default (latest)
- OIDC Authentication: Disabled
your instance should be ready in a minute or two
make a note of the Cluster Id. The link will take you to the full path of your cluster (you will need it later to connect to it). It should be something like: https://your-project-name-suffix.weaviate.network
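Whichever deployment you choose, connecting needs two things. The first is the instance URL. The second, if you plan to use the OpenAI module, is your OpenAI key sent in the X-OpenAI-Api-Key header. This sketch assumes the local instance listens on http://localhost:8080 and that the key is in OPENAI_API_KEY. connection_settings is a hypothetical helper.

```python
import os
from typing import Mapping, Optional

LOCAL_URL = "http://localhost:8080"  # assumption: default port from docker-compose


def connection_settings(
    cluster_url: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> dict:
    """Build the url and headers to hand to a Weaviate client.

    cluster_url=None targets the local Docker instance; otherwise pass the full
    WCS cluster URL, e.g. "https://your-project-name-suffix.weaviate.network".
    """
    env = os.environ if env is None else env
    headers = {}
    key = env.get("OPENAI_API_KEY")
    if key:
        # The OpenAI module reads the key from this request header.
        headers["X-OpenAI-Api-Key"] = key
    url = (cluster_url or LOCAL_URL).rstrip("/")
    return {"url": url, "additional_headers": headers}
```

Assuming the v3 weaviate-client package, the result maps onto its constructor: weaviate.Client(**connection_settings()). Newer v4 clients use different connection helpers, so check which client version you have installed.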

Index data

In Weaviate you create schemas to capture each of the entities you will be searching.

In this case we'll create a schema called Article with the title vector from above included for us to search by.

The next few steps closely follow the documentation Weaviate provides.
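When you bring your own vectors, indexing means sending each article's properties together with its precomputed vector. This sketch assumes Weaviate's REST batch endpoint (POST /v1/batch/objects), whose body is an "objects" list. The input field names, the batch size, and the choice of title_vector as the object vector are all assumptions.

```python
from typing import Iterable, Iterator


def batch_payloads(articles: Iterable[dict], batch_size: int = 100) -> Iterator[dict]:
    """Yield hypothetical request bodies for POST /v1/batch/objects.

    Each object carries its properties plus a precomputed vector
    ("bring your own vectors"). Input field names are assumptions.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    batch = []
    for art in articles:
        batch.append({
            "class": "Article",
            "properties": {"title": art["title"], "content": art["text"], "url": art["url"]},
            "vector": art["title_vector"],  # search by the title embedding
        })
        if len(batch) == batch_size:
            yield {"objects": batch}
            batch = []
    if batch:  # flush the final partial batch
        yield {"objects": batch}
```

The v3 Python client's client.batch helper handles this chunking for you. Spelling out the payload mainly shows what each object carries.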

Search data

We'll now fire some queries at our new index and get back results based on how close their vectors are to the query vector.
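A similarity search sends a query vector and asks for the nearest stored objects. This sketch builds the GraphQL body for Weaviate's /v1/graphql endpoint using the nearVector operator. The title and url fields are assumptions, and the helper name is hypothetical. The query vector must come from the same embedding model as the stored vectors.

```python
import json


def near_vector_query(vector: list[float], limit: int = 10,
                      fields: tuple[str, ...] = ("title", "url")) -> dict:
    """Build a GraphQL body that ranks Article objects by closeness to `vector`."""
    selection = " ".join(fields)
    query = (
        "{ Get { Article(nearVector: {vector: %s}, limit: %d) "
        "{ %s _additional { distance } } } }" % (json.dumps(vector), limit, selection)
    )
    return {"query": query}


print(near_vector_query([0.1, 0.2], limit=3)["query"])
```

Weaviate's default metric is cosine, so _additional { distance } is 1 minus cosine similarity and a lower value means closer. Assuming the v3 Python client, a roughly equivalent call is client.query.get("Article", ["title", "url"]).with_near_vector({"vector": vec}).with_limit(10).do().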

Let Weaviate handle vector embeddings

Weaviate has a built-in module for OpenAI, which takes care of the steps required to generate a vector embedding for your queries and any CRUD operations.

This lets you run a vector query with the with_near_text operator. Weaviate embeds the query text with OpenAI, using your OPENAI_API_KEY.
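With the OpenAI module enabled, the query can be plain text. Weaviate embeds it server-side and then runs the vector search. This sketch assumes the nearText GraphQL operator and the X-OpenAI-Api-Key request header. The helper name, the Article fields, and the placeholder key are hypothetical.

```python
import json
from typing import Dict, List, Tuple


def near_text_request(concepts: List[str], openai_key: str,
                      limit: int = 10) -> Tuple[dict, Dict[str, str]]:
    """Build body and headers for POST /v1/graphql using nearText.

    The text2vec-openai module embeds `concepts` server-side, calling OpenAI
    with the key passed in the X-OpenAI-Api-Key header.
    """
    if not concepts:
        raise ValueError("need at least one concept")
    query = (
        "{ Get { Article(nearText: {concepts: %s}, limit: %d) "
        "{ title url _additional { distance } } } }" % (json.dumps(concepts), limit)
    )
    headers = {"Content-Type": "application/json", "X-OpenAI-Api-Key": openai_key}
    return {"query": query}, headers


body, headers = near_text_request(["modern art in Europe"], "sk-...", limit=5)
print(body["query"])
```

In the v3 Python client, this corresponds to .with_near_text({"concepts": [...]}) on a client.query.get(...) chain, with the key passed via additional_headers when the client is created.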

