Prompt Chain

Index and Search Data with Weaviate Embeddings

Explore indexing and searching embeddings with Weaviate, a vector database.

Works with openai, weaviate, docker

Spark score: 91 out of 100
Updated 3 months ago
Version 1.0.0

Why it matters

Leverage Weaviate, a vector database, to securely store and semantically search your data embeddings for production AI use cases like chatbots and topic modeling.

Outcomes

What it gets done

  1. Set up a local Weaviate instance using Docker.
  2. Index text data and its embeddings into Weaviate.
  3. Perform semantic similarity searches on indexed data.
  4. Optionally, let Weaviate handle vectorization with OpenAI.

Install

Add it to your toolbox

Run in your project directory:

curl -fsSL https://spark.entire.vc/get/oai-usingweaviateforembeddingssearch | bash

Steps

Steps in the chain

01
Setup

Import the required libraries and set the embedding model that we'd like to use.

02
Load data

In this section we'll load embedded data that we prepared before this session.

03
Weaviate Setup

Set up a local deployment of Weaviate. To run Weaviate locally, you'll need Docker. Navigate to the examples/vector_databases/weaviate/ directory and run docker-compose up -d. Alternatively, use Weaviate Cloud Service (WCS) to create a free Weaviate cluster by creating an account, creating a Weaviate Cluster with Sandbox Free, and noting the Cluster Id.

04
Index data

In Weaviate you create schemas to capture each of the entities you will be searching. Create a schema called Article with the title vector included for searching.

05
Search data

Fire queries at the new Index and get back results based on the closeness to existing vectors.

Overview

Using Weaviate for Embeddings Search

What it does

This prompt chain provides a guide on using Weaviate, a vector database, for indexing and searching embeddings. It covers the process of embedding data with OpenAI and then storing and querying these embeddings within Weaviate for semantic search applications.

How it connects

Use this when you need to store and search embeddings in a secure, scalable environment for production use cases such as chatbots, topic modeling, question answering, and recommendation services. This is particularly useful when you have existing vectorized data or want to leverage Weaviate's automated vectorization with OpenAI. Do not use this if you do not require scalable, secure production-level embedding search or if you are not using OpenAI embeddings.

Source README

Using Weaviate for Embeddings Search

This notebook takes you through a simple flow to download some data, embed it, and then index and search it using a selection of vector databases. This is a common requirement for customers who want to store and search our embeddings with their own data in a secure environment to support production use cases such as chatbots, topic modelling and more.

What is a Vector Database

A vector database is a database made to store, manage and search embedding vectors. The use of embeddings to encode unstructured data (text, audio, video and more) as vectors for consumption by machine-learning models has exploded in recent years, due to the increasing effectiveness of AI in solving use cases involving natural language, image recognition and other unstructured forms of data. Vector databases have emerged as an effective solution for enterprises to deliver and scale these use cases.
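
To make the idea concrete, here is a minimal sketch of the core operation a vector database performs: ranking stored vectors by similarity to a query vector. The toy three-dimensional vectors and the names `store` and `nearest` are illustrative assumptions. This linear scan is only for intuition; production systems such as Weaviate rely on approximate nearest-neighbor indexes rather than comparing against every vector.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, store, k=2):
    """Return the k (key, score) pairs most similar to query, best first."""
    scored = [(key, cosine_similarity(query, vec)) for key, vec in store.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy "embeddings" (hypothetical values); real ones have far more dimensions.
store = {
    "cats": [0.9, 0.1, 0.0],
    "dogs": [0.8, 0.2, 0.1],
    "stock markets": [0.0, 0.1, 0.9],
}

top = nearest([1.0, 0.0, 0.0], store, k=2)
print(top)
```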

Why use a Vector Database

Vector databases enable enterprises to take many of the embeddings use cases we've shared in this repo (question answering, chatbot and recommendation services, for example) and run them in a secure, scalable environment. Many of our customers use embeddings to solve their problems at small scale, but performance and security hold them back from going into production. We see vector databases as a key component in solving that, and in this guide we'll walk through the basics of embedding text data, storing it in a vector database and using it for semantic search.

Demo Flow

The demo flow is:

  • Setup: Import packages and set any required variables
  • Load data: Load a dataset and embed it using OpenAI embeddings
  • Weaviate
    • Setup: Here we'll set up the Python client for Weaviate. For more details go here
    • Index Data: We'll create an index with title search vectors in it
    • Search Data: We'll run a few searches to confirm it works

Once you've run through this notebook you should have a basic understanding of how to set up and use vector databases, and can move on to more complex use cases making use of our embeddings.

Setup

Import the required libraries and set the embedding model that we'd like to use.
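
A setup cell along these lines might look like the sketch below. The model name `text-embedding-3-small` and the helper `embed_texts` are assumptions for illustration; use whichever model produced your stored vectors, since query and document embeddings must come from the same model. The sketch assumes the `openai` Python package (version 1 or later) and an `OPENAI_API_KEY` environment variable.

```python
import os

# Assumed model name; it must match the model used to embed the stored data.
EMBEDDING_MODEL = "text-embedding-3-small"


def embed_texts(texts, model=EMBEDDING_MODEL):
    """Return one embedding (a list of floats) per input string."""
    # Imported lazily so this cell can be defined without the package installed.
    from openai import OpenAI

    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    response = client.embeddings.create(model=model, input=list(texts))
    return [item.embedding for item in response.data]
```

Calling `embed_texts(["modern art in Europe"])` would return a single query vector that can be compared against the stored title vectors.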

Load data

In this section we'll load embedded data that we prepared before this session.
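
Pre-embedded data is often shipped as a CSV in which each vector is stored as a string. The sketch below shows one way to read such a file and turn the strings back into lists of floats. The column names (`id`, `title`, `title_vector`) and the in-memory sample are hypothetical stand-ins for the prepared file.

```python
import ast
import csv
import io


def load_embedded_rows(handle, vector_columns=("title_vector",)):
    """Read CSV rows and parse stringified vectors into lists of floats."""
    rows = []
    for row in csv.DictReader(handle):
        for column in vector_columns:
            # literal_eval parses "[0.1, 0.2]" without executing arbitrary code.
            row[column] = [float(x) for x in ast.literal_eval(row[column])]
        rows.append(row)
    return rows


# Tiny in-memory stand-in for the prepared file (hypothetical data).
sample = io.StringIO(
    "id,title,title_vector\n"
    '1,April,"[0.1, 0.2, 0.3]"\n'
    '2,August,"[0.4, 0.5, 0.6]"\n'
)

rows = load_embedded_rows(sample)
print(rows[0]["title"], rows[0]["title_vector"])
```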

Weaviate

Another vector database option we'll explore is Weaviate, which offers both a managed SaaS option and a self-hosted open-source option. As we've already looked at a cloud vector database, we'll try the self-hosted option here.

For this we will:

  • Set up a local deployment of Weaviate
  • Create indices in Weaviate
  • Store our data there
  • Fire some similarity search queries
  • Try a real use case

Bring your own vectors approach

In this cookbook, we provide the data with already generated vectors. This is a good approach for scenarios where your data is already vectorized.

Automated vectorization with OpenAI module

For scenarios where your data is not vectorized yet, you can delegate the vectorization task to Weaviate, which calls OpenAI on your behalf.
Weaviate offers a built-in module, text2vec-openai, which takes care of the vectorization for you:

  • at import
  • for any CRUD operations
  • for semantic search

Check out the Getting Started with Weaviate and OpenAI module cookbook to learn step by step how to import and vectorize data in one step.
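
As a rough illustration of what delegating vectorization looks like, the sketch below builds a class definition that names text2vec-openai as the vectorizer. The property names and the choice to skip vectorizing `content` are assumptions, and the `moduleConfig` values reflect one commonly documented configuration that may differ across Weaviate versions, so check them against the module documentation for your deployment.

```python
def auto_vectorized_article_class():
    """Class definition asking Weaviate to embed objects with OpenAI."""
    return {
        "class": "Article",
        "vectorizer": "text2vec-openai",
        "moduleConfig": {
            # Assumed model settings; adjust to the model you want Weaviate to call.
            "text2vec-openai": {"model": "ada", "modelVersion": "002", "type": "text"}
        },
        "properties": [
            {"name": "title", "dataType": ["text"]},
            {
                "name": "content",
                "dataType": ["text"],
                # Exclude this property from the text that gets vectorized.
                "moduleConfig": {"text2vec-openai": {"skip": True}},
            },
        ],
    }


class_definition = auto_vectorized_article_class()
print(class_definition["vectorizer"])
```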

Setup

To run Weaviate locally, you'll need Docker. Following the instructions contained in the Weaviate documentation here, we created an example docker-compose.yml file in this repo saved at ./weaviate/docker-compose.yml.

After starting Docker, you can start Weaviate locally by navigating to the examples/vector_databases/weaviate/ directory and running docker-compose up -d.
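
Once the container is up, it can help to confirm that Weaviate is accepting requests before moving on. The sketch below polls Weaviate's readiness endpoint using only the standard library. The base URL `http://localhost:8080` is an assumption about how the compose file maps ports; adjust it to your setup.

```python
import urllib.error
import urllib.request


def ready_url(base_url="http://localhost:8080"):
    """Build the readiness endpoint URL for a Weaviate instance."""
    return base_url.rstrip("/") + "/v1/.well-known/ready"


def is_ready(base_url="http://localhost:8080", timeout=2.0):
    """Return True if the instance answers its readiness probe with HTTP 200."""
    try:
        with urllib.request.urlopen(ready_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx response.
        return False
```

Calling `is_ready()` in a short retry loop after `docker-compose up -d` avoids connection errors while the container is still starting.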

SaaS

Alternatively you can use Weaviate Cloud Service (WCS) to create a free Weaviate cluster.

  1. Create a free account and/or log in to WCS.
  2. Create a Weaviate Cluster with the following settings:
    • Sandbox: Sandbox Free
    • Weaviate Version: Use default (latest)
    • OIDC Authentication: Disabled
  3. Your instance should be ready in a minute or two.
  4. Make a note of the Cluster Id. The link will take you to the full path of your cluster (you will need it later to connect to it). It should be something like: https://your-project-name-suffix.weaviate.network
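
Whether you run locally or on WCS, connecting comes down to pointing the Python client at the right URL. The sketch below assumes the v3-style `weaviate.Client` (weaviate-client below version 4; newer client versions use different connection helpers). The helper names and the local URL are illustrative assumptions. The OpenAI key is only needed if Weaviate will call OpenAI for you.

```python
import os


def client_settings(url, openai_api_key=None):
    """Keyword arguments for a v3-style weaviate.Client."""
    settings = {"url": url}
    if openai_api_key:
        # Header Weaviate forwards to OpenAI when it vectorizes on your behalf.
        settings["additional_headers"] = {"X-OpenAI-Api-Key": openai_api_key}
    return settings


def connect(url, openai_api_key=None):
    """Create a client; imported lazily so the helper above has no dependencies."""
    import weaviate  # assumes weaviate-client < 4

    return weaviate.Client(**client_settings(url, openai_api_key))


local = client_settings("http://localhost:8080")
cloud = client_settings(
    "https://your-project-name-suffix.weaviate.network",
    os.getenv("OPENAI_API_KEY"),
)
print(local)
```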

Index data

In Weaviate you create schemas to capture each of the entities you will be searching.

In this case we'll create a schema called Article with the title vector from above included for us to search by.

The next few steps closely follow the documentation Weaviate provides here.
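
A minimal sketch of this step, under a few assumptions: the v3-style Python client, a bring-your-own-vectors class (`"vectorizer": "none"`), and rows shaped like `{"title", "content", "title_vector"}`. The helper names are hypothetical. The class is created once, then objects are added in a batch with the precomputed title vector attached to each.

```python
def article_schema():
    """Class definition for articles whose vectors are supplied by the caller."""
    return {
        "class": "Article",
        "vectorizer": "none",  # vectors come from us, not from a Weaviate module
        "properties": [
            {"name": "title", "dataType": ["text"]},
            {"name": "content", "dataType": ["text"]},
        ],
    }


def to_import_item(row):
    """Split a loaded row into (properties, vector) for import."""
    properties = {"title": row["title"], "content": row["content"]}
    return properties, row["title_vector"]


def import_articles(client, rows):
    """Create the Article class and batch-import rows with their vectors."""
    client.schema.create_class(article_schema())
    with client.batch as batch:
        for row in rows:
            properties, vector = to_import_item(row)
            batch.add_data_object(properties, "Article", vector=vector)


props, vec = to_import_item(
    {"title": "April", "content": "April is a month.", "title_vector": [0.1, 0.2]}
)
print(props, vec)
```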

Search data

As above, we'll fire some queries at our new index and get back results based on their closeness to our existing vectors.
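
A query of this kind could be sketched as follows, assuming the v3-style client and an `Article` class with `title` and `content` properties. The query vector must come from the same embedding model as the stored vectors. The helper names and the sample response are illustrative; the response shape follows Weaviate's GraphQL `Get` results.

```python
def search_by_vector(client, query_vector, top_k=5):
    """Return the raw response for the top_k articles nearest to query_vector."""
    return (
        client.query.get("Article", ["title", "content"])
        .with_near_vector({"vector": query_vector})
        .with_additional(["certainty"])
        .with_limit(top_k)
        .do()
    )


def extract_hits(response, class_name="Article"):
    """Flatten a Get response into (title, certainty) pairs."""
    hits = response["data"]["Get"][class_name]
    return [(hit["title"], hit["_additional"]["certainty"]) for hit in hits]


# Hypothetical response, shaped like the result of search_by_vector.
sample_response = {
    "data": {
        "Get": {
            "Article": [
                {"title": "Modern art", "_additional": {"certainty": 0.93}},
                {"title": "Art museum", "_additional": {"certainty": 0.91}},
            ]
        }
    }
}

hits = extract_hits(sample_response)
print(hits)
```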

Let Weaviate handle vector embeddings

Weaviate has a built-in module for OpenAI, which takes care of the steps required to generate a vector embedding for your queries and any CRUD operations.

This allows you to run a vector query with the with_near_text filter, which uses your OPENAI_API_KEY.
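
With the module enabled, the query text can be passed directly and Weaviate embeds it for you. The sketch below assumes the v3-style client, a class configured with text2vec-openai, and a client created with the `X-OpenAI-Api-Key` header. The helper names are hypothetical.

```python
def near_text_filter(query):
    """Build the argument for with_near_text from a plain-text query."""
    return {"concepts": [query]}


def search_by_text(client, query, top_k=5):
    """Let Weaviate embed the query text and return the nearest articles."""
    return (
        client.query.get("Article", ["title", "content"])
        .with_near_text(near_text_filter(query))
        .with_limit(top_k)
        .do()
    )


print(near_text_filter("modern art in Europe"))
```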
