Prompt Chain

Query Movies with Embeddings and Metadata Filters

Search movies using OpenAI embeddings and Zilliz vector database with metadata filtering.

Works with openai, zilliz, pymilvus, datasets, huggingface

Spark score: 91 out of 100
Updated 3 months ago
Version 1.0.0

Why it matters

Leverage OpenAI embeddings and Zilliz vector database to find relevant movies based on descriptions and filter by metadata like release year or genre.

Outcomes

What it gets done

01

Generate embeddings for movie descriptions using OpenAI.

02

Store movie data and embeddings in Zilliz.

03

Perform filtered searches using natural language descriptions and metadata.

04

Retrieve and display relevant movie search results.

Install

Add it to your toolbox

Run in your project directory:

curl -fsSL https://spark.entire.vc/get/oai-filteredsearchwithzillizandopenai | bash

Steps

Steps in the chain

01
Install Required Libraries

Download and install the required libraries for this notebook: openai (for communicating with the OpenAI embedding service), pymilvus (for communicating with the Zilliz server), datasets (for downloading the dataset), and tqdm (for progress bars).

02
Set Up Zilliz Database

Set up your Zilliz account and database. Configure the following values: URI (database URI), USER (database username), PASSWORD (database password), COLLECTION_NAME (collection name), DIMENSION (embedding dimension), OPENAI_ENGINE (embedding model), openai.api_key (OpenAI key), INDEX_PARAM (index settings), QUERY_PARAM (search parameters), and BATCH_SIZE (batch size for embedding and insertion).

03
Download Dataset

Download the dataset from Hugging Face Datasets. Use the netflix-shows dataset from HuggingLearners, which contains over 8 thousand movies with metadata including title, type, release_year, and rating.

04
Embed and Insert Data

Embed each movie description using the embedding function and insert the data into Zilliz. Iterate through all entries and create batches. Insert batches once the set batch size is reached, then insert any remaining batch after the loop completes.

05
Query the Database

Perform a query on the Zilliz database using a tuple of the movie description and filter expression. Print the search description and filter expression, then for each result display the score, title, type, release year, rating, and description.

Overview

Filtered Search with Zilliz and OpenAI

What it does

A Jupyter notebook tutorial for generating OpenAI embeddings of movie descriptions and using those embeddings within Zilliz to find relevant movies with metadata filtering.

How it connects

Use this when you want to perform semantic search on movie descriptions while applying metadata filters to narrow results.

Source README

Filtered Search with Zilliz and OpenAI

Finding your next movie

This notebook walks through generating embeddings of movie descriptions with OpenAI and using those embeddings within Zilliz to find relevant movies. To narrow the search results, we use filtering to run metadata searches alongside the vector search. The dataset in this example is sourced from Hugging Face Datasets and contains a little over 8 thousand movie entries.

Let's begin by installing the required libraries for this notebook:

  • openai is used for communicating with the OpenAI embedding service
  • pymilvus is used for communicating with the Zilliz server
  • datasets is used for downloading the dataset
  • tqdm is used for the progress bars
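The four libraries above are typically installed with `pip install openai pymilvus datasets tqdm`. As a small sketch, the following checks which of them are not yet importable in the current environment. The helper name `missing_packages` is hypothetical and not part of the notebook.

```python
import importlib.util

# Packages the notebook relies on (names taken from the list above).
REQUIRED = ["openai", "pymilvus", "datasets", "tqdm"]


def missing_packages(names):
    """Return the top-level package names that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Install first: pip install " + " ".join(missing))
    else:
        print("All required libraries are available.")
```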

To get Zilliz up and running, take a look here. With your account and database set up, set the following values:

  • URI: The URI your database is running on
  • USER: Your database username
  • PASSWORD: Your database password
  • COLLECTION_NAME: What to name the collection within Zilliz
  • DIMENSION: The dimension of the embeddings
  • OPENAI_ENGINE: Which embedding model to use
  • openai.api_key: Your OpenAI account key
  • INDEX_PARAM: The index settings to use for the collection
  • QUERY_PARAM: The search parameters to use
  • BATCH_SIZE: How many texts to embed and insert at once
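A configuration cell for these values might look like the sketch below. Every value is a placeholder or an assumption rather than the notebook's actual setting: the endpoint, credentials, and collection name are invented, the model is assumed to be `text-embedding-ada-002` (which produces 1536-dimensional vectors), and the index and search parameters assume `AUTOINDEX` with the `L2` metric.

```python
import os

# All values below are placeholders or assumptions; replace them with your own.
URI = "https://your-cluster-endpoint"      # placeholder: URI your database runs on
USER = "db_admin"                          # placeholder database username
PASSWORD = "your-password"                 # placeholder database password
COLLECTION_NAME = "movie_search"           # assumed collection name
DIMENSION = 1536                           # vector size of text-embedding-ada-002
OPENAI_ENGINE = "text-embedding-ada-002"   # assumed embedding model

# The notebook assigns this value to openai.api_key; reading it from an
# environment variable avoids hard-coding the secret in the notebook.
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", "")

# Index settings: AUTOINDEX with L2 distance is an assumed choice.
INDEX_PARAM = {"metric_type": "L2", "index_type": "AUTOINDEX", "params": {}}

# Search parameters: the metric should match the one the index was built with.
QUERY_PARAM = {"metric_type": "L2", "params": {}}

BATCH_SIZE = 1000  # assumed number of texts to embed and insert at once
```

`DIMENSION` has to agree with the chosen model, since the collection's vector field is created with that size and inserts with a different length are rejected.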

Dataset

With Zilliz up and running, we can begin grabbing our data. Hugging Face Datasets is a hub that holds many different user datasets, and for this example we are using the netflix-shows dataset from HuggingLearners. This dataset contains a little over 8 thousand movies along with their metadata. We are going to embed each description and store it within Zilliz along with its title, type, release_year, and rating.
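As a rough sketch of this step, the dataset can be fetched with `datasets.load_dataset` and each row reduced to the five fields that get stored. The repository id `hugginglearners/netflix-shows` and the `train` split are assumptions based on the dataset name above. The helper `normalize_row` and its default values for missing fields are hypothetical.

```python
def load_movies(split="train"):
    """Download the dataset from the Hugging Face Hub (needs network access)."""
    import datasets  # imported lazily so normalize_row works without it

    # Assumed repository id for the HuggingLearners netflix-shows dataset.
    return datasets.load_dataset("hugginglearners/netflix-shows", split=split)


def normalize_row(row):
    """Keep the five stored fields, replacing missing values with defaults."""
    return {
        "title": row.get("title") or "",
        "type": row.get("type") or "",
        "release_year": row.get("release_year") or -1,
        "rating": row.get("rating") or "",
        "description": row.get("description") or "",
    }
```

Replacing missing values up front keeps every column the same length and type, which matters once rows are inserted in column-wise batches.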

Insert the Data

Now that we have our data on our machine, we can begin embedding it and inserting it into Zilliz. The embedding function takes in text and returns the embeddings as a list.
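One way to write such an embedding function is sketched below. It assumes the client interface of the `openai` package version 1.0 or later, where `client.embeddings.create(input=..., model=...)` returns an object whose `data` items each carry an `embedding`. The notebook itself may use the older module-level API, so treat this as an illustration. `FakeEmbeddings` is a hypothetical stand-in that lets the function run offline.

```python
from types import SimpleNamespace


def embed(texts, client, model="text-embedding-ada-002"):
    """Return one embedding (a list of floats) per input text."""
    response = client.embeddings.create(input=texts, model=model)
    return [item.embedding for item in response.data]


class FakeEmbeddings:
    """Offline stand-in with the same call shape as client.embeddings."""

    def create(self, input, model):
        # One short dummy vector per text; real vectors are much longer.
        data = [SimpleNamespace(embedding=[float(len(t)), 0.0]) for t in input]
        return SimpleNamespace(data=data)


fake_client = SimpleNamespace(embeddings=FakeEmbeddings())
vectors = embed(["a heist film", "space opera"], fake_client)
```

With the real service, `client` would be an `openai.OpenAI()` instance configured with your API key.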

This next step does the actual inserting. We iterate through all the entries and create batches that we insert once we hit our set batch size. After the loop is over, we insert the last remaining batch if it exists.
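The batching logic described here can be sketched as follows. The function name `insert_in_batches` and its parameters are hypothetical: `embed_fn` stands for the embedding function and `insert_fn` for whatever performs the insert, for example a pymilvus collection's `insert` method, which accepts column-wise data.

```python
def insert_in_batches(rows, embed_fn, insert_fn, batch_size):
    """Embed descriptions and insert column-wise batches; return rows inserted."""
    fields = ["title", "type", "release_year", "rating", "description"]
    columns = [[] for _ in fields]
    inserted = 0

    def flush():
        nonlocal columns, inserted
        # Add the embeddings of the descriptions as a final column.
        insert_fn(columns + [embed_fn(columns[4])])
        inserted += len(columns[0])
        columns = [[] for _ in fields]

    for row in rows:
        for column, field in zip(columns, fields):
            column.append(row[field])
        # Insert as soon as the batch reaches the configured size.
        if len(columns[0]) == batch_size:
            flush()

    # Insert the last, partially filled batch if one exists.
    if columns[0]:
        flush()
    return inserted
```

Passing the embedding and insert steps in as callables keeps the loop easy to try with stand-ins before pointing it at the real services.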

Query the Database

With our data safely inserted into Zilliz, we can now perform a query. The query takes in a tuple of the movie description you are searching for and the filter to use. More info about the filter can be found here. The search first prints your description and filter expression. Then, for each result, it prints the score, title, type, release year, rating, and description of the matching movie.
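A minimal sketch of such a query function follows. It assumes `collection` behaves like a pymilvus `Collection`, whose `search` method takes the query vectors, the vector field name, the search parameters, a limit, a filter expression, and the output fields. The vector field name `embedding` and the function signature are assumptions, not confirmed by the source.

```python
def query(collection, embed_fn, my_query, query_param, top_k=5):
    """Run a filtered vector search, print the hits, and return the lines."""
    text, expr = my_query  # (movie description, filter expression)
    fields = ["title", "type", "release_year", "rating", "description"]
    results = collection.search(
        embed_fn([text]),        # a single query vector
        anns_field="embedding",  # assumed name of the vector field
        param=query_param,
        limit=top_k,
        expr=expr,               # metadata filter applied to the search
        output_fields=fields,
    )
    lines = ["Description: " + text, "Expression: " + expr]
    for hits in results:  # one list of hits per query vector
        for rank, hit in enumerate(hits, start=1):
            details = ", ".join(f"{f}: {hit.entity.get(f)}" for f in fields)
            lines.append(f"  Rank {rank}, score {hit.score}, {details}")
    print("\n".join(lines))
    return lines
```

A call might pass a tuple such as `("movie about a fluffy animal", 'release_year < 2019 and rating like "PG%"')`, which restricts the results to titles released before 2019 whose rating starts with PG.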
