Query Movies with Embeddings and Metadata Filters
Search movies using OpenAI embeddings and Zilliz vector database with metadata filtering.
Why it matters
Leverage OpenAI embeddings and Zilliz vector database to find relevant movies based on descriptions and filter by metadata like release year or genre.
Outcomes
What it gets done
Generate embeddings for movie descriptions using OpenAI.
Store movie data and embeddings in Zilliz.
Perform filtered searches using natural language descriptions and metadata.
Retrieve and display relevant movie search results.
Install
Add it to your toolbox
Run in your project directory:
curl -fsSL https://spark.entire.vc/get/oai-filteredsearchwithzillizandopenai | bash
Steps in the chain
Download and install the required libraries for this notebook: openai (for communicating with the OpenAI embedding service), pymilvus (for communicating with the Zilliz server), datasets (for downloading the dataset), and tqdm (for progress bars).
Set up your Zilliz account and database. Configure the following values: URI (database URI), USER (database username), PASSWORD (database password), COLLECTION_NAME (collection name), DIMENSION (embedding dimension), OPENAI_ENGINE (embedding model), openai.api_key (OpenAI key), INDEX_PARAM (index settings), QUERY_PARAM (search parameters), and BATCH_SIZE (batch size for embedding and insertion).
Download the dataset from Hugging Face Datasets. Use the HuggingLearners netflix-shows dataset, which contains over 8 thousand movies along with metadata such as title, type, release_year, and rating.
Embed each movie description using the embedding function and insert the data into Zilliz. Iterate through all entries and create batches. Insert batches once the set batch size is reached, then insert any remaining batch after the loop completes.
Perform a query on the Zilliz database using a tuple of the movie description and filter expression. Print the search description and filter expression, then for each result display the score, title, type, release year, rating, and description.
Overview
Filtered Search with Zilliz and OpenAI
What it does
A Jupyter notebook tutorial for generating OpenAI embeddings of movie descriptions and using those embeddings within Zilliz to find relevant movies with metadata filtering.
How it connects
Use this when you want to perform semantic search on movie descriptions while applying metadata filters to narrow results.
Source README
Filtered Search with Zilliz and OpenAI
Finding your next movie
In this notebook, we will go over generating embeddings of movie descriptions with OpenAI and using those embeddings within Zilliz to find relevant movies. To narrow our search results and try something new, we are going to use filtering to perform metadata searches. The dataset in this example is sourced from Hugging Face Datasets and contains a little over 8 thousand movie entries.
Let's begin by downloading the required libraries for this notebook:
- openai is used for communicating with the OpenAI embedding service
- pymilvus is used for communicating with the Zilliz server
- datasets is used for downloading the dataset
- tqdm is used for the progress bars
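Before running the rest of the notebook, it can help to confirm that these four libraries are importable. The sketch below uses only the standard library's `importlib.util.find_spec`; the helper name `missing_packages` is hypothetical, and the install command itself is shown as a notebook-cell comment.

```python
# Install first, e.g. in a notebook cell:
#   %pip install openai pymilvus datasets tqdm
import importlib.util

# The four libraries this notebook relies on.
REQUIRED_PACKAGES = ["openai", "pymilvus", "datasets", "tqdm"]


def missing_packages(names=REQUIRED_PACKAGES):
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```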
To get Zilliz up and running, take a look here. Once your account and database are set up, set the following values:
- URI: The URI your database is running on
- USER: Your database username
- PASSWORD: Your database password
- COLLECTION_NAME: What to name the collection within Zilliz
- DIMENSION: The dimension of the embeddings
- OPENAI_ENGINE: Which embedding model to use
- openai.api_key: Your OpenAI account key
- INDEX_PARAM: The index settings to use for the collection
- QUERY_PARAM: The search parameters to use
- BATCH_SIZE: How many texts to embed and insert at once
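A minimal configuration sketch for the values above follows. Every value is a placeholder or an assumption rather than something taken from the original notebook: the environment-variable names are hypothetical, the model `text-embedding-3-small` and its 1536-dimensional output are assumed, and the `L2`/`AUTOINDEX` settings are just one plausible choice. Reading secrets from the environment keeps them out of the notebook.

```python
import os

# Placeholders: fill these in from your own Zilliz deployment (env var names are hypothetical).
URI = os.environ.get("ZILLIZ_URI", "https://your-cluster-endpoint")
USER = os.environ.get("ZILLIZ_USER", "your-username")
PASSWORD = os.environ.get("ZILLIZ_PASSWORD", "")
COLLECTION_NAME = "movie_search"  # any name you like for the collection

# Assumption: text-embedding-3-small, which returns 1536-dimensional vectors.
OPENAI_ENGINE = "text-embedding-3-small"
DIMENSION = 1536
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", "")
# With the openai package imported, the notebook would then set:
#   openai.api_key = OPENAI_API_KEY

# Assumed index and search settings; the metric type must match between the two.
INDEX_PARAM = {"metric_type": "L2", "index_type": "AUTOINDEX", "params": {}}
QUERY_PARAM = {"metric_type": "L2", "params": {}}

BATCH_SIZE = 1000  # assumed: texts embedded and inserted per request
```

Keeping the metric type identical in `INDEX_PARAM` and `QUERY_PARAM` matters because a search is expected to use the same distance metric the index was built with.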
Dataset
With Zilliz up and running, we can begin grabbing our data. Hugging Face Datasets is a hub that holds many different user datasets, and for this example we are using the HuggingLearners netflix-shows dataset. This dataset contains over 8 thousand movies and their metadata. We are going to embed each description and store it within Zilliz along with its title, type, release_year and rating.
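Each dataset row carries more columns than we store. The sketch below keeps only the fields named above, assuming each row behaves like a dict with `title`, `type`, `release_year`, `rating` and `description` keys. Replacing missing values with `""` and `-1` is an assumption made so that every column keeps a single type. The dataset id in the comment, `hugginglearners/netflix-shows`, is assumed, and `to_record` is a hypothetical helper.

```python
# With the datasets library installed, rows could be loaded like this (needs network access):
#   from datasets import load_dataset
#   dataset = load_dataset("hugginglearners/netflix-shows", split="train")

# Text metadata stored alongside each embedding in Zilliz.
TEXT_FIELDS = ("title", "type", "rating", "description")


def to_record(row):
    """Keep only the stored fields, filling in missing values."""
    record = {field: row.get(field) or "" for field in TEXT_FIELDS}
    # Assumed sentinel -1 for a missing year, so numeric filters still work.
    record["release_year"] = int(row.get("release_year") or -1)
    return record


if __name__ == "__main__":
    sample = {
        "show_id": "s1",
        "title": "Example Title",
        "type": "Movie",
        "release_year": "2020",
        "rating": None,
        "description": "A short example description.",
    }
    print(to_record(sample))
```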
Insert the Data
Now that we have our data on our machine, we can begin embedding it and inserting it into Zilliz. The embedding function takes in text and returns the embeddings as a list.
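A minimal sketch of such an embedding function, assuming the v1 `openai` Python client. In that client, `client.embeddings.create(input=..., model=...)` returns a response whose `data` items carry an `index` and an `embedding` list. The client is passed in rather than created inside the function, so the sketch runs offline with `FakeEmbeddingsClient`, a purely illustrative stand-in. The default model name is an assumption; in the notebook you would pass `OPENAI_ENGINE`.

```python
from types import SimpleNamespace


def embed(texts, client, model="text-embedding-3-small"):
    """Embed a batch of texts in one request; return one vector (list of floats) per text."""
    response = client.embeddings.create(input=texts, model=model)
    # Order by the returned index so vectors line up with the input texts.
    return [item.embedding for item in sorted(response.data, key=lambda item: item.index)]


class FakeEmbeddingsClient:
    """Offline stand-in that mimics the shape of client.embeddings.create()."""

    def __init__(self, dimension=4):
        self.dimension = dimension
        self.embeddings = self  # so client.embeddings.create(...) resolves to create()

    def create(self, input, model):
        data = [
            SimpleNamespace(index=i, embedding=[float(len(text))] * self.dimension)
            for i, text in enumerate(input)
        ]
        return SimpleNamespace(data=data)


if __name__ == "__main__":
    # Real use would be: from openai import OpenAI; vectors = embed(texts, OpenAI())
    vectors = embed(["a fluffy animal", "space opera"], FakeEmbeddingsClient())
    print(len(vectors), len(vectors[0]))  # 2 4
```

Sending a whole batch per request, instead of one text at a time, keeps the number of API calls proportional to the number of batches.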
This next step does the actual inserting. We iterate through all the entries and create batches, inserting each one once it reaches the set batch size. After the loop is over, we insert the last remaining batch if it exists.
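The batching loop can be sketched as follows. It uses two injected callables instead of real clients. `embed_fn(texts)` returns one vector per text, for example the embedding function described above bound to a client. `insert_fn(columns)` stores one batch; in the notebook this would presumably be pymilvus's `collection.insert`, assumed here to receive column-ordered lists. The column order (title, type, release_year, rating, description, embedding) is an assumption and must match your collection schema. `insert_in_batches` is a hypothetical name.

```python
# Assumed schema order for the metadata columns; the embedding column goes last.
COLUMNS = ("title", "type", "release_year", "rating", "description")


def insert_in_batches(records, embed_fn, insert_fn, batch_size):
    """Embed and insert records batch by batch; return how many batches were inserted."""

    def flush(rows):
        # One list per column, then the description embeddings as the final column.
        columns = [[row[name] for row in rows] for name in COLUMNS]
        columns.append(embed_fn([row["description"] for row in rows]))
        insert_fn(columns)

    batch, inserted = [], 0
    for record in records:  # wrap in tqdm(records) for a progress bar
        batch.append(record)
        if len(batch) == batch_size:  # full batch: embed and insert it now
            flush(batch)
            inserted += 1
            batch = []

    if batch:  # the last, partially filled batch, if any
        flush(batch)
        inserted += 1
    return inserted


if __name__ == "__main__":
    calls = []
    rows = [
        {"title": f"Movie {i}", "type": "Movie", "release_year": 2000 + i,
         "rating": "PG", "description": f"description {i}"}
        for i in range(5)
    ]
    n = insert_in_batches(rows, lambda texts: [[0.0] * 4 for _ in texts], calls.append, batch_size=2)
    print(n, [len(c[0]) for c in calls])  # 3 [2, 2, 1]
```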
Query the Database
With our data safely inserted into Zilliz, we can now perform a query. The query takes in a tuple of the movie description you are searching for and the filter to use. More info about the filter can be found here. The search first prints out your description and filter expression. After that, for each result, we print the score, title, type, release year, rating and description of the matching movie.
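The query step can be sketched as follows. The search is injected as `search_fn(vector, expr, limit)`, assumed to return `(score, fields)` pairs. Against pymilvus, it might wrap `collection.search(data=[vector], anns_field="embedding", param=QUERY_PARAM, limit=limit, expr=expr, output_fields=[...])`, with each hit converted to that pair shape. The field name `embedding` is assumed. The example filter `release_year < 2019 and rating like "PG%"` uses Milvus boolean-expression syntax: comparisons, `and`, and `like` with a trailing `%` wildcard. It is illustrative only.

```python
def query(search_tuple, embed_fn, search_fn, top_k=5):
    """Embed the description, run a filtered search, then print and return the hits.

    search_tuple: (description, filter_expression)
    embed_fn:     texts -> list of vectors
    search_fn:    (vector, expr, limit) -> list of (score, fields) pairs
    """
    description, expr = search_tuple
    print("Description:", description)
    print("Expression:", expr)

    vector = embed_fn([description])[0]  # a single query text gives a single vector
    hits = search_fn(vector, expr, top_k)

    print("Results:")
    for score, fields in hits:
        print(f"\tScore: {score} Title: {fields['title']}")
        print(f"\t\tType: {fields['type']} Release Year: {fields['release_year']} "
              f"Rating: {fields['rating']}")
        print(f"\t\tDescription: {fields['description']}")
    return hits


if __name__ == "__main__":
    # Offline stand-ins; a real run would use OpenAI embeddings and a Zilliz search.
    canned = [(0.42, {"title": "Example Title", "type": "Movie", "release_year": 2015,
                      "rating": "PG", "description": "A fluffy animal goes on an adventure."})]
    query(
        ("movie about a fluffy animal", 'release_year < 2019 and rating like "PG%"'),
        embed_fn=lambda texts: [[0.0] * 4 for _ in texts],
        search_fn=lambda vector, expr, limit: canned[:limit],
    )
```

The filter is applied by the database during the vector search. This keeps the metadata constraints and the semantic ranking in a single request.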