Prompt Chain

Generate User and Product Embeddings for Recommendations

Name: Generate User and Product Embeddings for Recommendations
Availability: OnlineOnly
Author: OpenAI Cookbook

A prompt workflow that calculates user and product embeddings by averaging reviews, then evaluates similarity scores to weakly predict review ratings.

Works with github

OpenAI Cookbook

Updated 3 months ago

Version 1.0.0

Why it matters

User and product embeddings derived from review data can weakly predict review scores. Because this signal works differently from collaborative filtering, it can serve as an additional feature that slightly improves existing recommendation models.

Outcomes

What it gets done

Calculate user and product embeddings by averaging review data.

Evaluate embedding similarity against review scores in a test set.

Visualize the correlation between embedding similarity and review scores.

Utilize embeddings as an additional feature for recommendation improvements.

Install

Add it to your toolbox

Run in your project directory:

curl -fsSL https://spark.entire.vc/get/oai-userandproductembeddings | bash

Steps

Steps in the chain

Calculate user and product embeddings

Calculate user and product embeddings by averaging all the reviews about the same product or written by the same user within the training set. Note that most users and products appear within the 50k examples only once.
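The averaging step can be sketched in plain Python. This is a minimal illustration rather than the chain's own code: the field names `UserId`, `ProductId`, and `embedding`, the helper names, and the toy 2-dimensional vectors are all assumptions made for the example.

```python
from collections import defaultdict


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def average_by_key(reviews, key):
    """Group review embeddings by `key` and average each group."""
    groups = defaultdict(list)
    for review in reviews:
        groups[review[key]].append(review["embedding"])
    return {k: mean_vector(vs) for k, vs in groups.items()}


# Toy training set: IDs and 2-d embeddings are made up for illustration.
train_reviews = [
    {"UserId": "u1", "ProductId": "p1", "embedding": [1.0, 0.0]},
    {"UserId": "u1", "ProductId": "p2", "embedding": [0.0, 1.0]},
    {"UserId": "u2", "ProductId": "p1", "embedding": [1.0, 1.0]},
]

user_embeddings = average_by_key(train_reviews, "UserId")
product_embeddings = average_by_key(train_reviews, "ProductId")
```

Only training reviews go into the averages, so the test set stays unseen when the embeddings are evaluated.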

Evaluate the embeddings

Evaluate the recommendations by looking at the similarity of the user and product embeddings amongst the reviews in the unseen test set. Calculate the cosine similarity between the user and product embeddings to get a raw similarity score. Normalize the scores to be evenly spread between 0 and 1 by taking the percentile of each similarity score amongst all predicted scores.
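A minimal sketch of the scoring and normalization, assuming plain Python lists as vectors. The helper names and toy pairs are hypothetical, and the percentile convention used here (fraction of scores less than or equal to the given score) is one of several reasonable choices, not necessarily the one the source uses.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors (range -1 to 1)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def percentile_ranks(scores):
    """Fraction of all scores that are <= each score (quadratic time, fine for a sketch)."""
    n = len(scores)
    return [sum(other <= s for other in scores) / n for s in scores]


# Hypothetical (user embedding, product embedding) pairs for three test reviews.
pairs = [
    ([1.0, 0.0], [1.0, 0.0]),  # same direction
    ([1.0, 0.0], [1.0, 1.0]),  # 45 degrees apart
    ([1.0, 0.0], [0.0, 1.0]),  # orthogonal
]

similarities = [cosine_similarity(u, p) for u, p in pairs]
normalized = percentile_ranks(similarities)
```

The percentile step keeps only the ordering of the raw similarities, so the normalized scores are spread across the 0 to 1 interval even when the raw values cluster in a narrow band.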

Visualize cosine similarity by review score

Group the cosine similarity scores by the review score and plot the distribution of cosine similarity scores for each review score. Observe the weak trend: higher similarity scores between user and product embeddings tend to go with higher review scores.
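The grouping behind such a plot can be sketched without a plotting library by summarizing each group, for example with its median. The `(score, similarity)` rows below are invented purely to show the mechanics and are not results from the dataset.

```python
from collections import defaultdict
from statistics import median

# (review score, normalized similarity) pairs; values are made up.
test_rows = [(1, 0.10), (1, 0.30), (3, 0.40), (5, 0.60), (5, 0.80), (5, 0.90)]

# Collect the similarity scores observed for each review score.
by_score = defaultdict(list)
for score, similarity in test_rows:
    by_score[score].append(similarity)

# One summary number per review score, in ascending score order.
median_by_score = {
    score: median(values) for score, values in sorted(by_score.items())
}

for score, value in median_by_score.items():
    print(f"score {score}: median similarity {value:.2f} (n={len(by_score[score])})")
```

A box plot of `by_score` would show the same grouping along with the spread within each group, which matters here because the trend is weak and the distributions overlap.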

Overview

User And Product Embeddings

What it does

This prompt chain calculates user and product embeddings from review data by averaging all reviews about the same product or written by the same user within a training set. It then evaluates these embeddings on an unseen test set by measuring cosine similarity between user and product embeddings, producing normalized similarity scores between 0 and 1. The workflow reveals a weak trend showing that higher similarity scores between user and product embeddings correlate with higher review scores.

How it connects

Use this workflow when you need an additional signal to complement collaborative filtering systems in recommendation engines. It can act as an additional feature to slightly improve performance on existing problems, working in a different way than more commonly used collaborative filtering. Do NOT use this as a standalone recommendation system: the source explicitly describes the predictive signal as "weak." Do NOT expect strong predictive power; this is designed to "slightly improve the performance on existing problems" as an auxiliary feature, not to replace existing recommendation infrastructure.

Source README

User and product embeddings

We calculate user and product embeddings based on the training set, and evaluate the results on the unseen test set. We will evaluate the results by plotting the user and product similarity versus the review score. The dataset is created in the Get_embeddings_from_dataset Notebook.

1. Calculate user and product embeddings

We calculate these embeddings simply by averaging all the reviews about the same product or written by the same user within the training set.

We can see that most of the users and products appear within the 50k examples only once.
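A quick way to check how many users (or products) appear only once is to count IDs. The sketch below uses a hypothetical list of user IDs; the same approach would apply to product IDs.

```python
from collections import Counter

# Hypothetical user IDs, one entry per training review.
train_user_ids = ["u1", "u2", "u2", "u3", "u4"]

review_counts = Counter(train_user_ids)

# Users whose averaged embedding comes from a single review.
single_review_users = sum(1 for count in review_counts.values() if count == 1)
share_single = single_review_users / len(review_counts)
```

This matters because the average of a single review is just that review's embedding, so for most users and products the "averaged" embedding carries exactly one review's worth of information.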

2. Evaluate the embeddings

To evaluate the recommendations, we look at the similarity of the user and product embeddings amongst the reviews in the unseen test set. We calculate the cosine similarity between the user and product embeddings, which gives us a raw similarity score. We then normalize the scores to be evenly spread between 0 and 1, by taking the percentile of each similarity score amongst all predicted scores.

2.1 Visualize cosine similarity by review score

We group the cosine similarity scores by the review score, and plot the distribution of cosine similarity scores for each review score.

We can observe a weak trend, showing that the higher the similarity score between the user and the product embedding, the higher the review score. Therefore, the user and product embeddings can weakly predict the review score - even before the user receives the product!

Because this signal works in a different way than the more commonly used collaborative filtering, it can act as an additional feature to slightly improve the performance on existing problems.
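One way to put a number on a trend like the one described above is a correlation coefficient between similarity and review score. The sketch below computes Pearson's r by hand; the helper name and the toy values are assumptions for illustration and say nothing about the strength of the effect in the actual dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences with non-zero variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up normalized similarities and the review scores they belong to.
similarity = [0.2, 0.9, 0.4, 0.5, 0.7]
scores = [1, 2, 3, 4, 5]

r = pearson(similarity, scores)
```

A value of `r` near 0 means no linear relationship and a value near 1 a strong positive one. Since review scores are ordinal, a rank-based measure such as Spearman's correlation may be a better fit in practice.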

