Generate User and Product Embeddings for Recommendations
A prompt workflow that calculates user and product embeddings by averaging reviews, then evaluates similarity scores to weakly predict review ratings.
Why it matters
Use user and product embeddings derived from review data as a weak predictor of review scores. Because this signal works differently from collaborative filtering, it can serve as an additional feature that slightly improves existing recommendation models.
Outcomes
What it gets done
Calculate user and product embeddings by averaging review data.
Evaluate embedding similarity against review scores in a test set.
Visualize the correlation between embedding similarity and review scores.
Use the embedding similarity as an additional feature to slightly improve existing recommendation models.
Install
Add it to your toolbox
Run in your project directory:
curl -fsSL https://spark.entire.vc/get/oai-userandproductembeddings | bash
Steps
Steps in the chain
Calculate user and product embeddings by averaging all the reviews about the same product or written by the same user within the training set. Note that most users and products appear within the 50k examples only once.
Evaluate the recommendations by looking at the similarity of the user and product embeddings amongst the reviews in the unseen test set. Calculate the cosine similarity between the user and product embeddings to get a similarity score between 0 and 1. Normalize the scores to be evenly split between 0 and 1 by calculating the percentile of the similarity score amongst all predicted scores.
Group the cosine similarity scores by the review score and plot the distribution of cosine similarity scores for each review score. Observe the weak trend: higher similarity between user and product embeddings tends to go with higher review scores.
Overview
User And Product Embeddings
What it does
This prompt chain calculates user and product embeddings from review data by averaging all reviews about the same product or written by the same user within a training set. It then evaluates these embeddings on an unseen test set by measuring cosine similarity between user and product embeddings, producing normalized similarity scores between 0 and 1. The workflow reveals a weak trend showing that higher similarity scores between user and product embeddings correlate with higher review scores.
How it connects
Use this workflow when you need an additional signal to complement collaborative filtering systems in recommendation engines. It can act as an additional feature to slightly improve performance on existing problems, working in a different way than more commonly used collaborative filtering. Do NOT use this as a standalone recommendation system: the source explicitly describes the predictive signal as "weak." Do NOT expect strong predictive power; this is designed to "slightly improve the performance on existing problems" as an auxiliary feature, not to replace existing recommendation infrastructure.
Source README
User and product embeddings
We calculate user and product embeddings based on the training set, and evaluate the results on the unseen test set. We will evaluate the results by plotting the user and product similarity versus the review score. The dataset is created in the Get_embeddings_from_dataset Notebook.
1. Calculate user and product embeddings
We calculate these embeddings simply by averaging all the reviews about the same product or written by the same user within the training set.
We can see that most of the users and products appear within the 50k examples only once.
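A minimal sketch of the averaging step, in plain Python. The field names `user_id`, `product_id`, and `embedding`, the helper `mean_embeddings`, and the toy vectors are assumptions for illustration; the source does not specify the dataset's schema. In practice a NumPy or pandas group-by would likely be more convenient.

```python
def mean_embeddings(rows, key):
    """Average the embedding vectors of all rows that share the same key."""
    sums = {}
    counts = {}
    for row in rows:
        k = row[key]
        vec = row["embedding"]
        if k not in sums:
            # First review seen for this user/product: start the running sum.
            sums[k] = list(vec)
            counts[k] = 1
        else:
            # Add this review's embedding element-wise to the running sum.
            sums[k] = [a + b for a, b in zip(sums[k], vec)]
            counts[k] += 1
    return {k: [x / counts[k] for x in total] for k, total in sums.items()}


# Toy training reviews (hypothetical schema and values).
train = [
    {"user_id": "u1", "product_id": "p1", "embedding": [1.0, 0.0]},
    {"user_id": "u1", "product_id": "p2", "embedding": [0.0, 1.0]},
    {"user_id": "u2", "product_id": "p1", "embedding": [0.5, 0.5]},
]

user_embeddings = mean_embeddings(train, "user_id")
product_embeddings = mean_embeddings(train, "product_id")
```

A user or product that appears only once simply keeps that single review's embedding as its average.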
2. Evaluate the embeddings
To evaluate the recommendations, we look at the similarity of the user and product embeddings amongst the reviews in the unseen test set. We calculate the cosine similarity between the user and product embeddings, which gives us a similarity score between 0 and 1. We then normalize the scores to be evenly split between 0 and 1, by calculating the percentile of the similarity score amongst all predicted scores.
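The evaluation step could be sketched as follows, again in plain Python. The helpers `cosine_similarity` and `percentile_ranks` and the toy pairs are hypothetical. The percentile here is taken as the fraction of all scores less than or equal to a given score, which is one reasonable reading of the description and not necessarily the exact definition the source uses. Note that cosine similarity in general lies between -1 and 1; the sketch does not clip it.

```python
import math
from bisect import bisect_right


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (assumes neither is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def percentile_ranks(scores):
    """Map each score to the fraction of all scores that are <= it."""
    ordered = sorted(scores)
    n = len(ordered)
    return [bisect_right(ordered, s) / n for s in scores]


# Hypothetical (user_embedding, product_embedding) pairs for test-set reviews.
pairs = [
    ([1.0, 0.0], [1.0, 0.0]),  # same direction
    ([1.0, 0.0], [1.0, 1.0]),  # 45 degrees apart
    ([1.0, 0.0], [0.0, 1.0]),  # orthogonal
]

similarities = [cosine_similarity(u, p) for u, p in pairs]
normalized = percentile_ranks(similarities)
```

The sketch assumes every test review has both a user and a product embedding from the training set; the source does not say how pairs with a missing embedding are handled.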
2.1 Visualize cosine similarity by review score
We group the cosine similarity scores by the review score, and plot the distribution of cosine similarity scores for each review score.
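The grouping could look roughly like this. Plotting is left out to keep the sketch dependency-free, so it prints nothing and only computes per-score summary statistics as a stand-in for the distribution plot. The helper `summarize_by_score` is hypothetical, and the numbers are made up for illustration; they are not results from the source.

```python
from collections import defaultdict
from statistics import mean, median


def summarize_by_score(review_scores, similarities):
    """Group similarity values by review score and summarize each group."""
    groups = defaultdict(list)
    for score, sim in zip(review_scores, similarities):
        groups[score].append(sim)
    # Sort by review score so the summary reads from lowest to highest rating.
    return {
        score: {"count": len(values), "mean": mean(values), "median": median(values)}
        for score, values in sorted(groups.items())
    }


# Made-up values for illustration only.
review_scores = [1, 1, 3, 5, 5, 5]
similarities = [0.25, 0.5, 0.5, 0.5, 0.75, 1.0]

summary = summarize_by_score(review_scores, similarities)
```

With a plotting library, the per-score lists in `groups` could be passed to a box plot to show the full distributions.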
We can observe a weak trend, showing that the higher the similarity score between the user and the product embedding, the higher the review score. Therefore, the user and product embeddings can weakly predict the review score - even before the user receives the product!
Because this signal works in a different way than the more commonly used collaborative filtering, it can act as an additional feature to slightly improve the performance on existing problems.
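One way to use the score as an additional feature is to append it to whatever feature rows an existing model already consumes. This is only a sketch: the helper `add_similarity_feature`, the feature layout, and the values are all hypothetical, and the source does not describe how the feature should be integrated.

```python
def add_similarity_feature(feature_rows, similarity_scores):
    """Return new feature rows with the similarity score appended as a last column."""
    if len(feature_rows) != len(similarity_scores):
        raise ValueError("feature_rows and similarity_scores must have the same length")
    return [list(row) + [sim] for row, sim in zip(feature_rows, similarity_scores)]


# Hypothetical features from an existing recommender, one row per (user, product) pair.
existing_features = [[0.1, 3.0], [0.7, 1.0]]
# Hypothetical normalized user-product similarity scores for the same pairs.
similarity_scores = [0.9, 0.2]

augmented = add_similarity_feature(existing_features, similarity_scores)
```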