Query & move data
Cluster and Name Data Groups
Automate data clustering and naming using K-means and GPT-4. Uncover hidden patterns and gain insights into your datasets with automatically generated cluster
Without it
Piece it together by hand, every time.
With it
Discover hidden groupings within your data and automatically generate descriptive names for each cluster. This helps in understanding and categorizing complex datasets.
What you get
- Perform K-means clustering on datasets.
- Extract representative samples from identified clusters.
- Use GPT-4 to generate descriptive names for each cluster.
- Visualize cluster groupings in a 2D projection.
Use this prompt chain
K-means Clustering in Python using OpenAI
We use a simple k-means algorithm to demonstrate how clustering can be done. Clustering can help discover valuable, hidden groupings within the data. The dataset is created in the Get_embeddings_from_dataset Notebook.
1. Find the clusters using K-means
We show the simplest use of K-means. You can pick the number of clusters that fits your use case best.
Visualization of clusters in a 2d projection. In this run, the green cluster (#1) seems quite different from the others. Let's see a few samples from each cluster.
2. Text samples in the clusters & naming the clusters
Let's show random samples from each cluster. We'll use gpt-4 to name the clusters, based on a random sample of 5 reviews from that cluster.
It's important to note that clusters will not necessarily match what you intend to use them for. A larger amount of clusters will focus on more specific patterns, whereas a small number of clusters will usually focus on largest discrepencies in the data.